
jsteinhardt comments on What is the best paper explaining the superiority of Bayesianism over frequentism? - Less Wrong Discussion

-1 Post author: Meni_Rosenfeld 01 January 2013 08:58PM


Comments (32)


Comment author: jsteinhardt 02 January 2013 07:42:46PM 1 point

Well, there are a couple of issues here. First, log P(data|model) is a concave function for logistic regression, so the objective log P(data|model) + log P(model) is guaranteed to be concave only if log P(model) is concave too; otherwise the maximization may not reach the global optimum.

Secondly, the proper Bayesian thing to do would be to sample from the posterior, not maximize. For instance, in logistic regression the model is given by a vector of parameters, theta. Suppose that we actually believed that the prior on theta was proportional to exp(-|theta|), where |theta| is the sum of the absolute values of the coordinates of theta. Then maximizing P(model|data) will tend to give you solutions where most of the entries of theta are exactly 0, whereas the actual posterior places zero probability mass on such solutions.
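
To make the contrast concrete, here is a minimal Python sketch, assuming synthetic data and the prior exp(-|theta|) from the comment above. Everything in it is an illustrative assumption rather than anything from the thread: the data set (one informative feature, two weak small-scale ones), the helper names, the use of proximal gradient (ISTA) as the maximizer, and random-walk Metropolis as the sampler. It only aims to show that the maximizer can return coordinates that are exactly 0 while posterior draws do not.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def soft_threshold(v, t):
    # Proximal operator of t * sum|v|: entries with |v| <= t become exactly 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def log_posterior(theta, X, y):
    # Unnormalized: logistic log-likelihood (labels in {0, 1}) plus log prior -sum|theta|.
    z = X @ theta
    return np.sum(y * z - np.logaddexp(0.0, z)) - np.sum(np.abs(theta))

def map_estimate(X, y, n_iter=2000):
    # Proximal gradient (ISTA) on the negative log posterior, started at 0.
    step = 4.0 / np.linalg.norm(X, 2) ** 2  # 1/L, with L = ||X||_2^2 / 4
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ theta) - y)  # gradient of the negative log-likelihood
        theta = soft_threshold(theta - step * grad, step)
    return theta

def metropolis_samples(X, y, n_samples=5000, scale=0.5, seed=0):
    # Random-walk Metropolis draws from the same posterior (no tuning, no burn-in).
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])  # random start, so no coordinate begins at 0
    lp = log_posterior(theta, X, y)
    out = np.empty((n_samples, X.shape[1]))
    for i in range(n_samples):
        prop = theta + scale * rng.standard_normal(theta.shape)
        lp_prop = log_posterior(prop, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        out[i] = theta
    return out

# Hypothetical data: one informative feature and two weak, small-scale features.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([rng.standard_normal(n), 0.01 * rng.standard_normal((n, 2))])
theta_true = np.array([3.0, 0.0, 0.0])
y = (rng.uniform(size=n) < sigmoid(X @ theta_true)).astype(float)

theta_map = map_estimate(X, y)
samples = metropolis_samples(X, y)

print("MAP estimate:", theta_map)
print("exact zeros in MAP:", int(np.sum(theta_map == 0.0)))
print("exact zeros in posterior samples:", int(np.sum(samples == 0.0)))
```

The two weak columns are scaled so that the likelihood gradient for those coordinates can never outweigh the prior's pull toward 0, which is why the maximizer pins them at exactly 0. Which coordinates get clipped in general depends on the data.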

Comment author: Luke_A_Somers 02 January 2013 10:13:24PM 0 points

On the second point: fair enough, though even under Bayes it's sometimes reasonable to want a single answer, since you only get to actually do one thing.

If you have that prior and your maximization of P(model|data) lands on solutions where either P(data|model) or P(model) has zero probability mass, you're screwing up multiplication.

Comment author: jsteinhardt 02 January 2013 10:46:41PM 0 points

Well, the point is that if you have a continuous space, then the maximum a posteriori (MAP) solution, i.e. the one maximizing P(model|data), will have entries that are exactly zero with positive probability, but the posterior probability of a zero entry is 0.
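
A one-dimensional sketch may help here. It assumes a toy model that is not from the thread: a scalar theta with prior proportional to exp(-|theta|) and a single observation y ~ Normal(theta, 1). In that model the maximizer of P(model|data) has a closed form, soft-thresholding of y at 1, so it is exactly 0 whenever |y| <= 1. The variable names and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 100_000

# Assumed toy model: theta ~ Laplace(0, 1), y | theta ~ Normal(theta, 1).
# Drawing (theta, y) jointly means each theta is a posterior draw given its own y.
theta = rng.laplace(loc=0.0, scale=1.0, size=n_draws)
y = theta + rng.standard_normal(n_draws)

# The maximizer of -0.5 * (y - theta)^2 - |theta| is soft-thresholding of y at 1.
theta_map = np.sign(y) * np.maximum(np.abs(y) - 1.0, 0.0)

map_is_zero = theta_map == 0.0
frac_map_zero = np.mean(map_is_zero)  # fraction of data sets where the maximizer is exactly 0
frac_theta_zero_given_map_zero = np.mean(theta[map_is_zero] == 0.0)  # posterior draws at exactly 0

print("fraction of data sets with maximizer exactly 0:", frac_map_zero)
print("fraction of matching posterior draws exactly 0:", frac_theta_zero_given_map_zero)
```

The first fraction is a probability over data sets and is clearly positive; the second is a probability over theta under the posterior, and a continuous density assigns the single point 0 no mass even though the density there is not zero.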

Comment author: Luke_A_Somers 03 January 2013 03:27:24PM 0 points

How? If any of the probabilities that the posterior probability factors into are zero, the product is also zero. Or do you just mean that since data have unlimited precision in a continuous space, no single answer can ever have positive probability, because any exact value is infinitely unlikely?