jsteinhardt comments on A Fervent Defense of Frequentist Statistics - Less Wrong

43 Post author: jsteinhardt 18 February 2014 08:08PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (125)

You are viewing a single comment's thread. Show more comments above.

Comment author: jsteinhardt 15 February 2014 03:05:25AM 1 point [-]

Yes, but in this setting maximum a posteriori (MAP) doesn't make any sense from a Bayesian perspective. Maximum a posteriori is supposed to be a point estimate of the posterior, but in this case, the MAP solution will be sparse, whereas the posterior given a laplacian prior will place zero mass on sparse solutions. So the MAP estimate doesn't even qualitatively approximate the posterior.

Comment author: Leon 16 February 2014 09:39:05AM 2 points [-]

Ah, good point. It's like the prior, considered as a regularizer, is too "soft" to encode the constraint we want.

A Bayesian could respond that we rarely actually want sparse solutions -- in what situation is a physical parameter identically zero? -- but rather solutions which have many near-zeroes with high probability. The posterior would satisfy this I think. In this sense a Bayesian could justify the Laplace prior as approximating a so-called "slab-and-spike" prior (which I believe leads to combinatorial intractability similar to the fully L0 solution).

Also, without L0 the frequentist doesn't get fully sparse solutions either. The shrinkage is gradual; sometimes there are many tiny coefficients along the regularization path.

[FWIW I like the logical view of probability, but don't hold a strong Bayesian position. What seems most important to me is getting the semantics of both Bayesian (= conditional on the data) and frequentist (= unconditional, and dealing with the unknowns in some potentially nonprobabilistic way) statements right. Maybe there'd be less confusion -- and more use of Bayes in science -- if "inference" were reserved for the former and "estimation" for the latter.]

Comment author: jsteinhardt 16 February 2014 10:27:58PM 1 point [-]

Also, without L0 the frequentist doesn't get fully sparse solutions either. The shrinkage is gradual; sometimes there are many tiny coefficients along the regularization path.

See this comment. You actually do get sparse solutions in the scenario I proposed.

Comment author: Leon 17 February 2014 01:30:25AM 1 point [-]

Cool; I take that back. Sorry for not reading closely enough.