Leon comments on A Fervent Defense of Frequentist Statistics - Less Wrong

Post author: jsteinhardt 18 February 2014 08:08PM




Comment author: jsteinhardt 12 February 2014 07:46:59AM 3 points

Yes, I mixed up x and y; good catch. It's not trivial for me to fix this while maintaining WordPress compatibility, but I'll try to do so in the next few days.

This problem is called the "compressed sensing" problem and is most famously used to speed up MRI scans. However, it has had a multitude of other applications as well; see http://en.wikipedia.org/wiki/Compressed_sensing#Applications.

Comment author: Leon 15 February 2014 12:07:49AM 3 points

Many L1-penalized algorithms (for example, the LASSO) can be interpreted as producing maximum a posteriori (MAP) Bayesian point estimates with Laplace (= double exponential) priors on the coefficients.
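A minimal numpy sketch of this correspondence (the problem sizes, noise variance, and penalty strength below are all illustrative choices): with Gaussian noise of variance sigma^2 and independent Laplace(0, b) priors, the negative log posterior is proportional to the LASSO objective when b = sigma^2 / alpha, so the two have the same minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

sigma2 = 1.0        # assumed Gaussian noise variance (illustrative)
alpha = 0.7         # L1 penalty strength (illustrative)
b = sigma2 / alpha  # matching Laplace prior scale

def lasso_objective(w):
    return 0.5 * np.sum((y - X @ w) ** 2) + alpha * np.sum(np.abs(w))

def neg_log_posterior(w):
    # Gaussian likelihood + independent Laplace(0, b) priors, constants dropped
    nll = 0.5 * np.sum((y - X @ w) ** 2) / sigma2
    nlp = np.sum(np.abs(w)) / b
    return nll + nlp

# sigma2 * neg_log_posterior(w) equals lasso_objective(w) exactly, so the two
# share a minimizer: minimizing the LASSO objective is MAP under a Laplace prior.
w = rng.normal(size=d)
gap = abs(lasso_objective(w) - sigma2 * neg_log_posterior(w))
```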

Comment author: jsteinhardt 15 February 2014 03:05:25AM 1 point

Yes, but in this setting maximum a posteriori (MAP) estimation doesn't make any sense from a Bayesian perspective. MAP is supposed to be a point summary of the posterior, but in this case the MAP solution will be sparse, whereas the posterior under a Laplace prior places zero mass on sparse solutions (it is a continuous distribution, so any coefficient vector with an exact zero has probability zero). So the MAP estimate doesn't even qualitatively approximate the posterior.
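A one-dimensional sketch of this mismatch (the observation z and penalty t are made-up values): the Laplace-prior MAP estimate is a soft-thresholding of the data and can land exactly on zero, while the posterior itself is a continuous density that assigns essentially no mass near that point, and exactly zero mass to the point itself.

```python
import numpy as np

def soft_threshold(z, t):
    # argmin_w 0.5*(w - z)**2 + t*|w|: the 1-D Laplace-prior MAP estimate
    return np.sign(z) * max(abs(z) - t, 0.0)

z, t = 0.3, 0.5              # observation and penalty (illustrative)
w_map = soft_threshold(z, t) # exactly 0.0, since |z| <= t

# Crude grid approximation of the posterior density exp(-0.5*(w-z)^2 - t*|w|):
w = np.linspace(-3.0, 3.0, 60001)
log_post = -0.5 * (w - z) ** 2 - t * np.abs(w)
p = np.exp(log_post - log_post.max())
p /= p.sum()

# Posterior mass within 1e-3 of zero is tiny; the point {0} itself has mass 0,
# even though the MAP estimate sits exactly there.
mass_near_zero = p[np.abs(w) < 1e-3].sum()
```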

Comment author: Leon 16 February 2014 09:39:05AM 2 points

Ah, good point. It's like the prior, considered as a regularizer, is too "soft" to encode the constraint we want.

A Bayesian could respond that we rarely actually want sparse solutions -- in what situation is a physical parameter identically zero? -- but rather solutions which have many near-zeroes with high probability. The posterior would satisfy this, I think. In this sense a Bayesian could justify the Laplace prior as approximating a so-called "spike-and-slab" prior (which I believe leads to combinatorial intractability similar to the full L0 solution).
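A tiny sketch of the spike-and-slab idea (all parameter values here are made up for illustration): the prior puts probability pi on w being exactly zero (the "spike") and otherwise draws w from a Gaussian "slab", so, unlike the Laplace prior, the posterior assigns genuinely nonzero probability to an exact zero. In one dimension the posterior probability of the spike is available in closed form from the two marginal likelihoods.

```python
import numpy as np

def norm_pdf(y, var):
    return np.exp(-0.5 * y ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def posterior_prob_zero(y, pi=0.5, sigma2=1.0, tau2=4.0):
    # Spike: w = 0 exactly, with prior probability pi.
    # Slab:  w ~ N(0, tau2). Observation model: y = w + N(0, sigma2).
    spike = pi * norm_pdf(y, sigma2)            # marginal likelihood, spike
    slab = (1.0 - pi) * norm_pdf(y, sigma2 + tau2)  # marginal likelihood, slab
    return spike / (spike + slab)

# Small observations favor the spike (an exact zero); large ones favor the slab.
p_small = posterior_prob_zero(0.1)
p_large = posterior_prob_zero(5.0)
```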

Also, without L0 the frequentist doesn't get fully sparse solutions either. The shrinkage is gradual; sometimes there are many tiny coefficients along the regularization path.

[FWIW I like the logical view of probability, but don't hold a strong Bayesian position. What seems most important to me is getting the semantics of both Bayesian (= conditional on the data) and frequentist (= unconditional, and dealing with the unknowns in some potentially nonprobabilistic way) statements right. Maybe there'd be less confusion -- and more use of Bayes in science -- if "inference" were reserved for the former and "estimation" for the latter.]

Comment author: jsteinhardt 16 February 2014 10:27:58PM 1 point

Also, without L0 the frequentist doesn't get fully sparse solutions either. The shrinkage is gradual; sometimes there are many tiny coefficients along the regularization path.

See this comment. You actually do get sparse solutions in the scenario I proposed.
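For what it's worth, the exact zeros are easy to see numerically. Here is a minimal coordinate-descent LASSO sketch in numpy (the problem sizes, ground-truth coefficients, and penalty alpha are arbitrary illustrative choices): each coordinate update is a soft-threshold, so coefficients come out identically zero, not merely tiny.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    # Coordinate descent for 0.5*||y - Xw||^2 + alpha*||w||_1.
    n, d = X.shape
    w = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual excluding coordinate j, then soft-threshold.
            r = y - X @ w + X[:, j] * w[j]
            z = X[:, j] @ r
            w[j] = np.sign(z) * max(abs(z) - alpha, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]              # sparse ground truth (illustrative)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat = lasso_cd(X, y, alpha=5.0)
n_exact_zeros = int(np.sum(w_hat == 0.0))  # exact zeros from soft-thresholding
```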

Comment author: Leon 17 February 2014 01:30:25AM 1 point

Cool; I take that back. Sorry for not reading closely enough.