jsteinhardt comments on Original Research on Less Wrong - Less Wrong

21 Post author: lukeprog 29 October 2012 10:50PM




Comment author: jsteinhardt 31 October 2012 04:29:00PM -1 points

Suppose that our data are coin flips, and consider three hypotheses: H0 = always heads, H1 = fair coin, H2 = heads with probability 25%. Now suppose that the two hypotheses we actually want to test between are H0 and H' = 0.5(H1+H2), i.e. an equal-weight mixture of H1 and H2. After seeing a single heads, the likelihood of H0 is 1 and the likelihood of H' is 0.5(0.5+0.25) = 0.375. After seeing two heads, the likelihood of H0 is 1 and the likelihood of H' is 0.5(0.5^2+0.25^2) = 0.15625. In general, the likelihood of H' after n heads is 0.5(0.5^n+0.25^n), i.e. a mixture of two geometric functions of n rather than a single one. More generally, if H' is a mixture of many hypotheses, its likelihood will be a mixture of many geometric functions, and could therefore take a more or less arbitrary shape.
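To make the arithmetic concrete, here is a minimal Python sketch of the example above. The function name `mixture_likelihood` and the `components` layout are illustrative choices, not anything from the thread. It computes the likelihood of H' after n heads and checks how the H0 : H' likelihood ratio grows per additional heads. For a single atomic alternative that growth factor would be constant; for the mixture it drifts.

```python
# Hypothetical helper: likelihood of seeing n heads in a row under a mixture.
# Each component is (mixture weight, probability of heads).
H_PRIME = ((0.5, 0.5), (0.5, 0.25))  # H' = 0.5(H1 + H2)


def mixture_likelihood(n, components=H_PRIME):
    """Return sum_i w_i * p_i**n, the likelihood of n heads under the mixture."""
    return sum(w * p ** n for w, p in components)


def h0_likelihood(n):
    """H0 = always heads, so any run of heads has likelihood 1."""
    return 1.0


# Likelihood ratio H0 : H' after n = 1..5 heads.
ratios = [h0_likelihood(n) / mixture_likelihood(n) for n in range(1, 6)]

# Factor by which the ratio grows with each extra heads.
# A single geometric likelihood would give the same factor every time.
steps = [ratios[i + 1] / ratios[i] for i in range(len(ratios) - 1)]

if __name__ == "__main__":
    for n, r in enumerate(ratios, start=1):
        print(n, r)
    print(steps)
```

With these numbers the per-flip factor starts at 2.4 and shrinks toward 2, because the H2 component (0.25^n) dies off faster than the H1 component (0.5^n), so H' behaves more and more like H1 alone.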

Comment author: Eliezer_Yudkowsky 01 November 2012 04:43:15AM 1 point

That's why I specified single possible worlds / hypotheses with no internal parameters that are being learned.

Comment author: jsteinhardt 01 November 2012 05:30:46AM 1 point

Oops, missed that; but that specification doesn't hold in the situation we care about, since rejecting the null hypothesis typically requires us to consider the result of marginalizing over a space of alternative hypotheses (well, assuming we're being Bayesians, but I know you prefer that anyway =P).

Comment author: Eliezer_Yudkowsky 01 November 2012 07:05:19AM 1 point

Well, right, assuming we're Bayesians, but when we're just "rejecting the null hypothesis" we should mostly be concerned about likelihood under the null hypothesis, which has no moving parts, which is why I used the log approximation I did. But at this point we're mixing frequentism and Bayes so much that I shan't defend the point further - it's certainly true that once a Bayesian considers more than exactly two atomic hypotheses, the update on two independent pieces of evidence doesn't go as the square of one update (even though the likelihood ratios still go as the square, etc.).
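The last claim can be checked on the coin example from earlier in the thread. The sketch below is only an illustration under an assumed uniform prior over H0, H1, H2 (the thread does not specify a prior), and the names `PRIOR`, `odds_h0_vs_rest`, and `pairwise_lr` are hypothetical. It compares the update on the odds of H0 against everything else after one and two heads, alongside the pairwise H0 : H1 likelihood ratio.

```python
from fractions import Fraction

# Probability of heads under each atomic hypothesis from the thread.
P_HEADS = {"H0": Fraction(1), "H1": Fraction(1, 2), "H2": Fraction(1, 4)}

# Assumption for illustration only: a uniform prior over the three hypotheses.
PRIOR = {name: Fraction(1, 3) for name in P_HEADS}


def odds_h0_vs_rest(n):
    """Posterior odds of H0 against (H1 or H2) after n heads in a row."""
    weights = {h: PRIOR[h] * P_HEADS[h] ** n for h in P_HEADS}
    rest = sum(w for h, w in weights.items() if h != "H0")
    return weights["H0"] / rest


def pairwise_lr(n):
    """Likelihood ratio of H0 to H1 after n heads; a pure power of 2."""
    return (P_HEADS["H0"] / P_HEADS["H1"]) ** n


prior_odds = odds_h0_vs_rest(0)
one_update = odds_h0_vs_rest(1) / prior_odds   # factor from one heads
two_update = odds_h0_vs_rest(2) / prior_odds   # factor from two heads

if __name__ == "__main__":
    print(one_update, one_update ** 2, two_update)
    print(pairwise_lr(1), pairwise_lr(2))
```

Under this prior, one heads multiplies the odds of H0 by 8/3, so squaring would predict 64/9, but two heads actually multiply them by 32/5. The pairwise ratio between the two atomic hypotheses H0 and H1 still goes from 2 to 4, i.e. exactly as the square.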