jsteinhardt comments on Original Research on Less Wrong - Less Wrong
If a given piece of evidence E1 provides a Bayesian likelihood ratio favoring theory T1 over T2, and E2 was generated by an isomorphic process, then the combined evidence yields the likelihood ratio squared, provided that T1 and T2 are single possible worlds with no internal parameters being updated by E1 or E2, so that the probabilities of the two pieces of evidence are conditionally independent.
Thus sayeth Bayes, so far as I can tell.
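A minimal sketch of the squaring claim, using made-up numbers for illustration (the 0.8 and 0.6 heads-probabilities are hypothetical, not from the thread):

```python
import math

# Hypothetical atomic hypotheses about a coin, with no internal parameters:
# T1 says heads with probability 0.8, T2 says heads with probability 0.6.
p_heads_t1 = 0.8
p_heads_t2 = 0.6

# Likelihood ratio from one observed heads (E1).
ratio_one = p_heads_t1 / p_heads_t2

# Two conditionally independent heads (E1 + E2): likelihoods multiply,
# so the combined likelihood ratio is exactly the single ratio squared.
ratio_two = (p_heads_t1 ** 2) / (p_heads_t2 ** 2)

assert math.isclose(ratio_two, ratio_one ** 2)
```

The squaring depends entirely on conditional independence: with fixed atomic hypotheses, P(E1, E2 | T) = P(E1 | T) * P(E2 | T), so the ratios multiply.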
As for the frequentists...
Well, logically, we're allegedly rejecting a null hypothesis. If the "null hypothesis" contains no parameters to be updated, the probability that E1 was generated by the null hypothesis is 0.05, and E2 was generated by a causally and conditionally independent process, then the probability that E1 + E2 was generated by the null hypothesis ought to be 0.0025.
But of course gwern's calculation came out differently in the decimals. This could be because some approximation truncated a decimal or two. But it could also be because frequentism actually calculates the probability that E1 falls into some amazing class [E] of other data we could've observed but didn't, to be p < 0.05. And who knows what strange class of other data we could've seen but didn't a given frequentist method will put E1 + E2 into? You can make up whatever the hell [E] you want, so who says you've got to make up one that makes [E+E] have the probability of [E] squared? So if E1 and E2 are exactly equally likely given the null hypothesis, a frequentist method could say that their combined "significance" is the square of E1's significance, or less than the square, or more than the square; who knows, what the hell, if we obeyed probability theory we'd be Bayesians, so let's just make stuff up. Sorry if I sound a bit polemical here.
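To make the divergence concrete: the thread doesn't say which method gwern used, but Fisher's method is one standard frequentist way to combine two independent p-values, and it does not return the square. This sketch uses the closed-form chi-squared survival function for 4 degrees of freedom rather than a stats library:

```python
import math

def fisher_combine(p1, p2):
    """Fisher's method for two independent p-values:
    X = -2*(ln p1 + ln p2) is chi-squared distributed with 4 df under the null."""
    x = -2.0 * (math.log(p1) + math.log(p2))
    # Survival function of chi-squared with 4 df has the closed form
    # P(X > x) = exp(-x/2) * (1 + x/2).
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

p = 0.05
combined = fisher_combine(p, p)
print(combined)   # roughly 0.0175, not the naive square 0.0025
```

So two p = 0.05 results combine to about 0.0175 under Fisher's method, seven times larger than 0.05 squared, illustrating how a combined frequentist "significance" need not be the square of the individual one.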
See also: http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/
Suppose that our data are coin flips, and consider three hypotheses: H0 = always heads, H1 = fair coin, H2 = heads with probability 25%. Now suppose that the two hypotheses we actually want to test between are H0 and H' = 0.5(H1+H2). After seeing a single heads, the likelihood of H0 is 1 and the likelihood of H' is 0.5(0.5+0.25). After seeing two heads, the likelihood of H0 is 1 and the likelihood of H' is 0.5(0.5^2+0.25^2). In general, the likelihood of H' after n heads is 0.5(0.5^n+0.25^n), i.e. a mixture of multiple geometric functions. More generally, if H' is a mixture of many hypotheses, the likelihood will be a mixture of many geometric functions, and therefore could be more or less arbitrary.
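The numbers above can be checked directly; this sketch just computes the likelihoods from the example and shows that the ratio against the mixture H' does not square:

```python
# Hypotheses from the example: H0 = always heads,
# H' = 0.5 * (fair coin) + 0.5 * (heads with probability 0.25).
def lik_h0(n):
    """Likelihood of H0 after n observed heads."""
    return 1.0

def lik_hprime(n):
    """Likelihood of the mixture H' after n observed heads."""
    return 0.5 * (0.5 ** n + 0.25 ** n)

r1 = lik_h0(1) / lik_hprime(1)   # ratio after one heads: 1 / 0.375
r2 = lik_h0(2) / lik_hprime(2)   # ratio after two heads: 1 / 0.15625

print(r1, r2, r1 ** 2)  # r2 differs from r1**2
```

Here r1 is about 2.67 and r2 is 6.4, while r1 squared is about 7.11: against a mixture, the second heads is less surprising than the first (the mixture's weight shifts toward its heads-favoring component), so the combined ratio falls short of the square.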
That's why I specified single possible worlds / hypotheses with no internal parameters that are being learned.
Oops, missed that; but that specification doesn't hold in the situation we care about, since rejecting the null hypothesis typically requires us to consider the result of marginalizing over a space of alternative hypotheses (well, assuming we're being Bayesians, but I know you prefer that anyways =P).
Well, right, assuming we're Bayesians; but when we're just "rejecting the null hypothesis" we should mostly be concerned with the likelihood from the null hypothesis, which has no moving parts, and that's why I used the log approximation I did. But at this point we're mixing frequentism and Bayes to the point where I shan't defend the point further. It's certainly true that once a Bayesian considers more than exactly two atomic hypotheses, the update on two independent pieces of evidence doesn't go as the square of one update (even though the pairwise likelihood ratios still go as the square, etc.).
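A small sketch of that final distinction, reusing the three coin hypotheses from the example above (the prior weights are hypothetical, chosen only for illustration):

```python
# Three atomic hypotheses for a coin, given as probabilities of heads.
hyps = {"H0": 1.0, "H1": 0.5, "H2": 0.25}
prior = {"H0": 0.5, "H1": 0.25, "H2": 0.25}  # hypothetical prior weights

def posterior_after(n_heads):
    """Posterior over the three hypotheses after n observed heads."""
    unnorm = {h: prior[h] * (p ** n_heads) for h, p in hyps.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

post1 = posterior_after(1)
post2 = posterior_after(2)

# Pairwise likelihood ratios between atomic hypotheses still square:
lr1 = hyps["H1"] / hyps["H2"]          # after one heads: 2
lr2 = hyps["H1"] ** 2 / hyps["H2"] ** 2  # after two heads: 4 = 2**2

# But the posterior update on H0 after two heads is not the square
# of the update after one heads, because the normalizer mixes in
# the other hypotheses.
update1 = post1["H0"] / prior["H0"]
update2 = post2["H0"] / prior["H0"]
print(update1 ** 2, update2)  # these differ
```

With these numbers the single-observation update on H0 is about 1.45, whose square is about 2.12, while the actual two-observation update is about 1.73: the pairwise likelihood ratios square exactly, but the posterior update does not.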