Eliezer_Yudkowsky comments on Bayesian Flame - Less Wrong
http://lesswrong.com/
Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/
Mon, 27 Jul 2009 02:49:51 +1000
Submitted by <a href="http://lesswrong.com/user/cousin_it">cousin_it</a>
•
37 votes
•
<a href="http://lesswrong.com/lw/147/bayesian_flame/#comments">155 comments</a>
<div><p>There once lived a great man named <a href="http://en.wikipedia.org/wiki/Edwin_Thompson_Jaynes">E.T. Jaynes</a>. He knew that Bayesian inference is <a href="http://www-biba.inrialpes.fr/Jaynes/prob.html">the only way</a> to do statistics logically and consistently, standing on the shoulders of misunderstood giants Laplace and Gibbs. On numerous occasions he <a href="http://bayes.wustl.edu/etj/articles/confidence.pdf">vanquished</a> traditional "frequentist" statisticians with his superior math, demonstrating to anyone with half a brain how the Bayesian way gives faster and more correct results in each example. The weight of evidence falls so heavily on one side that it makes no sense to argue anymore. The fight is over. Bayes wins. <a href="/lw/o7/searching_for_bayesstructure/">The universe runs on Bayes-structure.</a></p>
<p>Or at least that's what you believe if you learned this stuff from Overcoming Bias.</p>
<p>Like I did until two days ago, when Cyan <a href="/lw/13v/are_calibration_and_rational_decisions_mutually/">hit me over the head</a> with something utterly incomprehensible. I suddenly had to go out and understand this stuff, not just believe it. (The original intention, if I remember it correctly, was to impress you all by pulling a Jaynes.) Now I've come back and intend to provoke a full-on flame war on the topic. Because if we can have thoughtful flame wars about gender but not math, we're a bad community. Bad, bad community.</p>
<p>If you're like me two days ago, you kinda "understand" what Bayesians do: assume a prior probability distribution over hypotheses, use evidence to morph it into a posterior distribution over same, and bless the resulting numbers as your "degrees of belief". But chances are that you have only a very vague idea of what frequentists do, apart from <a href="/lw/mt/beautiful_probability/">deriving half-assed results with their ad hoc tools</a>.<a id="more"></a></p>
<p>Well, here's the ultra-short version: frequentist statistics is <em>the art of drawing true conclusions about the real world</em> instead of assuming prior degrees of belief and coherently adjusting them to avoid Dutch books.</p>
<p>And here's an ultra-short example of what frequentists can do: estimate 100 independent unknown parameters from 100 different sample data sets, each as a 90% interval, and have about 90 of the estimates turn out to be <em>true to fact</em> afterward. Like, fo'real. Always 90% in the long run, truly, irrevocably and forever. No Bayesian method known today can reliably do the same: the outcome will depend on the priors you assume for each parameter. I don't believe you're going to get lucky with all 100. And even if I believed you a priori (ahem) that don't make it true.</p>
<p>(That's what Jaynes did to achieve his awesome victories: use trained intuition to pick good priors by hand on a per-sample basis. Maybe you can learn this skill somewhere, but not from the <a href="http://yudkowsky.net/rational/bayes">Intuitive Explanation</a>.)</p>
<p>How in the world do you do inference without a prior? Well, the characterization of frequentist statistics as "trickery" is totally justified: it has no single coherent approach and the tricks often give conflicting results. Most everybody agrees that you can't do better than Bayes if you have a clear-cut prior; but if you don't, no one is going to kick you out. We sympathize with your predicament and will gladly sell you some twisted technology!</p>
<p><a href="http://en.wikipedia.org/wiki/Confidence_interval">Confidence intervals</a>: imagine you somehow process some sample data to get an interval. Further imagine that hypothetically, <em>for any given hidden parameter value</em>, this calculation algorithm applied to data sampled under that parameter value yields an interval that covers it with probability 90%. Believe it or not, this perverse trick works 90% of the time without requiring any prior distribution on parameter values.</p>
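<p>A minimal simulation sketch of that guarantee, assuming a textbook case that is not in the post: normal data with a known standard deviation and a z-interval around the sample mean. The function name, the sample size and the uniform range used to pick the hidden means are all made up for illustration.</p>

```python
import random
from statistics import NormalDist, fmean

def coverage(n_experiments=20000, n=25, sigma=1.0, level=0.90, seed=0):
    """Fraction of intervals that cover the true mean, with a fresh true mean each time."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal (about 1.645 for a 90% level).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_experiments):
        # Hidden parameter. No prior is used by the procedure; this draw only
        # decides which "unknown" value the experiment is about.
        mu = rng.uniform(-100.0, 100.0)
        sample_mean = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # The interval is sample_mean +/- half_width; check whether it covers mu.
        if abs(sample_mean - mu) <= half_width:
            hits += 1
    return hits / n_experiments

print(coverage())  # should land close to 0.90
```

<p>Swapping the uniform draw for any other way of choosing the hidden means should leave the result near 0.90, because the coverage statement holds for each parameter value separately.</p>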
<p><a href="http://en.wikipedia.org/wiki/Bias_of_an_estimator">Unbiased estimators</a>: you process the sample data to get a number whose expectation magically coincides with the true parameter value.</p>
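<p>One standard instance, sketched here under assumptions of my own choosing (normal samples of size 5 with true variance 4): the sample variance with divisor n - 1 is unbiased, while the version with divisor n is not. Averaging many estimates makes the difference visible. All names below are hypothetical.</p>

```python
import random

def average_variance_estimates(n=5, true_var=4.0, trials=50000, seed=1):
    """Average the n-1 and n divisor variance estimates over many samples."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
        total_unbiased += ss / (n - 1)         # expectation equals true_var
        total_biased += ss / n                 # expectation is true_var * (n - 1) / n
    return total_unbiased / trials, total_biased / trials

unbiased, biased = average_variance_estimates()
print(unbiased, biased)  # roughly 4.0 and roughly 3.2
```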
<p><a href="http://en.wikipedia.org/wiki/Statistical_hypothesis_testing">Hypothesis testing</a>: I give you a black-box random distribution and claim it obeys a specified formula. You sample some data from the box and inspect it. Frequentism allows you to <span style="text-decoration: line-through;">call me a liar and be wrong no more than 10% of the time</span> reject truthful claims no more than 10% of the time, guaranteed, no prior in sight. (Thanks Eliezer for calling out the mistake, and conchis for the correction!)</p>
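<p>A small sketch of how such a guarantee can be computed exactly, using an example I am assuming for illustration: the claim is "this coin is fair", you flip it 100 times, and you reject when the head count is too far from 50. The cutoff is chosen from the binomial distribution alone, so no prior appears anywhere. Function names are hypothetical.</p>

```python
from math import comb

def two_sided_size(n, c):
    """Exact P(|heads - n/2| >= c) for n flips of a fair coin."""
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= 2 * c)
    return extreme / 2 ** n

def critical_distance(n, alpha=0.10):
    """Smallest c whose rejection region {|heads - n/2| >= c} has size <= alpha."""
    c = 0
    while two_sided_size(n, c) > alpha:
        c += 1
    return c

c = critical_distance(100)
# Probability of rejecting a truthful "fair coin" claim with this cutoff.
print(c, two_sided_size(100, c))
```

<p>Note what the number does and does not say: it bounds how often a true claim gets rejected, not how often a rejection is mistaken.</p>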
<p>But this is getting too academic. I ought to throw you dry wood, good flame material. This <a href="http://ba.stat.cmu.edu/journal/2008/vol03/issue03/gelman.pdf">hilarious PDF</a> from Andrew Gelman should do the trick. Choice quote:</p>
<blockquote>
<p>Well, let me tell you something. The 50 states aren't exchangeable. I've lived in a few of them and visited nearly all the others, and calling them exchangeable is just silly. Calling it a hierarchical or multilevel model doesn't change things - it's an additional level of modeling that I'd rather not do. Call me old-fashioned, but I'd rather let the data speak without applying a probability distribution to something like the 50 states which are neither random nor a sample.</p>
</blockquote>
<p>As a bonus, the bibliography to that article contains such marvelous titles as "Why Isn't Everyone a Bayesian?" And Larry Wasserman's <a href="http://ba.stat.cmu.edu/journal/2008/vol03/issue03/wasserman.pdf">followup</a> is also quite disturbing.</p>
<p>Another stick for the fire is provided by <a href="http://cscs.umich.edu/~crshalizi/weblog/612.html">Shalizi</a>, who (among other things) makes the correct point that a good Bayesian must never be uncertain about the probability of any future event. That's why he calls Bayesians "Often Wrong, Never In Doubt":</p>
<blockquote>
<p>The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes's rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f.</p>
</blockquote>
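<p>To make the integration over M concrete, here is a toy sketch with three made-up hypotheses about a coin's bias and made-up prior weights. Whatever spread the prior has, summing over M leaves a single number for the next flip, and Bayes' rule then turns it into another single number.</p>

```python
from fractions import Fraction as F

# Hypothetical hypotheses M: the coin's probability of heads, mapped to prior weight P(M).
prior = {F(1, 5): F(1, 4), F(1, 2): F(1, 2), F(4, 5): F(1, 4)}

def predictive(belief):
    """P(next flip is heads) = sum over M of P(M) * P(heads | M)."""
    return sum(weight * bias for bias, weight in belief.items())

def update_on_heads(belief):
    """Posterior over M after observing one head (Bayes' rule)."""
    evidence = predictive(belief)
    return {bias: weight * bias / evidence for bias, weight in belief.items()}

print(predictive(prior))                   # 1/2
print(predictive(update_on_heads(prior)))  # 59/100
```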
<p>For my final quote it seems only fair to add one more polemical summary of Cyan's point that made me sit up and look around in a bewildered manner. Credit to Wasserman <a href="http://ba.stat.cmu.edu/journal/2006/vol01/issue03/wasserman.pdf">again</a>:</p>
<blockquote>
<p><em>Pennypacker:</em> You see, physics has really advanced. All those quantities I estimated have now been measured to great precision. Of those thousands of 95 percent intervals, only 3 percent contained the true values! They concluded I was a fraud.</p>
<p><em>van Nostrand:</em> Pennypacker you fool. I never said those intervals would contain the truth 95 percent of the time. I guaranteed coherence not coverage!</p>
<p><em>Pennypacker:</em> A lot of good that did me. I should have gone to that objective Bayesian statistician. At least he cares about the frequentist properties of his procedures.</p>
<p><em>van Nostrand:</em> Well I'm sorry you feel that way Pennypacker. But I can't be responsible for your incoherent colleagues. I've had enough now. Be on your way.</p>
</blockquote>
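<p>Pennypacker's misfortune is easy to reproduce in a toy setting. The sketch below assumes a deliberately overconfident prior of my own invention, N(0, 0.5²), on a quantity measured once with unit normal noise, and counts how often the 95% central credible interval contains the true value. It is an illustration of "coherence not coverage", not a claim about how any real analysis was done.</p>

```python
import random
from statistics import NormalDist

def credible_coverage(theta, tau=0.5, sigma=1.0, level=0.95, trials=20000, seed=2):
    """How often the central credible interval under a N(0, tau^2) prior contains theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Conjugate normal update for a single observation x ~ N(theta, sigma^2).
    shrink = tau ** 2 / (tau ** 2 + sigma ** 2)  # posterior mean is shrink * x
    post_sd = (1.0 / (1.0 / tau ** 2 + 1.0 / sigma ** 2)) ** 0.5
    hits = 0
    for _ in range(trials):
        x = rng.gauss(theta, sigma)              # one measurement of the true value
        if abs(shrink * x - theta) <= z * post_sd:
            hits += 1
    return hits / trials

print(credible_coverage(0.0))  # true value where the prior expects it: nearly always covered
print(credible_coverage(3.0))  # true value far from the prior's guess: almost never covered
```

<p>The same posterior is perfectly coherent in both runs; only the coverage changes with where the truth happens to sit.</p>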
<p>There's often good reason to advocate a correct theory over a wrong one. But all this evidence (ahem) shows that switching to <a href="/lw/lz/guardians_of_the_truth/">Guardian of Truth</a> mode was, at the very least, premature for me. Bayes isn't the correct theory for drawing conclusions about the world. <em>As of today, we have no coherent theory for drawing conclusions about the world.</em> Both perspectives have serious problems. So do yourself a favor and switch to truth-seeker mode.</p></div>
Eliezer_Yudkowsky on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zen
2009-07-27T03:39:09.255880+10:00
<div class="md"><blockquote>
<p>a good Bayesian must never be uncertain about the probability of any future event</p>
</blockquote>
<p>Who? Whaa? Your probability <em>is</em> your uncertainty.</p></div>
orthonormal on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zfh
2009-07-27T06:21:36.594803+10:00
<div class="md"><p>Also, didn't we already cover <a href="http://lesswrong.com/lw/9x/metauncertainty/">metauncertainty</a> <a href="http://lesswrong.com/lw/em/bead_jar_guesses/">here</a>?</p></div>
Nick_Tarleton on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zfm
2009-07-27T06:29:33.938005+10:00
<div class="md"><p><a href="http://cscs.umich.edu/~crshalizi/weblog/612.html" rel="nofollow">Shalizi</a> says "Bayesian agents never have the kind of uncertainty that Rebonato (sensibly) thinks people in finance should have". My guess is that this means (something that could be described as) uncertainty as to how well-calibrated one is, which AFAIK hasn't been explicitly covered here.</p></div>
Cyan on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zfu
2009-07-27T07:24:53.587052+10:00
<div class="md"><p>Yup. Shalizi's point is that once you've taken meta-uncertainty into account (by <a href="http://en.wikipedia.org/wiki/Marginal_distribution" rel="nofollow">marginalizing</a> over it), you have a precise and specific probability distribution over outcomes.</p></div>
Eliezer_Yudkowsky on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zfz
2009-07-27T07:36:14.873877+10:00
<div class="md"><p>Well, yes. You have to bet at some odds. You're in some particular state of uncertainty and not a different one. I suppose the game is to make people think that being in some particular state of uncertainty corresponds to claiming to know too much about the problem? The <em>ignorance</em> is shown in the <em>instability</em> of the estimate - the way it reacts strongly to new evidence.</p></div>
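<p>A quick arithmetic sketch of that last point, assuming Beta priors on a coin's bias (my choice of example, not the commenter's). A Beta(1, 1) prior and a Beta(100, 100) prior both give probability 1/2 for heads, but they respond very differently to the same three observed heads.</p>

```python
from fractions import Fraction

def prob_heads(a, b, heads=0, tails=0):
    """Predictive P(heads) under a Beta(a, b) prior after the given observations."""
    return Fraction(a + heads, a + b + heads + tails)

# Same estimate before any data...
print(prob_heads(1, 1), prob_heads(100, 100))                    # 1/2 1/2
# ...but the ignorant prior swings hard while the informed one barely moves.
print(prob_heads(1, 1, heads=3), prob_heads(100, 100, heads=3))  # 4/5 103/203
```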
Cyan on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zgi
2009-07-27T08:35:19.577773+10:00
<div class="md"><p>I'm with you on this one. What Shalizi is criticizing is essentially a consequence of the desideratum that a single real number shall represent the plausibility of an event. I don't think the methods he's advocating dispense with the desideratum, so I view this as a delicious <a href="http://scottaaronson.com/blog/?p=326" rel="nofollow">bullet</a>-shaped candy that he's convinced is a real bullet and is attempting to dodge.</p></div>
marks on Bayesian Flame
http://lesswrong.com/lw/147/bayesian_flame/zm6
2009-07-28T17:06:50.406565+10:00
<div class="md"><p>I think what Shalizi means is that a Bayesian model is never "wrong", in the sense that it is a true description of the current state of the ideal Bayesian agent's knowledge. I.e., if A says an event X has probability p, and B says X has probability q, then they aren't lying even if p != q. And the ideal Bayesian agent updates that knowledge perfectly by Bayes' rule (where knowledge is defined as probability distributions of states of the world). In this case, if A and B talk with each other then they should probably update, of course.</p>
<p>In frequentist statistics the paradigm is that one searches for the 'true' model by looking through a space of 'false' models. In this case if A says X has probability p and B says X has probability q != p then at least one of them is wrong.</p></div>