<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<title>
Articles Tagged ‘probability’ - Less Wrong
</title> <link>http://lesswrong.com/</link>
<description></description>
<item>
<title>Anticipating critical transitions</title>
<link>http://lesswrong.com/lw/hoc/anticipating_critical_transitions/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/hoc/anticipating_critical_transitions/</guid>
<pubDate>Sun, 09 Jun 2013 16:28:51 +0000</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/PhilGoetz"&gt;PhilGoetz&lt;/a&gt;
&amp;bull;
16 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/hoc/anticipating_critical_transitions/#comments"&gt;51 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;(Mathematicians may find this post painfully obvious.)&lt;/p&gt;
&lt;p&gt;I read an interesting &lt;a href=&quot;http://www.thebigquestions.com/2010/12/21/are-you-smarter-than-google/&quot;&gt;puzzle&lt;/a&gt;&amp;#xA0;on Stephen Landsburg's blog that generated a lot of disagreement. Stephen offered to bet anyone $15,000 that the average results of a computer simulation, run 1 million times, would be close to his solution's prediction of the expected value.&lt;/p&gt;
&lt;p&gt;Landsburg's solution is in fact correct. But the problem involves a probabilistic infinite series, a kind often used on Less Wrong in contexts where one is offered some utility every time a coin comes up heads, but loses everything if it ever comes up tails. Landsburg didn't justify the claim that a simulation could indicate the true expected outcome of this particular problem. Can we find similar-looking problems for which simulations give the wrong answer? &amp;#xA0;Yes.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;more&quot;&gt;&lt;/a&gt;Here's Perl code to estimate by simulation the expected value of the series of terms 2^k / k from k = 1 to infinity, with a 50% chance of stopping after each term.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;use strict;
use warnings;

my $runs = 1000000;    # number of simulated trials
my $bigsum = 0;
for (my $trial = 0; $trial &amp;lt; $runs; $trial++) {
    my $sum = 0;
    my $top = 2;       # numerator 2^k, starting at k = 1
    my $denom = 1;     # denominator k
    do {
        $sum += $top / $denom;
        $top *= 2;
        $denom += 1;
    } while (rand(1) &amp;lt; .5);    # add another term with probability 1/2
    $bigsum += $sum;
}
my $ave = $bigsum / $runs;
print &quot;ave sum=$ave\n&quot;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running it 5 times, we get the answers&lt;/p&gt;
&lt;p&gt;ave sum=7.6035709716983&lt;/p&gt;
&lt;p&gt;ave sum=8.47543819631431&lt;/p&gt;
&lt;p&gt;ave sum=7.2618950097739&lt;/p&gt;
&lt;p&gt;ave sum=8.26159741956747&lt;/p&gt;
&lt;p&gt;ave sum=7.75774577340324&lt;/p&gt;
&lt;p&gt;&amp;#xA0;&lt;/p&gt;
&lt;p&gt;So the expected value is somewhere around 8?&lt;/p&gt;
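&lt;p&gt;No; the expected value is infinite. Term k, 2^k / k, is reached with probability (1/2)^(k-1), so it contributes 2/k to the expectation, and the total is twice the sum of the harmonic series, which diverges. Later terms in the series are exponentially larger, but exponentially less likely to appear.&lt;/p&gt;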
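&lt;p&gt;The divergence can be checked with exact arithmetic. The Python sketch below is my own illustration (the function name &lt;code&gt;truncated_expectation&lt;/code&gt; is hypothetical): it adds up each term multiplied by the probability that a run reaches it. Every product reduces to 2/k, so the expectation through K terms is twice the K-th harmonic number, which grows roughly like 2 ln K and never levels off.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction


def truncated_expectation(max_k):
    """Exact expected value of the simulated sum, counting terms k = 1..max_k only.

    Term k is 2**k / k, and a run reaches it only if the k - 1 coin flips
    before it all said "continue", which has probability (1/2)**(k - 1).
    """
    total = Fraction(0)
    for k in range(1, max_k + 1):
        term = Fraction(2**k, k)
        prob_reached = Fraction(1, 2**(k - 1))
        total += term * prob_reached  # reduces to exactly 2/k
    return total


for max_k in (10, 20, 100, 1000):
    # Keeps growing, roughly like 2 * ln(max_k); there is no finite limit.
    print(max_k, float(truncated_expectation(max_k)))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This also hints at why the simulated averages hover near 8: with a million trials, a run longer than about 20 terms (probability about 2^-20, roughly one in a million) is rare, and twice the 20th harmonic number is about 7.2. The terms a simulation almost never reaches are where the missing expectation lives.&lt;/p&gt;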
&lt;p&gt;Some of you are saying, &quot;Of course the expected value of a divergent series can't be computed by simulation! Give me back my minute!&quot; But many things we might simulate with computers, like the weather, the economy, or existential risk, are full of power-law distributions that might not have a convergent expected value. People have observed before that this can cause problems for simulations (see &lt;em&gt;&lt;a href=&quot;http://amzn.to/111n0QV&quot;&gt;The Black Swan&lt;/a&gt;&lt;/em&gt;). What I find interesting is that the output of the program above doesn't look as though anything inside it diverges. It looks almost normal. So you could run your simulation many times and believe that you had a grip on its expected outcome, yet be completely mistaken.&lt;/p&gt;
&lt;p&gt;In real-life simulations (that sounds wrong, doesn't it?), there's often some system property that drifts slowly, and some critical value of that system property above which some distribution within the simulation diverges. Moving above that critical value doesn't suddenly change the output of the simulation in a way that gives an obvious warning. But the expected value of keeping that property below that critical value in the real-life system being simulated can be very high (or even infinite), with very little cost.&lt;/p&gt;
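&lt;p&gt;A sketch of such a threshold in the toy problem above, under an assumption of my own: replace the Perl program's fixed 50% chance of adding another term with a continuation probability p. Term k is then reached with probability p^(k-1), and the standard identity sum of x^k/k = -ln(1-x) gives the closed form -ln(1-2p)/p, which is finite for p below 0.5 and infinite from 0.5 upward. The function names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def expected_sum(p):
    """Expected value of the sum when each further term is added with probability p.

    Term k (value 2**k / k) is reached with probability p**(k - 1), so
    E = sum_k (2p)**k / (k * p) = -ln(1 - 2p) / p while 2p is below 1.
    """
    if p >= 0.5:
        return math.inf  # the series diverges at and above the critical value
    if p == 0:
        return 2.0  # only the first term, 2**1 / 1, is ever added
    return -math.log(1 - 2 * p) / p


def partial_sum(p, max_k):
    """Direct partial sum of the same series, as a cross-check."""
    return sum(2**k / k * p ** (k - 1) for k in range(1, max_k + 1))


for p in (0.1, 0.3, 0.4, 0.45, 0.49, 0.499, 0.5):
    print(p, expected_sum(p))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Individual runs change smoothly as p rises through 0.5; only the expectation blows up, growing like -ln(1-2p) as p approaches the critical value and becoming infinite once it gets there.&lt;/p&gt;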
&lt;p&gt;Is there a way to look at a simulation's outputs, and guess whether a particular property is near some such critical threshold? &amp;#xA0;Better yet, is there a way to guess whether there exists some property in the system nearing some such threshold, even if you don't know what it is?&lt;/p&gt;
&lt;p&gt;The October 19, 2012 issue of Science contains an article on just that question: &quot;Anticipating critical transitions&quot;, Marten Scheffer et al., p. 344. It reviews 28 papers on systems and simulations, and lists about a dozen mathematical approaches used to estimate nearness to a critical point. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Critical slowing down: When the system is near a critical threshold, it recovers slowly from small perturbations. One measure of this is autocorrelation at lag 1, meaning the correlation between the system's output at times T and T-1. Counterintuitively, higher lag-1 autocorrelation, taken by itself, suggests that the system has become more predictable, but it may actually indicate that the system is becoming less predictable. The more predictable system reverts to its mean; the unpredictable system has no mean.&lt;/li&gt;
&lt;li&gt;Flicker: Instead of having a single stable state that the system reverts to after perturbation, an additional stable state appears, and the system flickers back and forth between the two states.&lt;/li&gt;
&lt;li&gt;Dominant eigenvalue: I haven't read the cited paper that explains what this review means by it, but I do know that you can predict when a helicopter engine is going to malfunction by putting many sensors on it,&amp;#xA0;running PCA on time-series data from those sensors to get a matrix that projects their output into just a few dimensions,&amp;#xA0;then reading their output continuously and predicting failure whenever the PCA-projected output vector moves a lot. That is probably what they mean.&lt;/li&gt;
&lt;/ul&gt;
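&lt;p&gt;The first indicator is easy to try on synthetic data. This Python sketch assumes a first-order autoregressive process, x[t+1] = a·x[t] + noise, as a stand-in for a system that is repeatedly perturbed and recovers; the model and function names are my own illustration, not taken from Scheffer et al. The factor a plays the role of the slowly drifting property: as a approaches 1, perturbations die out more slowly, and the measured lag-1 autocorrelation climbs toward 1.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import random


def simulate_ar1(a, n, seed=0):
    """Toy recovery model: x[t+1] = a * x[t] + Gaussian noise.

    A deviation shrinks by a factor a each step, so a near 1 means
    slow recovery from perturbations (critical slowing down).
    """
    rng = random.Random(seed)
    x = 0.0
    series = []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Correlation between the series at times T and T-1."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    num = sum(d0 * d1 for d0, d1 in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


for a in (0.2, 0.5, 0.8, 0.95):
    print(a, round(lag1_autocorrelation(simulate_ar1(a, 20000)), 3))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In practice one would typically compute this statistic over a sliding window of a long time series and look for an upward trend, rather than trust a single value.&lt;/p&gt;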
&lt;p&gt;So if you're modeling global warming, running your simulation a dozen times and averaging the results may be misleading. [1] Global temperature has sudden [2] dramatic transitions, and an exceptionally large and sudden one (15°C in one million years) neatly spans the Earth's greatest extinction event so far, at the Permian-Triassic boundary [3]. It's more important to figure out what the critical parameter is and where its critical point is than to try to estimate how many years it will be before Manhattan is underwater. The &quot;expected rise in water level per year&quot; may not be easily answerable by simulation [4].&lt;/p&gt;
&lt;p&gt;And if you're thinking about betting Stephen Landsburg $15,000 on the outcome of a simulation, make sure his series converges first. [5]&lt;/p&gt;
&lt;p&gt;&amp;#xA0;&lt;/p&gt;
&lt;p&gt;[1] Not that I'm particularly worried about global warming.&lt;/p&gt;
&lt;p&gt;[2] Geologically sudden.&lt;/p&gt;
&lt;p&gt;[3] Sun et al., &quot;Lethally hot temperatures during the early Triassic greenhouse&quot;, Science 338 (Oct. 19, 2012), p. 366; see p. 368.&amp;#xA0;Having just pointed out that an increase of 0.000015°C/yr counts as a &quot;sudden&quot; global warming event, I feel obligated to also point out that the current increase is about 0.02°C/yr.&lt;/p&gt;
&lt;p&gt;[4] It will be answerable by simulation, since the rise in water level can't be infinite. But you may need many more simulations than you think.&lt;/p&gt;
&lt;p&gt;[5] Better yet, don't bet against Stephen Landsburg.&lt;/p&gt;&lt;/div&gt;
</description>
</item>
<item>
<title>Reflection in Probabilistic Logic</title>
<link>http://lesswrong.com/lw/h1k/reflection_in_probabilistic_logic/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/h1k/reflection_in_probabilistic_logic/</guid>
<pubDate>Mon, 25 Mar 2013 03:37:36 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/Eliezer_Yudkowsky"&gt;Eliezer_Yudkowsky&lt;/a&gt;
&amp;bull;
61 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/h1k/reflection_in_probabilistic_logic/#comments"&gt;166 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;Paul Christiano has devised&amp;#xA0;&lt;a href=&quot;http://intelligence.org/wp-content/uploads/2013/03/Christiano-et-al-Naturalistic-reflection-early-draft.pdf&quot;&gt;&lt;strong&gt;a new fundamental approach&lt;/strong&gt;&lt;/a&gt;&amp;#xA0;to the &quot;&lt;a href=&quot;https://www.youtube.com/watch?v=MwriJqBZyoM&quot;&gt;L&amp;#xF6;b Problem&lt;/a&gt;&quot; wherein &lt;a href=&quot;/lw/t6/the_cartoon_guide_to_l%C3%B6bs_theorem/&quot;&gt;L&amp;#xF6;b's Theorem&lt;/a&gt; seems to pose an obstacle to AIs building successor AIs, or adopting successor versions of their own code, that trust the same amount of mathematics as the original. &amp;#xA0;(I am currently writing up a more thorough description of the &lt;em&gt;question &lt;/em&gt;this preliminary technical report is working on answering. &amp;#xA0;For now, the main online description is in a&amp;#xA0;&lt;a href=&quot;https://www.youtube.com/watch?v=MwriJqBZyoM&quot;&gt;quick Summit talk&lt;/a&gt;&amp;#xA0;I gave. &amp;#xA0;See also Benja Fallenstein's description of the problem in the course of presenting a&amp;#xA0;&lt;a href=&quot;/lw/e4e/an_angle_of_attack_on_open_problem_1/&quot;&gt;different angle of attack&lt;/a&gt;. &amp;#xA0;Roughly, the problem is that a mathematical system can only prove the soundness of (that is, 'trust') weaker mathematical systems. &amp;#xA0;If you try to write out an exact description of how AIs would build their successors or successor versions of their code in the most obvious way, it looks like the mathematical strength of the proof system would tend to be stepped down each time, which is undesirable.)&lt;/p&gt;
&lt;p&gt;Paul Christiano's approach is inspired by the idea that whereof one cannot prove or disprove, thereof one must assign probabilities: and that although no mathematical system can contain its own&amp;#xA0;&lt;em&gt;truth&lt;/em&gt;&amp;#xA0;predicate, a mathematical system might be able to contain a reflectively consistent&amp;#xA0;&lt;em&gt;probability&lt;/em&gt;&amp;#xA0;predicate. &amp;#xA0;In particular, it looks like we can have:&lt;/p&gt;
&lt;p&gt;&amp;#x2200;a, b:&lt;span style=&quot;white-space: pre;&quot;&gt; &lt;/span&gt;(a &amp;lt; P(&amp;#x3C6;) &amp;lt; b) &amp;#xA0; &amp;#xA0; &amp;#xA0; &amp;#xA0; &amp;#xA0;&amp;#x21D2; &amp;#xA0;P(a &amp;lt; P('&amp;#x3C6;') &amp;lt; b) = 1&lt;br&gt;&amp;#x2200;a, b:&lt;span style=&quot;white-space: pre;&quot;&gt; &lt;/span&gt;P(a &amp;#x2264; P('&amp;#x3C6;')&amp;#xA0;&amp;#x2264;&amp;#xA0;b) &amp;gt; 0 &amp;#xA0;&amp;#x21D2; &amp;#xA0;a&amp;#xA0;&amp;#x2264;&amp;#xA0;P(&amp;#x3C6;)&amp;#xA0;&amp;#x2264;&amp;#xA0;b&lt;/p&gt;
&lt;p&gt;Suppose I present you with the human and probabilistic version of a G&amp;#xF6;del sentence, the&amp;#xA0;&lt;a href=&quot;http://books.google.com/books?id=cmX8yyBfP74C&amp;amp;pg=PA317&amp;amp;lpg=PA317&amp;amp;dq=whitely+lucas+cannot+consistently&amp;amp;source=bl&amp;amp;ots=68tuximFfI&amp;amp;sig=GdZro1wy6g_KzO-PXInGTKFrU7Q&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=7-FMUb61LojRiAK9hIGQDw&amp;amp;ved=0CGoQ6AEwBg#v=onepage&amp;amp;q=whitely%20lucas%20cannot%20consistently&amp;amp;f=false&quot;&gt;Whiteley sentence&lt;/a&gt;&amp;#xA0;&quot;You assign this statement a probability less than 30%.&quot; &amp;#xA0;If you disbelieve this statement, it is true. &amp;#xA0;If you believe it, it is false. &amp;#xA0;If you assign 30% probability to it, it is false. &amp;#xA0;If you assign 29% probability to it, it is true.&lt;/p&gt;
&lt;p&gt;Paul's approach resolves this problem by restricting your belief about your own probability assignment to within epsilon of 30% for any epsilon. &amp;#xA0;So Paul's approach replies, &quot;Well, I assign&amp;#xA0;&lt;em&gt;almost&lt;/em&gt;&amp;#xA0;exactly 30% probability to that statement - maybe a little more, maybe a little less - in fact I think there's about a 30% chance that I'm a tiny bit under 0.3 probability and a 70% chance that I'm a tiny bit over 0.3 probability.&quot; &amp;#xA0;A standard fixed-point theorem then implies that a consistent assignment like this should exist. &amp;#xA0;If asked if the probability is over 0.2999 or under 0.30001 you will reply with a definite yes.&lt;a id=&quot;more&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We haven't yet worked out a walkthrough showing if/how this solves the L&amp;#xF6;b obstacle to self-modification, and the probabilistic theory itself is nonconstructive (we've shown that something like this should exist, but not how to compute it). &amp;#xA0;Even so, a possible fundamental triumph over Tarski's theorem on the undefinability of truth and a number of standard G&amp;#xF6;delian limitations is important news as math&amp;#xA0;&lt;em&gt;qua&lt;/em&gt;&amp;#xA0;math, though work here is still in very preliminary stages. &amp;#xA0;There are even whispers of unrestricted comprehension in a probabilistic version of set theory with&amp;#xA0;&amp;#x2200;&amp;#x3C6;: &amp;#x2203;S: P(x &amp;#x2208; S) = P(&amp;#x3C6;(x)), though this part is not in the preliminary report and is at even earlier stages and could easily not work out at all.&lt;/p&gt;
&lt;p&gt;It seems important to remark on how this result was developed: &amp;#xA0;Paul Christiano showed up with the idea (of consistent probabilistic reflection via a fixed-point theorem) to a week-long &quot;math squad&quot; (aka MIRI Workshop) with Marcello Herreshoff, Mihaly Barasz, and myself; then we all spent the next week proving that version after version of Paul's idea couldn't work or wouldn't yield self-modifying AI; until finally, a day after the workshop was supposed to end, it produced something that looked like it might work. &amp;#xA0;If we hadn't been trying to &lt;em&gt;solve &lt;/em&gt;this problem (with hope stemming from how it seemed like the sort of thing a reflective rational agent ought&amp;#xA0;to be able to do somehow), this would be just another batch of impossibility results in the math literature. &amp;#xA0;I remark on this because it may help demonstrate that Friendly AI is a productive approach to math&amp;#xA0;&lt;em&gt;qua&amp;#xA0;&lt;/em&gt;math, which may aid some mathematician in becoming interested.&lt;/p&gt;
&lt;p&gt;I further note that this does not mean the L&amp;#xF6;bian obstacle is resolved and no further work is required. &amp;#xA0;Before we can conclude that, we need a computably specified version of the theory plus a walkthrough for a self-modifying agent using it.&lt;/p&gt;
&lt;p&gt;See also the&amp;#xA0;&lt;a href=&quot;http://intelligence.org/2013/03/22/early-draft-of-naturalistic-reflection-paper/&quot;&gt;blog post&lt;/a&gt;&amp;#xA0;on the MIRI site (and subscribe to MIRI's newsletter&amp;#xA0;&lt;a href=&quot;http://intelligence.org/&quot;&gt;here&lt;/a&gt;&amp;#xA0;to keep abreast of research updates).&lt;/p&gt;
&lt;p&gt;This LW post is the preferred place for feedback on the &lt;a href=&quot;http://intelligence.org/wp-content/uploads/2013/03/Christiano-et-al-Naturalistic-reflection-early-draft.pdf&quot;&gt;paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;EDIT: &amp;#xA0;But see discussion on a Google+ post by John Baez &lt;a href=&quot;https://plus.google.com/117663015413546257905/posts/jJModdTJ2R3?hl=en&quot;&gt;here&lt;/a&gt;. &amp;#xA0;Also see&amp;#xA0;&lt;a href=&quot;http://wiki.lesswrong.com/wiki/Comment_formatting#Using_LaTeX_to_render_mathematics&quot;&gt;here&lt;/a&gt;&amp;#xA0;for how to display math LaTeX in comments.&lt;/p&gt;&lt;/div&gt;
</description>
</item>
<item>
<title>Case Study: the Death Note Script and Bayes</title>
<link>http://lesswrong.com/lw/f63/case_study_the_death_note_script_and_bayes/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/f63/case_study_the_death_note_script_and_bayes/</guid>
<pubDate>Fri, 04 Jan 2013 15:33:37 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/gwern"&gt;gwern&lt;/a&gt;
&amp;bull;
24 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/f63/case_study_the_death_note_script_and_bayes/#comments"&gt;43 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;&lt;a href=&quot;http://www.gwern.net/Death%20Note%20script&quot;&gt;&quot;Who wrote the &lt;em&gt;Death Note &lt;/em&gt;script?&quot;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I give a history of the 2009 leaked script, discuss internal &amp;amp; external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a id=&quot;more&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you're already familiar with this particular leaked 2009 live-action script, please write down your current best guess as to how likely it is to be authentic.&lt;/p&gt;
&lt;p&gt;This is intended to be easy to understand and essentially beginner-level for Bayes's theorem and Fermi estimates, like my other &lt;a href=&quot;/lw/5ld/death_note_anonymity_and_information_theory/&quot;&gt;&lt;em&gt;Death Note&lt;/em&gt; essay&lt;/a&gt; (information theory, crypto) or my &lt;a href=&quot;/lw/4or/case_study_console_insurance/&quot;&gt;console insurance&lt;/a&gt; page (efficient markets, positive psychology, expected value).&lt;/p&gt;
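&lt;p&gt;For readers new to the method, the core of a step-by-step Bayesian analysis can be sketched in the odds form of Bayes's theorem: start from prior odds and multiply by one likelihood ratio per piece of evidence. This minimal Python sketch assumes the pieces of evidence are independent given each hypothesis; the function name and every number in it are hypothetical, not figures from the linked essay.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def posterior_probability(prior, likelihood_ratios):
    """Odds form of Bayes's theorem.

    prior: starting probability that the hypothesis (e.g. "authentic") is true.
    likelihood_ratios: for each piece of evidence,
        P(evidence | hypothesis) / P(evidence | not hypothesis).
    Assumes the pieces of evidence are independent given each hypothesis.
    """
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # each piece of evidence rescales the odds
    return odds / (1 + odds)


# Hypothetical inputs, NOT the essay's numbers: a skeptical 10% prior and
# three observations that each favor authenticity 3 to 1.
print(posterior_probability(0.10, [3, 3, 3]))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these made-up inputs, prior odds of 1:9 multiplied by 3 × 3 × 3 become 3:1, a posterior of about 0.75.&lt;/p&gt;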
&lt;p&gt;Be sure to check out the controversial twist ending!&lt;/p&gt;
&lt;p&gt;(I'm sorry to post just a link, but I briefly thought about writing it and all the math in the LW edit box and decided that cutting my wrists sounded both quicker and more enjoyable. Unfortunately, there seems to be a math-rendering problem in the Google Chrome/Chromium browser where fractions simply don't render, &lt;a href=&quot;https://code.google.com/p/chromium/issues/detail?id=6606&quot;&gt;apparently&lt;/a&gt; &lt;a href=&quot;https://code.google.com/p/chromium/issues/detail?id=152430&quot;&gt;due&lt;/a&gt; to not enabling WebKit's MathML code; if fractions don't render for you, well, I know the math works well in my Iceweasel and it seems to work well in other Firefoxes.)&lt;/p&gt;&lt;/div&gt;
</description>
</item>
<item>
<title>Solving the two envelopes problem</title>
<link>http://lesswrong.com/lw/dy9/solving_the_two_envelopes_problem/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/dy9/solving_the_two_envelopes_problem/</guid>
<pubDate>Thu, 09 Aug 2012 23:42:19 +1000</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/rstarkov"&gt;rstarkov&lt;/a&gt;
&amp;bull;
27 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/dy9/solving_the_two_envelopes_problem/#comments"&gt;31 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;Suppose you are presented with a game. You are given a red and a blue envelope with some money in each. You are allowed to ask an independent party to open both envelopes, and tell you the ratio of blue:red amounts (but not the actual amounts). If you do, the game master replaces the envelopes, and the amounts inside are chosen by him using the same algorithm as before.&lt;/p&gt;
&lt;p&gt;You ask the independent observer to check the amounts a million times, and find that half the time the ratio is 2 (blue has twice as much as red), and half the time it's 0.5 (red has twice as much as blue). At this point, the game master discloses that in fact, the way he chooses the amounts mathematically guarantees that these probabilities hold.&lt;/p&gt;
&lt;p&gt;Which envelope should you pick to maximize your expected wealth?&lt;/p&gt;
&lt;p&gt;It may seem surprising, but with this set-up, the game master can choose to make either red or blue have a higher expected amount of money in it, or make the two the same. Asking the independent party as described above will not help you establish which is which. This is the surprising part and is, in my opinion, the crux of the two envelopes problem.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;more&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is not quite how the &lt;a href=&quot;http://en.wikipedia.org/wiki/Two_envelopes_problem&quot;&gt;two envelopes problem&lt;/a&gt; is usually presented, but this is the presentation I arrived at after contemplating the original puzzle. The original puzzle prescribes a specific strategy that the game master follows, makes the envelopes indistinguishable, and provides a paradoxical argument which is obviously false, but it's not so obvious where it goes wrong.&lt;/p&gt;
&lt;p&gt;For simplicity, let's assume that money is a real quantity and can be subdivided indefinitely. This avoids the problem of odd amounts like $1.03 not being exactly divisible by two.&lt;/p&gt;
&lt;h3&gt;The flawed argument&lt;/h3&gt;
&lt;p&gt;The flawed argument goes as follows. Let's call the amount in the blue envelope B, and in red R. You have confirmed that half the time, B is equal to 2R, and half the time it's R/2. This is a fact. Surely then the expected value of B is (2R * 50% + R/2 * 50%), which simplifies to 1.25R. In other words, the blue envelope has a higher expected amount of money given the evidence we have.&lt;/p&gt;
&lt;p&gt;But notice that the situation is completely symmetric. From the information you have, it's also obvious that half the time, R is 2B, and half the time it's B/2. So by the same argument, the expected value of R is 1.25B. Uh-oh. Is each envelope's expected value higher than the other's?&lt;/p&gt;
&lt;h3&gt;Game master strategies&lt;/h3&gt;
&lt;p&gt;Let's muddy up the water a little by considering the strategies the game master can use to pick the amounts for each envelope.&lt;/p&gt;
&lt;h5&gt;Strategy 1&lt;/h5&gt;
&lt;p&gt;Pick an amount X between $1 and $1000 randomly. Throw a fair die. If you get an odd number, put X into the red envelope and 2X into blue. Otherwise put X into blue and 2X into red.&lt;/p&gt;
&lt;h5&gt;Strategy 2&lt;/h5&gt;
&lt;p&gt;Pick an amount X between $1 and $1000 randomly. Put this into the red envelope. Throw a fair die. If you get an odd number, put 2X into blue, and if it's even, put X/2 into blue.&lt;/p&gt;
&lt;p&gt;The difference between these strategies is fairly subtle. I hope it's sufficiently obvious that the &lt;em&gt;&quot;ratio condition&quot;&lt;/em&gt; (B = 2R half the time and R = 2B the other half) is true for both strategies. However, suppose we have two people take part in this game, one always picking the red envelope and the other always picking the blue envelope. After a million repetitions of this game, with the first strategy, the two guys will have won almost exactly the same amounts in total. After a million repetitions with the second strategy, the total amount won by the blue guy will be &lt;em&gt;25% higher&lt;/em&gt; than the total amount won by the red guy!&lt;/p&gt;
&lt;p&gt;Now observe that strategy 2 can be trivially inverted to favour the red envelope instead of the blue one. The player can ask an independent observer for ratios (as described in the introduction) all he wants, but this information will not allow him to distinguish between these three scenarios (strategy 1, strategy 2 and strategy 2 inverted). It's obviously impossible to figure out from this information which envelope has the higher expected winnings!&lt;/p&gt;
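&lt;p&gt;A quick Monte Carlo check of these claims. This Python sketch is my own illustration: the function &lt;code&gt;play&lt;/code&gt; is hypothetical, and it assumes X is drawn uniformly between $1 and $1000, which the strategies leave unspecified. Under both strategies, blue holds twice the red amount in about half the rounds, and the averages of red/blue and of blue/red both come out near 1.25; only the ratio of total winnings differs, near 1.0 for strategy 1 and near 1.25 for strategy 2.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import random


def play(strategy, trials, seed=0):
    """Simulate the game master's strategy 1 or 2 for many rounds.

    Assumes X is uniform on [1, 1000] (the post only says "randomly").
    Returns (fraction of rounds where blue = 2 * red,
             average of red/blue, average of blue/red,
             total in blue / total in red).
    """
    rng = random.Random(seed)
    red_total = blue_total = 0.0
    red_over_blue = blue_over_red = 0.0
    doubles = 0
    for _ in range(trials):
        x = rng.uniform(1, 1000)
        odd = rng.randint(1, 6) % 2 == 1  # throw a fair die
        if strategy == 1:
            red, blue = (x, 2 * x) if odd else (2 * x, x)
        else:  # strategy 2: red always gets X
            red, blue = x, (2 * x if odd else x / 2)
        red_total += red
        blue_total += blue
        red_over_blue += red / blue
        blue_over_red += blue / red
        if blue == 2 * red:
            doubles += 1
    return (doubles / trials, red_over_blue / trials,
            blue_over_red / trials, blue_total / red_total)


for strategy in (1, 2):
    print(strategy, play(strategy, 100000))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Swapping the roles of red and blue in strategy 2 gives the inverted strategy, for which the ratio of totals would sit near 0.8 instead, while the observed ratios look statistically the same.&lt;/p&gt;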
&lt;h3&gt;What's going on here?&lt;/h3&gt;
&lt;p&gt;I hope I've convinced you by now that the information about the likelihood of the ratios does not tell you which envelope is better. But what &lt;em&gt;exactly&lt;/em&gt; is the flaw in the original argument?&lt;/p&gt;
&lt;p&gt;Let's formalize the puzzle a bit. We have two random variables, R and B. We are permitted to ask someone to sample each one and compute the ratio of the samples, r/b, and disclose it to us. Let's define a random variable called RB whose samples are produced by sampling R and B and computing their ratio. We know that RB can take two values, 2 and 0.5, with equal probability. Let's also define BR, which is the opposite ratio: that of a sample of B to a sample of R. BR can also take two values, 2 and 0.5, with equal probability.&lt;/p&gt;
&lt;p&gt;The flawed argument is simply that the expected value of RB, E(RB), is 1.25, which is greater than 1, and therefore E(R) &amp;gt; E(B). The flawed argument continues that E(BR) is 1.25 too, therefore E(B) &amp;gt; E(R), leading to a contradiction. What's the flaw?&lt;/p&gt;
&lt;h3&gt;Solution&lt;/h3&gt;
&lt;p&gt;The expected value of RB, E(RB), really is 1.25. The puzzle gets that one right. E(BR) is &lt;em&gt;also&lt;/em&gt; 1.25. The flaw in the argument is simply that it assumes E(X/Y) &amp;gt; 1 implies that E(X) &amp;gt; E(Y). This implication seems to hold intuitively, but human intuition is notoriously bad at probabilities. It is easy to prove that this implication is false, by considering a simple counter-example courtesy of &lt;span class=&quot;comment-author&quot;&gt;&lt;span class=&quot;author&quot;&gt;&lt;a href=&quot;/user/VincentYu/&quot; id=&quot;author_t1_75dh&quot;&gt;VincentYu&lt;/a&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;comment-author&quot;&gt;&lt;span class=&quot;author&quot;&gt;Consider two independent random variables, X and Y. X can take values 20 and 60, while Y can take values 2 and 100, each with equal probability. To calculate the expected value of X/Y, one can enumerate all possible combinations, multiplying each by its probability. The four possible combinations of X and Y are 20/2, 20/100, 60/2 and 60/100. Each combination is 25% likely. Hence E(X/Y) = (10 + 0.2 + 30 + 0.6)/4 = 10.2. This is greater than 1, so if the implication held, E(X) should be greater than E(Y). But E(X) is (20+60)/2 = 40, while E(Y) is (2+100)/2 = 51. Hence, the implication E(X/Y) &amp;gt; 1 =&amp;gt; E(X) &amp;gt; E(Y) does not hold in general.&lt;br&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
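&lt;p&gt;The counterexample is small enough to verify exactly. This Python sketch enumerates the four equally likely combinations using the standard &lt;code&gt;fractions&lt;/code&gt; module, so no rounding is involved.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction
from itertools import product

xs = [Fraction(20), Fraction(60)]   # X takes each value with probability 1/2
ys = [Fraction(2), Fraction(100)]   # Y likewise, independent of X

# Enumerate the four equally likely (x, y) pairs.
e_ratio = sum(x / y for x, y in product(xs, ys)) / 4
e_x = sum(xs) / len(xs)
e_y = sum(ys) / len(ys)

print(float(e_ratio), e_x, e_y)  # prints: 10.2 40 51
```
&lt;/code&gt;&lt;/pre&gt;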
&lt;p&gt;So there you have it. The proposed argument relies on an implication which seems true intuitively, but turns out to be false under scrutiny. Mystery solved?... Almost.&lt;/p&gt;
&lt;h5&gt;Imprecise language's contribution to the puzzle&lt;/h5&gt;
&lt;p&gt;The argument concerning the original, indistinguishable envelopes is phrased like this: &lt;em&gt;&quot;(1) I denote by A the amount in my selected envelope. (2) The other envelope may contain either 2A or A/2, with a 50% probability each. (3) So the expected value of the money in the other envelope is 1.25A. (4) Hence, the other envelope is expected to have more dollars.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Depending on how pedantic you are, you might say that the statement made in the third sentence is strictly false, or that it is too ambiguous to be strictly false, or that at least one interpretation is true. The expected value 1.25A is &lt;em&gt;&quot;of the amount of money contained in the other envelope expressed in terms of the amount of money in this envelope&quot;&lt;/em&gt;. It is &lt;strong&gt;&lt;em&gt;not&lt;/em&gt; &lt;/strong&gt;&lt;em&gt;&quot;of the amount of money in the other envelope expressed in dollars&quot;&lt;/em&gt;. Hence the last sentence does not follow, and if the statements were made in full and with complete accuracy, the fact that it does not follow is a little bit more obvious.&lt;/p&gt;
&lt;p&gt;In closing, I would say this puzzle is hard because &quot;in terms of this envelope&quot; and &quot;in terms of dollars&quot; are typically equivalent enough in everyday life, but when it comes to expected values, this equivalence breaks down rather counter-intuitively.&lt;/p&gt;&lt;/div&gt;
</description>
</item>
<item>
<title>Fundamentals of kicking anthropic butt</title>
<link>http://lesswrong.com/lw/85i/fundamentals_of_kicking_anthropic_butt/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/85i/fundamentals_of_kicking_anthropic_butt/</guid>
<pubDate>Mon, 26 Mar 2012 17:43:16 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/Manfred"&gt;Manfred&lt;/a&gt;
&amp;bull;
18 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/85i/fundamentals_of_kicking_anthropic_butt/#comments"&gt;60 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;&lt;br&gt;&lt;img src=&quot;http://dreager1.files.wordpress.com/2011/06/3102269717_1e707314af.jpg&quot; style=&quot;float: right;&quot; height=&quot;400&quot; alt=&quot;Galactus&quot; width=&quot;299&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An anthropic problem is one where the very fact of your existence tells you something. &quot;I woke up this morning, therefore the earth did not get eaten by Galactus while I slumbered.&quot; Applying your existence to certainties like that is simple: if an event would have stopped you from existing, your existence tells you that it hasn't happened. If something would only kill you 99% of the time, though, you have to use probability instead of deductive logic.&amp;#xA0;Usually, it's pretty clear what to do. You simply apply&amp;#xA0;&lt;a href=&quot;http://en.wikipedia.org/wiki/Bayes'_theorem#Introductory_example&quot;&gt;Bayes' rule&lt;/a&gt;: the probability of the world getting eaten by Galactus last night is equal to the prior probability of Galactus-consumption, times the probability of me waking up given that the world got eaten by Galactus, divided by the probability that I wake up at all.&amp;#xA0;More exotic situations also show up under the umbrella of &quot;anthropics,&quot; such as getting duplicated or forgetting which person you are. Even if you've been duplicated, you can still assign probabilities. If there are a hundred copies of you in a hundred-room hotel and you don't know which one you are, don't bet too much that you're in room number 68.&lt;/p&gt;
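&lt;p&gt;Written out, this use of Bayes' rule needs the probability of waking up at all, which comes from the law of total probability. A minimal Python sketch with hypothetical numbers: the 1% chance of waking if Galactus came matches the &quot;kill you 99% of the time&quot; case above, while the 1-in-1000 prior and the certainty of waking otherwise are my own assumptions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def posterior(prior, p_wake_if_event, p_wake_if_no_event):
    """Bayes' rule: P(event | I woke up)."""
    # Probability of waking up at all, by the law of total probability.
    p_wake = prior * p_wake_if_event + (1 - prior) * p_wake_if_no_event
    return prior * p_wake_if_event / p_wake


# Hypothetical numbers for illustration only: a 1-in-1000 prior that
# Galactus came, a 1% chance of surviving to wake up if he did, and
# certain waking if he did not.
print(posterior(0.001, 0.01, 1.0))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Setting the chance of waking after the event to zero makes the posterior exactly zero, which recovers the deductive case: if an event would have stopped you from existing, it hasn't happened.&lt;/p&gt;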
&lt;p&gt;But this last sort of problem is harder, since it's not just a straightforward application of Bayes' rule. You have to determine the probability just from the information in the problem. Thinking in terms of information and symmetries is a useful problem-solving tool for getting probabilities in anthropic problems, which are simple enough to use it and confusing enough to need it. So first we'll cover what I mean by thinking in terms of information, and then we'll use this to solve a confusing anthropic problem.&lt;a id=&quot;more&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parable of the coin&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Eliezer has already written about what probability is in &lt;a href=&quot;/lw/oj/probability_is_in_the_mind/&quot;&gt;Probability is in the Mind&lt;/a&gt;. I will revisit it anyhow, using a similar example from E. T. Jaynes's &lt;em&gt;Probability Theory: The Logic of Science&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there's a 0.5 probability of heads and a 0.5 probability of tails. You draw the coin forth, flip it, and slap it down. What is the probability that when you take your hand away, you see heads?&lt;/p&gt;
&lt;p&gt;Well, you performed a fair coin flip, so the chance of heads is 0.5. What's the problem? Well, imagine the coin's perspective. When you say &quot;heads, 0.5,&quot; that doesn't mean the coin has half of heads up and half of tails up: the coin is already how it's going to be, sitting pressed under your hand. And it's already how it is with probability 1, not 0.5. If the coin is &lt;em&gt;already&lt;/em&gt; tails, how can you be correct when you say that it's heads with probability 0.5? If something is already determined, how can it still have the property of randomness?&lt;/p&gt;
&lt;p&gt;The key idea is that the randomness isn't in the coin, it's in your map of the coin. The coin can be tails all it dang likes, but if you don't know that, you shouldn't be expected to take it into account. The probability isn't a physical property of the coin, nor is it a property of flipping the coin - after all, your probability was still 0.5 when the truth was sitting right there under your hand. The probability is determined by the&lt;em&gt;&amp;#xA0;information&lt;/em&gt;&amp;#xA0;you have about flipping the coin.&lt;/p&gt;
&lt;p&gt;Assigning probabilities to things tells you about the map, not the territory. It's like a machine that eats information and spits out probabilities, with those probabilities uniquely determined by the information that went in.&amp;#xA0;Thinking about problems in terms of information, then, is about treating probabilities as the best possible answers for people with incomplete information. Probability isn't in the coin, so don't even bother thinking about the coin too much - think about the person and what they know.&lt;/p&gt;
&lt;p&gt;When trying to get probabilities from information, you're going to end up using&amp;#xA0;symmetry&amp;#xA0;a lot. Because information uniquely specifies probability, if you have identical information about two things, then you should assign them equal probability. For example, if someone switched the labels &quot;heads&quot; and &quot;tails&quot; in a fair coin flip, you couldn't tell that it had been done - you never had any different information about heads as opposed to tails. This symmetry means you should give heads and tails equal probability. Because heads and tails are mutually exclusive (they don't overlap) and exhaustive (there can't be anything else), the probabilities have to add to 1 (which is all the probability there is), so you give each of them probability 0.5.&lt;/p&gt;
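&lt;p&gt;The symmetry argument can be written down as a tiny function, as a sketch: given outcomes that are mutually exclusive, exhaustive, and about which we have identical information, give each the same share of a total probability of 1. The function name indifference is a hypothetical label (the rule is often called the principle of indifference). Exact fractions are used so the shares sum to exactly 1.&lt;/p&gt;

```python
from fractions import Fraction


def indifference(outcomes):
    """Equal probabilities for outcomes we have identical information about.

    Assumes the outcomes are mutually exclusive and exhaustive (so the
    probabilities sum to 1) and that nothing we know breaks the symmetry.
    """
    outcomes = list(outcomes)
    share = Fraction(1, len(outcomes))
    return {outcome: share for outcome in outcomes}


coin = indifference(["heads", "tails"])
print(coin)

# Secretly swapping the labels leaves the assignment unchanged:
# that invariance is the symmetry argument in the text.
swapped = {("tails" if k == "heads" else "heads"): v for k, v in coin.items()}
print(swapped == coin)  # True

die = indifference(["side 1", "side 2", "side 3"])  # the three-sided die
print(die["side 1"])  # 1/3
```

&lt;p&gt;The same function gives each side of the three-sided die discussed below probability 1/3. All the real work is in checking that the symmetry holds; the function itself just divides.&lt;/p&gt;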
&lt;p&gt;&lt;strong&gt;Brief note on useless information&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Real-world problems, even when they have symmetry, often start you off with a lot more information than &quot;it could be heads or tails.&quot; If we're flipping a real-world coin there's the temperature to consider, and the humidity, and the time of day, and the flipper's gender, and that sort of thing. If you're an ordinary human, you are allowed to call this stuff extraneous junk. Sometimes, this extra information could theoretically be&amp;#xA0;&lt;a href=&quot;/lw/o2/mutual_information_and_density_in_thingspace/&quot;&gt;correlated with the outcome&lt;/a&gt;&amp;#xA0;- maybe the humidity really matters somehow, or the time of day. But if you don't know &lt;em&gt;how&lt;/em&gt;&amp;#xA0;it's correlated, you have at least a&amp;#xA0;&lt;em&gt;de facto&lt;/em&gt;&amp;#xA0;symmetry. Throwing away useless information is a key step in doing anything useful.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sleeping Beauty&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So thinking with information means assigning probabilities based on what people know, rather than treating probabilities as properties of objects. To actually apply this, we'll use as our example the &lt;a href=&quot;http://wiki.lesswrong.com/wiki/Sleeping_Beauty_problem&quot;&gt;sleeping beauty problem&lt;/a&gt;:&lt;/p&gt;
&lt;dl style=&quot;margin-top: 0.2em; margin-bottom: 0.5em; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 19px;&quot;&gt;&lt;dd style=&quot;line-height: 1.5em; margin-left: 2em; margin-bottom: 0.1em;&quot;&gt;Suppose Sleeping Beauty volunteers to undergo the following experiment, which is described to her before it begins. On Sunday she is given a drug that sends her to sleep, and a coin is tossed. If the coin lands heads, Beauty is awakened and interviewed on Monday, and then the experiment ends. If the coin comes up tails, she is awakened and interviewed on Monday, given a second dose of the sleeping drug that makes her forget the events of Monday only, and awakened and interviewed again on Tuesday. The experiment then ends on Tuesday, without flipping the coin again.&lt;/dd&gt;&lt;/dl&gt;&lt;dl style=&quot;margin-top: 0.2em; margin-bottom: 0.5em; font-family: Arial, Helvetica, sans-serif; font-size: 13px; line-height: 19px;&quot;&gt;&lt;dd style=&quot;line-height: 1.5em; margin-left: 2em; margin-bottom: 0.1em;&quot;&gt;Beauty wakes up in the experiment and is asked, &quot;With what subjective probability do you believe that the coin landed tails?&quot;&lt;/dd&gt; &lt;/dl&gt;
&lt;p&gt;&lt;span style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; line-height: normal; font-size: small;&quot;&gt;If the coin lands heads, Sleeping Beauty is only asked for her guess once, while if the coin lands tails she is asked for her guess twice, but her memory is erased in between so she has the same memories each time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;When trying to answer for Sleeping Beauty, many people reason as follows: It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there's a 0.5 probability of heads and a 0.5 probability of tails. So since the probability of tails is 0.5, Beauty should say &quot;0.5,&quot; Q.E.D. &amp;#xA0;Readers may notice that this argument is all about the coin, not about what Beauty knows. This violation of good practice may help explain why it is dead wrong.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thinking with information: some warmups&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To collect the ingredients of the solution, I'm going to first go through some similar-looking problems.&lt;/p&gt;
&lt;p&gt;In the Sleeping Beauty problem, she has to choose between three options - let's call them&amp;#xA0;{H, Monday}, {T, Monday}, and {T, Tuesday}. So let's start with a very simple problem involving three options: the three-sided die. Just like for the fair coin, you know that the sides of the die are mutually exclusive and exhaustive, and you don't know anything else that&amp;#xA0;would&amp;#xA0;be correlated with one side showing up more than another. Sure, the sides have different labels, but the labels are extraneous junk as far as probability is concerned. Mutually exclusive and exhaustive means the probabilities have to add up to one, and the symmetry of your information about the sides means you should give them the same probabilities, so they each get probability 1/3.&lt;/p&gt;
&lt;p&gt;Next, what should Sleeping Beauty believe before the experiment begins? Beforehand, her information looks like this:&amp;#xA0;she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://images.lesswrong.com/t3_85i_0.png&quot; style=&quot;float: right;&quot; height=&quot;293&quot; alt=&quot;Diagram of the Sleeping Beauty problem before it starts&quot; width=&quot;215&quot;&gt;This way of stating her information is good enough most of the time, but what's going on is clearer if we're a little more formal. There are three exhaustive (but not mutually exclusive) options: {H, Monday}, {T, Monday}, and {T, Tuesday}. She knows that anything with heads is mutually exclusive with anything with tails, and that {T, Tuesday} happens if and only if&amp;#xA0;{T, Monday}&amp;#xA0;happened.&lt;/p&gt;
&lt;p&gt;One good way to think of this last piece of information is as a special &quot;AND&quot; structure containing {T, Monday} and {T, Tuesday}, like in the picture to the right. What it means is that since the things that are &quot;AND&quot; happen together, the other probabilities won't change if we merge them into a single option, which I shall call {T, Both}. Now we have two options, {H, Monday} and {T, Both}, which are both exhaustive and mutually exclusive. This looks an awful lot like the fair coin, with probabilities of 0.5.&lt;/p&gt;
&lt;p&gt;But can we leave it at that? Why shouldn't two days be worth twice as much probability as one day, for instance? Well, it turns out we &lt;em&gt;can&lt;/em&gt;&amp;#xA0;leave it at that, because we have now run out of information from the original problem. We used that there were three options, we used that they were exhaustive, we used that two of them always happened together, and we used that the remaining two were mutually exclusive. That's all, and so that's where we should leave it - any more and we'd be making up information not in the problem, which is bad.&lt;/p&gt;
&lt;p&gt;So to decompress, before the experiment begins Beauty assigns probability 0.5 to the coin landing heads and being woken up on Monday, probability 0.5 to the coin landing tails and being woken up on Monday, and probability 0.5 to the coin landing tails and being woken up on Tuesday. This adds up to 1.5, but that's okay since these things aren't all mutually exclusive.&lt;/p&gt;
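&lt;p&gt;The merge-then-decompress step can be sketched in a few lines. This is only an illustration under an assumed encoding (options as (coin, day) tuples, plus a hypothetical merge_and_groups helper). It shows how merging the &quot;AND&quot; pair into {T, Both} leaves two symmetric options at 1/2 each, and how decompressing hands 1/2 to each member of the pair.&lt;/p&gt;

```python
from fractions import Fraction

# Hypothetical encoding of Beauty's information before the experiment.
options = [("H", "Monday"), ("T", "Monday"), ("T", "Tuesday")]
# {T, Monday} happens if and only if {T, Tuesday} does: an "AND" group.
and_groups = [[("T", "Monday"), ("T", "Tuesday")]]


def merge_and_groups(options, and_groups):
    """Collapse each AND group into one merged option; other options stay single."""
    grouped = {member for group in and_groups for member in group}
    merged = [(option,) for option in options if option not in grouped]
    merged += [tuple(group) for group in and_groups]
    return merged


merged = merge_and_groups(options, and_groups)  # two options: {H, Monday} and {T, Both}
# The merged options are mutually exclusive and exhaustive, with nothing else
# to tell them apart, so they split the probability evenly.
share = Fraction(1, len(merged))

# Decompress: each member of a merged option inherits that option's probability.
probs = {member: share for group in merged for member in group}
for option in options:
    print(option, probs[option])
print("total:", sum(probs.values()))  # 3/2: allowed, because the options overlap
```

&lt;p&gt;The printed total of 3/2 is not an error: the options overlap, so their probabilities need not sum to 1.&lt;/p&gt;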
&lt;p&gt;&lt;img src=&quot;http://images.lesswrong.com/t3_85i_1.png&quot; style=&quot;float: right;&quot; height=&quot;227&quot; alt=&quot;Diagram of the two coins problem&quot; width=&quot;274&quot;&gt;Okay, now for one last warmup. Suppose you have two coins. You flip the first one, and if it lands heads, you place the second coin on the table heads up. If the first coin lands tails, though, you flip the second coin.&lt;/p&gt;
&lt;p&gt;This new problem looks sort of familiar. You have three options, {H, H}, {T, H} and {T, T}, and these options are mutually exclusive and exhaustive. So does that mean it's the same set of information as the three-sided die? Not quite. Similar to the &quot;AND&quot; previously, my drawing for this problem has an &quot;OR&quot; between {T, H} and {T, T}, representing additional information.&lt;/p&gt;
&lt;p&gt;I'd like to add a note here about my jargon. &quot;AND&quot; makes total sense. One thing happens &lt;em&gt;and&lt;/em&gt; another thing happens. &quot;OR,&quot; however, doesn't make so much sense, because things that are mutually exclusive are already &quot;or&quot; by default - one thing happens &lt;em&gt;or&lt;/em&gt; another thing happens. What it really means is that {H, H} has a symmetry with the &lt;em&gt;sum &lt;/em&gt;of {T, H} and {T, T} (that is, {T, H} &quot;OR&quot; {T, T}). The &quot;OR&quot; can also be thought of as information about {H, H} instead - it contains what could have been both the {H, H} and {H, T} events, so there's a four-way symmetry in the problem, it's just been relabeled.&lt;/p&gt;
&lt;p&gt;When we had the &quot;AND&quot; structure, we merged the two options together to get {T, Both}. For &quot;OR,&quot; we can do a slightly different operation and replace {T, H} &quot;OR&quot; {T, T} by their sum, {T, either}. Now the options become {H, H} and {T, either}, which are mutually exclusive and exhaustive; this gets us back to the fair coin, so each gets probability 0.5. Then, because {T, H} and {T, T} have a symmetry between them, you split the probability from {T, either} evenly to get&amp;#xA0;probabilities&amp;#xA0;of 0.5, 0.25, and 0.25 for {H, H}, {T, H}, and {T, T} respectively.&lt;/p&gt;
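&lt;p&gt;As a cross-check on the &quot;OR&quot; reasoning, the two-coin procedure is simple enough to enumerate exactly. The sketch below (fair_flip and two_coin_distribution are hypothetical names) walks the tree of flips with exact fractions and recovers 1/2, 1/4, and 1/4, rather than the 1/3 each that the three-sided die information would give.&lt;/p&gt;

```python
from fractions import Fraction


def fair_flip():
    """Both faces of a fair coin, each with probability 1/2."""
    return [("H", Fraction(1, 2)), ("T", Fraction(1, 2))]


def two_coin_distribution():
    """Exact outcome probabilities for the two-coin procedure in the text."""
    dist = {}
    for first, p_first in fair_flip():
        if first == "H":
            # First coin heads: the second coin is placed heads up, not flipped.
            dist[("H", "H")] = dist.get(("H", "H"), 0) + p_first
        else:
            # First coin tails: the second coin is flipped fairly.
            for second, p_second in fair_flip():
                key = (first, second)
                dist[key] = dist.get(key, 0) + p_first * p_second
    return dist


for outcome, p in two_coin_distribution().items():
    print(outcome, p)
# ('H', 'H') 1/2
# ('T', 'H') 1/4
# ('T', 'T') 1/4
```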
&lt;p&gt;&lt;strong&gt;Okay, for real now&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Okay, so now what do things look like once the experiment has started? In English, now she knows that&amp;#xA0;she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails, went to sleep, and now she's been woken up.&lt;/p&gt;
&lt;p&gt;This might not seem that different from before, but the &quot;anthropic information&quot; that Beauty is currently one of the people in the experiment changes the formal picture a lot. Before, the three options were not mutually exclusive, because she was thinking about the future. But now&amp;#xA0;{H, Monday}, {T, Monday}, and {T, Tuesday} are both exhaustive and mutually exclusive, because only one can be the case in the present. From the coin flip, she still knows that anything with heads is mutually exclusive with anything with tails. But once two things are mutually exclusive you can't make them any &lt;em&gt;more&lt;/em&gt;&amp;#xA0;mutually exclusive.&lt;/p&gt;
&lt;p&gt;But the &quot;AND&quot; information! What happens to that? Well, that was based on things always happening together, and we just got information that those things are mutually exclusive, so there's no more &quot;AND.&quot; It's possible to slip up here and reason that since there used to be some structure there, and now they're mutually exclusive, it's one or the other, therefore there must be &quot;OR&quot; information. At least the confusion in my terminology reflects an easy confusion to have, but this &quot;OR&quot; relationship isn't the same as mutual exclusivity. It's a specific piece of information that wasn't in the problem before the experiment, and wasn't part of the anthropic information (that was just mutual exclusivity). So Monday and Tuesday are &quot;or&quot; (mutually exclusive), but not &quot;OR&quot; (can be added up to use another symmetry).&lt;/p&gt;
&lt;p&gt;And so this anthropic requirement of mutual exclusivity turns out to make redundant or render null a big chunk of the previous information, which is strange. You end up left with three mutually exclusive, exhaustive options, with no particular asymmetry. This is the three-sided die information, and so each of&amp;#xA0;{H, Monday}, {T, Monday}, and {T, Tuesday} should get probability 1/3. So when asked for P(tails), Beauty should answer 2/3.&lt;/p&gt;
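&lt;p&gt;A rough sanity check, offered as an illustration rather than as part of the argument: simulate the protocol many times and count two different things. The fraction of experiments in which the coin lands tails stays near 1/2, while the fraction of awakenings that follow a tails toss approaches 2/3, the per-awakening frequency that lines up with the answer above. The function name simulate, the number of runs, and the fixed seed are arbitrary choices for this sketch.&lt;/p&gt;

```python
import random


def simulate(n_experiments, seed=0):
    """Run the Sleeping Beauty protocol repeatedly and tally two frequencies.

    Returns (fraction of experiments whose coin landed tails,
             fraction of all awakenings that came after a tails toss).
    """
    rng = random.Random(seed)
    tails_experiments = 0
    awakenings = 0
    tails_awakenings = 0
    for _ in range(n_experiments):
        coin = rng.choice("HT")
        # Heads: woken Monday only. Tails: woken Monday and Tuesday.
        days = ["Monday"] if coin == "H" else ["Monday", "Tuesday"]
        awakenings += len(days)
        if coin == "T":
            tails_experiments += 1
            tails_awakenings += len(days)
    return tails_experiments / n_experiments, tails_awakenings / awakenings


per_experiment, per_awakening = simulate(100_000)
print(f"tails per experiment: {per_experiment:.3f}")  # close to 0.5
print(f"tails per awakening:  {per_awakening:.3f}")   # close to 0.667
```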
&lt;p&gt;&lt;strong&gt;&quot;SSA&quot; and &quot;SIA&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When assigning prior probabilities in anthropic problems, there are two main &quot;easy&quot; ways to assign probabilities, and these methods go by the acronyms &quot;SSA&quot; and &quot;SIA.&quot; &quot;SSA&quot; is stated like this&lt;sup&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Self-Sampling_Assumption&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px; &quot;&gt;&lt;span style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;All other things equal, an observer should reason as if they are randomly selected from the set of all&amp;#xA0;&lt;/span&gt;&lt;em style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;actually existent&lt;/em&gt;&lt;span style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;&amp;#xA0;observers (past, present and future) in their reference class.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;&lt;span style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; line-height: normal; font-size: small; &quot;&gt;For example, if you wanted the prior probability that you lived in Sweden, you might ask &quot;what proportion of human beings have lived in Sweden?&quot;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;&lt;span style=&quot;font-family: Verdana, Arial, Helvetica, sans-serif; line-height: normal; font-size: small; &quot;&gt;On the other hand, &quot;SIA&quot; looks like this&lt;sup&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Self-Indication_Assumption&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px; &quot;&gt;&lt;span style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;All other things equal, an observer should reason as if they are randomly selected from the set of all&amp;#xA0;&lt;/span&gt;&lt;em style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;possible&lt;/em&gt;&lt;span style=&quot;font-family: sans-serif; font-size: 13px; line-height: 19px; &quot;&gt;&amp;#xA0;observers.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Now the question becomes &quot;what proportion of possible observers live in Sweden?&quot; and suddenly it seems awfully improbable that anyone could live in Sweden.&lt;/p&gt;
&lt;p&gt;The astute reader will notice that these two &quot;assumptions&quot; correspond to two different sets of starting information. If you want a quick exercise, figure out what those two sets of information are now. I'll wait for&amp;#xA0;you&amp;#xA0;in the next paragraph.&lt;/p&gt;
&lt;p&gt;Hi again. The information assumed for SSA is pretty straightforward. You are supposed to reason as if you know that you're an actually existent observer, in some &quot;reference class.&quot; So an example set of information would be &quot;I exist/existed/will exist and am a human.&quot; Compared to that, SIA seems to barely assume any information at all - all you get to start with is &quot;I am a possible observer.&quot; Because &quot;existent observers in a reference class&quot; are a subset of possible observers, you can transform SIA into SSA by adding on more information, e.g. &quot;I exist and am a human.&quot; And then if you want to represent a more complicated problem, you have to add extra information on top of that, like &quot;I live in 2012&quot; or &quot;I have two X chromosomes.&quot;&lt;/p&gt;
&lt;p&gt;Trouble only sneaks in if you start to see these acronyms as mysterious probability generators rather than sets of starting information to build on. So don't do that.&lt;/p&gt;
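&lt;p&gt;To make the two &quot;assumptions&quot; concrete, here is a toy sketch applying one common formalization of each to Sleeping Beauty, with her awakenings as the observers. The formalization is an assumption not spelled out in this post: SSA first picks a world in proportion to its prior and then an observer uniformly within that world, while SIA weights each world's prior by how many observers it contains. The world table and the function names are hypothetical.&lt;/p&gt;

```python
from fractions import Fraction

# Hypothetical toy model: each possible world has a prior and a list of
# observers, here Beauty's awakenings labelled (coin, day).
worlds = {
    "heads": {"prior": Fraction(1, 2), "observers": [("H", "Monday")]},
    "tails": {"prior": Fraction(1, 2), "observers": [("T", "Monday"), ("T", "Tuesday")]},
}


def ssa(worlds):
    """Pick a world by its prior, then an observer uniformly within that world."""
    probs = {}
    for world in worlds.values():
        for obs in world["observers"]:
            probs[obs] = world["prior"] / len(world["observers"])
    return probs


def sia(worlds):
    """Weight each world by prior times observer count, then pick an observer."""
    total = sum(w["prior"] * len(w["observers"]) for w in worlds.values())
    probs = {}
    for world in worlds.values():
        for obs in world["observers"]:
            probs[obs] = world["prior"] / total
    return probs


def p_tails(probs):
    """Total probability of the awakenings that follow a tails toss."""
    return sum(p for (coin, _day), p in probs.items() if coin == "T")


for name, rule in (("SSA", ssa), ("SIA", sia)):
    print(name, "P(tails) =", p_tails(rule(worlds)))
# SSA P(tails) = 1/2
# SIA P(tails) = 2/3
```

&lt;p&gt;In this toy model the two rules give different answers (1/2 under SSA, 2/3 under SIA) because they start from different information about which observer Beauty could be.&lt;/p&gt;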
&lt;p&gt;&lt;strong&gt;Closing remarks&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When faced with straightforward problems, you usually don't need to use this knowledge of where probability comes from. It's just rigorous and interesting, like knowing how to do integration as a &lt;a href=&quot;http://www.vias.org/calculus/img/04_integration-10.gif&quot;&gt;Riemann sum&lt;/a&gt;. But whenever you run into foundational or even particularly confusing problems, it's good to remember that probability is about making the best use you can of incomplete information. If not, you run the risk of a few silly failure modes, or even (&lt;em&gt;gasp&lt;/em&gt;) frequentism.&lt;/p&gt;
&lt;p&gt;I recently read an academic paper&lt;a href=&quot;http://arxiv.org/abs/0905.0624v1&quot;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; that used the idea that in a multiverse, there will be some universe where a thrown coin comes up heads every time, and so the people in that universe will have very strange ideas about how coins work. &lt;em&gt;Therefore&lt;/em&gt;, this actual academic paper argued, since reasoning with probability can lead people to be wrong, it cannot be applied to anything like a multiverse.&lt;/p&gt;
&lt;p&gt;My response is: what have you got that works better? In this post we worked through assigning probabilities by using all of our information. If you deviate from that, you're either throwing information away or making it up. Incomplete information lets you down sometimes, that's why it's called incomplete. But that doesn't license you to throw away information or make it up, out of some sort of dissatisfaction with reality. The truth is out there. But the probabilities are in here.&lt;/p&gt;&lt;/div&gt;
&lt;a href="http://lesswrong.com/lw/85i/fundamentals_of_kicking_anthropic_butt/#comments"&gt;60 comments&lt;/a&gt;
</description>
</item>
<item>
<title>Fallacies as weak Bayesian evidence</title>
<link>http://lesswrong.com/lw/aq2/fallacies_as_weak_bayesian_evidence/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/aq2/fallacies_as_weak_bayesian_evidence/</guid>
<pubDate>Sun, 18 Mar 2012 14:53:34 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/Kaj_Sotala"&gt;Kaj_Sotala&lt;/a&gt;
&amp;bull;
55 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/aq2/fallacies_as_weak_bayesian_evidence/#comments"&gt;41 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; &lt;em&gt;Exactly what is fallacious about a claim like &amp;#x201D;ghosts exist because no one has proved that they do not&amp;#x201D;? And why does a claim with the same logical structure, such as &amp;#x201D;this drug is safe because we have no evidence that it is not&amp;#x201D;, seem more plausible? Looking at various fallacies &amp;#x2013; the argument from ignorance, circular arguments, and the slippery slope argument - we find that they can be analyzed in Bayesian terms, and that people are generally more convinced by arguments which provide greater Bayesian evidence. Arguments which provide only weak evidence, though often evidence nonetheless, are considered fallacies.&lt;br&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As a Nefarious Scientist, Dr. Zany is often teleconferencing with other Nefarious Scientists. Negotiations about things such as &amp;#x201D;when we have taken over the world, who's the lucky bastard who gets to rule over Antarctica&amp;#x201D; will often turn tense and stressful. Dr. Zany knows that stress makes it harder to evaluate arguments logically. To make things easier, he would like to build a software tool that would monitor the conversations and automatically flag any fallacious claims as such. That way, if he's too stressed out to realize that an argument offered by one of his colleagues is actually wrong, the software will work as backup to warn him.&lt;/p&gt;
&lt;p&gt;Unfortunately, it's not easy to define what counts as a fallacy. At first, Dr. Zany tried looking at the logical form of various claims. An early example that he considered was &amp;#x201D;ghosts exist because no one has proved that they do not&amp;#x201D;, which felt clearly wrong, an instance of the argument from ignorance. But when he programmed his software to warn him about sentences like that, it ended up flagging the claim &amp;#x201D;this drug is safe, because we have no evidence that it is not&amp;#x201D;. Hmm. That claim felt somewhat weak, but it didn't feel obviously wrong the way that the ghost argument did. Yet they shared the same structure. What was the difference?&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;text-decoration: underline;&quot;&gt;&lt;strong&gt;The argument from ignorance&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Related posts: &lt;/em&gt;&lt;a href=&quot;/lw/ih/absence_of_evidence_is_evidence_of_absence/&quot;&gt;Absence of Evidence is Evidence of Absence&lt;/a&gt;, &lt;a href=&quot;/lw/27e/but_somebody_would_have_noticed/&quot;&gt;But Somebody Would Have Noticed!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One kind of argument from ignorance is based on &lt;em&gt;negative evidence. &lt;/em&gt;It assumes that if the hypothesis of interest were true, then experiments made to test it would show positive results. If a drug were toxic, tests of toxicity would reveal this. Whether or not this argument is valid depends on whether the tests &lt;em&gt;would&lt;/em&gt; indeed show positive results, and with what probability.&lt;/p&gt;
&lt;p&gt;With some thought and help from AS-01, Dr. Zany identified three intuitions about this kind of reasoning.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;1. Prior beliefs influence whether or not the argument is accepted.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;A) I've often drunk alcohol, and never gotten drunk. Therefore alcohol doesn't cause intoxication.&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;B) I've often taken Acme Flu Medicine, and never gotten any side effects. Therefore Acme Flu Medicine doesn't cause any side effects.&lt;/p&gt;
&lt;p&gt;Both of these are examples of the argument from ignorance, and both seem fallacious. But B seems much more compelling than A, since we &lt;em&gt;know&lt;/em&gt; that alcohol causes intoxication, while we also know that not all kinds of medicine have side effects.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;2. The more evidence found that is compatible with the conclusions of these arguments, the more acceptable they seem to be.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;C) Acme Flu Medicine is not toxic because no toxic effects were observed in 50 tests.&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;D) Acme Flu Medicine is not toxic because no toxic effects were observed in 1 test.&lt;/p&gt;
&lt;p&gt;C seems more compelling than D.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;3. Negative arguments are acceptable, but they are generally less acceptable than positive arguments.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;E) Acme Flu Medicine is toxic because a toxic effect was observed (positive argument)&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;F) Acme Flu Medicine is not toxic because no toxic effect was observed (negative argument, the argument from ignorance)&lt;/p&gt;
&lt;p&gt;Argument E seems more convincing than argument F, but F is somewhat convincing as well.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Aha!&quot; Dr. Zany exclaims. &quot;These three intuitions share a common origin! They bear the signatures of Bayonet reasoning!&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;&lt;a href=&quot;/lw/1to/what_is_bayesianism/&quot;&gt;Bayesian&lt;/a&gt; reasoning&quot;, AS-01 politely corrects.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Yes, Bayesian! But, hmm. Exactly how are they Bayesian?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;more&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;&lt;em&gt;Note: &lt;/em&gt;To keep this post as accessible as possible, I attempt to explain the underlying math without actually using any math. If you would rather see the math, please see the paper referenced at the end of the post.&lt;/p&gt;
&lt;p&gt;As a brief reminder, the essence of &lt;a href=&quot;http://wiki.lesswrong.com/wiki/Bayes%27_theorem&quot;&gt;Bayes' theorem&lt;/a&gt; is that we have different theories about the world, and the extent to which we believe in these theories varies. Each theory also has implications about what you expect to observe in the world (or at least it &lt;a href=&quot;http://wiki.lesswrong.com/wiki/Making_beliefs_pay_rent&quot;&gt;&lt;em&gt;should&lt;/em&gt; have such implications&lt;/a&gt;). The extent to which an observation makes us update our beliefs depends on how likely our theories say the observation should be. Dr. Zany has a strong belief that his plans will basically always succeed, and this theory says that his plans are very unlikely to fail. Therefore, when they do fail, he should revise his belief in the &quot;I will always succeed&quot; theory down. (So far he hasn't made that update, though.) If this isn't completely intuitive to you, I recommend &lt;a href=&quot;/lw/2b0/bayes_theorem_illustrated_my_way/&quot;&gt;komponisto's awesome visualization&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now let's look at each of the above intuitions in terms of Bayes' theorem.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;1. Prior beliefs influence whether or not the argument is accepted. &lt;/em&gt;This is pretty straightforward - the expression &quot;prior beliefs&quot; is even there in the description of the intuition. Suppose that we hear the argument, &quot;I've often drunk alcohol, and never gotten drunk. Therefore alcohol doesn't cause intoxication&quot;. The fact that this person has never gotten drunk from alcohol (or at least claims that he hasn't) &lt;em&gt;is&lt;/em&gt; &lt;a href=&quot;/lw/jl/what_is_evidence/&quot;&gt;evidence&lt;/a&gt; for alcohol not causing any intoxication, but we still have a very strong prior belief for alcohol causing intoxication. Updating on this evidence, we find that our beliefs in both the theory &quot;this person is mistaken or lying&quot; and the theory &quot;alcohol doesn't cause intoxication&quot; have become stronger. Due to its higher prior probability, &quot;this person is mistaken or lying&quot; seems the more plausible of the two, so we do not consider this a persuasive argument for alcohol not being intoxicating.&lt;/p&gt;
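&lt;p&gt;The alcohol example can be run through Bayes' theorem with made-up numbers. In this sketch, bayes_update is a hypothetical helper, the hypothesis &quot;alcohol causes intoxication&quot; is split into a version where the speaker is mistaken or lying and one where he is accurate, and every prior and likelihood is invented purely for illustration.&lt;/p&gt;

```python
def bayes_update(priors, likelihoods):
    """Posterior P(h | e) is proportional to P(h) * P(e | h)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Every number below is invented for illustration.
priors = {
    "not intoxicating": 0.0001,
    "intoxicating, speaker mistaken or lying": 0.05,
    "intoxicating, speaker accurate": 0.9499,
}
# How likely we are to hear "I've often drunk alcohol and never gotten drunk"
# under each hypothesis. An accurate speaker would almost never say it.
likelihoods = {
    "not intoxicating": 0.9,
    "intoxicating, speaker mistaken or lying": 0.5,
    "intoxicating, speaker accurate": 0.001,
}

posteriors = bayes_update(priors, likelihoods)
for h in priors:
    print(f"{h}: {priors[h]:.4f} before, {posteriors[h]:.4f} after")
```

&lt;p&gt;With these numbers, both &quot;not intoxicating&quot; and &quot;mistaken or lying&quot; gain probability from the claim, but &quot;mistaken or lying&quot; ends up far ahead because it started far ahead.&lt;/p&gt;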
&lt;p&gt;&lt;em&gt;2. The more evidence found that is compatible with the conclusions of these arguments, the more acceptable they seem to be.&lt;/em&gt; This too is a relatively straightforward consequence of Bayes' theorem. In terms of belief updating, we might encounter 50 pieces of evidence, one at a time, and make 50 small updates. Or we might encounter all of the 50 pieces of evidence at once, and perform one large update. The end result should be the same. More evidence leads to larger updates.&lt;/p&gt;
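&lt;p&gt;Here is a sketch of the &quot;50 small updates versus one large update&quot; point, with invented numbers for how likely a clean toxicity test is under each hypothesis. It also assumes the 50 tests are independent given whether the drug is toxic; without that assumption the combined likelihood would not simply be a product. The update_p_toxic helper is hypothetical.&lt;/p&gt;

```python
import math


def update_p_toxic(prior_toxic, p_obs_if_toxic, p_obs_if_safe):
    """One Bayesian update of P(toxic) on an observation."""
    num = prior_toxic * p_obs_if_toxic
    return num / (num + (1 - prior_toxic) * p_obs_if_safe)


# Invented numbers: a clean test is very likely for a safe drug,
# and less likely, but far from impossible, for a toxic one.
prior = 0.3
p_clean_if_toxic = 0.7
p_clean_if_safe = 0.99

# 50 clean tests, one small update at a time.
sequential = prior
for _ in range(50):
    sequential = update_p_toxic(sequential, p_clean_if_toxic, p_clean_if_safe)

# The same 50 tests as one large update (assumes conditional independence).
batch = update_p_toxic(prior, p_clean_if_toxic ** 50, p_clean_if_safe ** 50)

one_test = update_p_toxic(prior, p_clean_if_toxic, p_clean_if_safe)
print(f"after 1 clean test:   P(toxic) = {one_test:.3f}")
print(f"after 50 clean tests: P(toxic) = {sequential:.2e}")
print(math.isclose(sequential, batch))  # True
```

&lt;p&gt;One clean test lowers P(toxic) only a little, from 0.3 to about 0.23, while 50 clean tests drive it close to zero, which matches the intuition that C is more compelling than D.&lt;/p&gt;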
&lt;p&gt;&lt;em&gt;3. Negative arguments are acceptable, but they are generally less acceptable than positive arguments.&lt;/em&gt; This one needs a little explaining, and here we need the concepts of &lt;a href=&quot;https://en.wikipedia.org/wiki/Sensitivity_and_specificity&quot;&gt;sensitivity and specificity&lt;/a&gt;. A test for something (say, a disease) is &lt;em&gt;sensitive&lt;/em&gt; if it &lt;em&gt;always&lt;/em&gt; gives a positive result when the disease is present, and &lt;em&gt;specific&lt;/em&gt; if it &lt;em&gt;only&lt;/em&gt; gives a positive result when the disease is present. There's a trade-off between these two. For instance, an airport metal detector is designed to alert its operators if a person carries dangerous metal items. It is &lt;em&gt;sensitive&lt;/em&gt;, because nearly any metal item will trigger an alarm - but it is not very &lt;em&gt;specific&lt;/em&gt;, because even non-dangerous items will trigger an alarm.&lt;/p&gt;
&lt;p&gt;A test which is both extremely sensitive and extremely non-specific is not very useful, since it will give more false alarms than true ones. An easy way of creating an extremely sensitive &quot;test for disease&quot; is to simply &lt;em&gt;always&lt;/em&gt; say that the patient has the disease. This test has 100% sensitivity (it always gives a positive result, so it always gives a positive result when the disease is present, as well), but its specificity is zero, since it never gives a negative result even for healthy patients. A positive result from it is correct only as often as the disease occurs, that is, at the prevalence rate of the disease. It provides no information, and therefore isn't really a test at all.&lt;/p&gt;
&lt;p&gt;How is this related to our intuition about negative and positive arguments? In short, our environment is such that, like the airport metal detector, negative evidence (treated as a test for &quot;the drug is safe&quot;) often has high sensitivity but low specificity. We intuitively expect that a test for toxicity might not always reveal a drug to be toxic, but if it does, then the drug really &lt;em&gt;is&lt;/em&gt; toxic. A lack of a &quot;toxic&quot; result is what we would expect if the drug weren't toxic, but it's also what we would expect in a lot of cases where the drug &lt;em&gt;was&lt;/em&gt; toxic. Thus, &lt;a href=&quot;/lw/ih/absence_of_evidence_is_evidence_of_absence/&quot;&gt;negative evidence &lt;em&gt;is&lt;/em&gt; evidence&lt;/a&gt;, but it's usually much weaker than positive evidence.&lt;/p&gt;
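&lt;p&gt;The asymmetry between positive and negative evidence can be sketched with a hypothetical toxicity test that often misses toxicity (sensitivity 0.3) but rarely flags a safe drug (specificity 0.99). The numbers and the posterior_toxic helper are assumptions for illustration; sensitivity and specificity follow their standard definitions, P(positive | toxic) and P(negative | safe).&lt;/p&gt;

```python
def posterior_toxic(prior, sensitivity, specificity, positive):
    """P(toxic | test result) from Bayes' theorem.

    sensitivity = P(positive | toxic), specificity = P(negative | safe).
    """
    if positive:
        p_result_if_toxic = sensitivity
        p_result_if_safe = 1 - specificity
    else:
        p_result_if_toxic = 1 - sensitivity
        p_result_if_safe = specificity
    num = prior * p_result_if_toxic
    return num / (num + (1 - prior) * p_result_if_safe)


# Hypothetical test: often misses toxicity, rarely flags a safe drug.
prior, sens, spec = 0.2, 0.3, 0.99
print("positive result:", round(posterior_toxic(prior, sens, spec, True), 3))
print("negative result:", round(posterior_toxic(prior, sens, spec, False), 3))

# The "always positive" test: sensitivity 1, specificity 0, so no update at all.
print("always-positive test:", posterior_toxic(prior, 1.0, 0.0, True))
```

&lt;p&gt;With a prior of 0.2, a positive result raises P(toxic) to about 0.88, while a negative result lowers it only to about 0.15: evidence, but weak evidence. The always-positive test leaves the prior at 0.2, since it carries no information.&lt;/p&gt;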
&lt;p&gt;&lt;em&gt;&quot;So, umm, okay&quot;, Dr. Zany says, after AS-01 has reminded him of the way Bayes' theorem works, and helped him figure out how his intuitions about the fallacies have Bayes-structure. &quot;But let's not lose track of what we were doing, which is to say, building a fallacy-detector. How can we use this to say whether a given claim is fallacious?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;What this suggests is that we judge a claim to be a fallacy if it's only weak Bayesian evidence&quot;, AS-01 replies. &quot;A claim like 'an unreliable test of toxicity didn't reveal this drug to be toxic, so it must be safe' is such weak evidence that we consider it fallacious. Also, if we have a very strong prior belief against something, and a claim doesn't shift this prior enough, then we might call it a 'fallacy' to believe in the thing on the basis of that claim. That was the case with the 'I've had alcohol many times and never gotten drunk, so alcohol must not be intoxicating' claim.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;But that's not what I was after at all! In that case I can't program a simple fallacy-detector: I'd have to implement a full-blown artificial intelligence that could understand the conversation, analyze the prior probabilities of various claims, and judge the weight of evidence. And even if I did that, it wouldn't help me figure out what claims were fallacies, because all of my AIs only want to eradicate the color blue from the universe! Hmm. But maybe the appeal from ignorance was a special case, and other fallacies are more accommodating. How about circular claims? Those must surely be fallacious?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style=&quot;text-decoration: underline;&quot;&gt;Circularity&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;A. God exists because the Bible says so, and the Bible is the word of God.&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;B. Electrons exist because we can see 3-cm tracks in a cloud chamber, and 3-cm tracks in cloud chambers are signatures of electrons.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Okay, we have two circular claims here&quot;, AS-01 notes. &quot;Their logical structure seems to be the same, but we judge one of them to be a fallacy, while the other seems to be okay.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;I have a bad feeling about this&quot;, Dr. Zany says.&lt;br&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The argument for the fallaciousness of the above two claims is that they presume the conclusion in the premises. That is, it is presumed that the Bible is the word of God, but that is only possible if God actually exists. Likewise, if electrons don't exist, then whatever we see in the cloud chamber isn't the signature of electrons. Thus, in order to believe the conclusion, we need to already believe it as an implicit premise.&lt;/p&gt;
&lt;p&gt;But from a Bayesian perspective, beliefs aren't binary propositions: we can &lt;em&gt;tentatively&lt;/em&gt; believe in a hypothesis, such as the existence of God or electrons. In addition to this tentative hypothesis, we have sense data about the existence of the Bible and the 3-cm tracks. This data we take as certain. We also have a second tentative belief: the ambiguous interpretation of this sense data as the word of God or the signature of electrons. The sense data is ambiguous in that the Bible might or might not be the word of God, and the tracks might or might not be signatures of electrons. So we have three components in our inference: the evidence (Bible, 3-cm tracks), the ambiguous interpretation (the Bible is the word of God, the 3-cm tracks are signatures of electrons), and the hypothesis (God exists, electrons exist).&lt;/p&gt;
&lt;p&gt;We can conjecture a causal connection between these three components. Let's suppose that God exists (the hypothesis). This then gives rise to the Bible as his word (ambiguous interpretation), which in turn produces the actual document in front of us (sense data). Likewise, if electrons exist (hypothesis), then this can give rise to the predicted signature effects (ambiguous interpretation), which become manifest as what we actually see in the cloud chamber (sense data).&lt;/p&gt;
&lt;p&gt;The &quot;circular&quot; claim reverses the direction of the inference. We have sense data, which we would expect to see if the ambiguous interpretation were correct, and we would expect the interpretation to be correct if the hypothesis were true. Therefore it's more likely that the hypothesis is true. Is this allowed? Yes! Take for example the inference &quot;if there are dark clouds in the sky, then it will rain, in which case the grass will be wet&quot;. The reverse inference, &quot;the grass is wet, therefore it has rained, therefore there have been dark clouds in the sky&quot;, is valid. However, the inference &quot;the grass is wet, therefore the sprinkler has been on, therefore there is a sprinkler near this grass&quot; may &lt;em&gt;also&lt;/em&gt; be a valid inference. The grass being wet is evidence for &lt;em&gt;both&lt;/em&gt; the presence of dark clouds &lt;em&gt;and&lt;/em&gt; for a sprinkler having been on. Which hypothesis do we judge to be more likely? That &lt;a href=&quot;http://www.cs.helsinki.fi/sites/default/files/root/course-material/Probabilistic%20models/lecture4.pdf&quot;&gt;depends&lt;/a&gt; on our prior beliefs about the hypotheses, as well as the strengths of the causal links (e.g. &quot;if there are dark clouds, how likely is it that it rains?&quot;, and vice versa).&lt;/p&gt;
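&lt;p&gt;The wet-grass example can be worked through by brute-force enumeration. The sketch below simplifies the story to two possible causes (rain and a sprinkler) of one observation (wet grass). Every probability in it is a hypothetical number, not data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_RAIN = 0.2       # prior probability that it rained
P_SPRINKLER = 0.1  # prior probability that the sprinkler was on
# P(wet grass | rain, sprinkler) for each combination of causes.
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.05,
}


def joint_with_wet(rain, sprinkler):
    """P(rain, sprinkler, wet grass) for one setting of the two causes."""
    p_rain = P_RAIN if rain else 1 - P_RAIN
    p_sprinkler = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    return p_rain * p_sprinkler * P_WET[(rain, sprinkler)]


def prob_given_wet(query):
    """P(query | wet grass), summing the joint distribution over both causes."""
    total = hit = 0.0
    for rain, sprinkler in product([True, False], repeat=2):
        p = joint_with_wet(rain, sprinkler)
        total += p
        if query(rain, sprinkler):
            hit += p
    return hit / total


print(round(prob_given_wet(lambda r, s: r), 3))  # P(rain | wet): 0.645
print(round(prob_given_wet(lambda r, s: s), 3))  # P(sprinkler | wet): 0.297
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these assumptions, seeing wet grass raises the probability of rain from 0.2 to about 0.65, and that of the sprinkler from 0.1 to about 0.30. Changing the priors or the link strengths changes which explanation wins.&lt;/p&gt;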
&lt;p&gt;Thus, the &quot;circular&quot; arguments given above are actually valid Bayesian inferences. But there is a reason that we consider A to be a fallacy, while B sounds valid. Since the interpretation (the Bible is the word of God, 3-cm tracks are signatures of electrons) logically requires the hypothesis, the probability of the interpretation &lt;a href=&quot;/lw/ji/conjunction_fallacy/&quot;&gt;cannot be higher&lt;/a&gt; than the probability of the hypothesis. If we assign the existence of God a very low prior belief, then we must also assign a very low prior belief to the interpretation of the Bible as the word of God. In that case, seeing the Bible will not do much to elevate our belief in the claim that God exists, if there are more likely hypotheses to be found.&lt;/p&gt;
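&lt;p&gt;The bound on the interpretation can be made explicit. In this sketch, the interpretation I can only be true if the hypothesis H is, so P(I) = P(I and H) can never exceed P(H). The evidence E is assumed to depend only on whether I holds. The function name and all numbers are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def posterior_hypothesis(p_h, p_i_given_h, p_e_given_i, p_e_given_not_i):
    """P(H | E) when interpretation I entails hypothesis H, and E depends only on I."""
    p_i = p_h * p_i_given_h  # P(I) = P(I and H), so P(I) can never exceed P(H)
    p_e = p_i * p_e_given_i + (1 - p_i) * p_e_given_not_i
    # H and E: either I holds (and hence H), or H holds without I.
    p_h_and_e = p_i * p_e_given_i + (p_h - p_i) * p_e_given_not_i
    return p_h_and_e / p_e


# Hypothetical numbers. Low prior, and E is nearly as likely without I:
low = posterior_hypothesis(p_h=0.001, p_i_given_h=0.5, p_e_given_i=1.0, p_e_given_not_i=0.9)
# Moderate prior, and E is very unlikely unless I holds:
high = posterior_hypothesis(p_h=0.5, p_i_given_h=0.9, p_e_given_i=0.95, p_e_given_not_i=0.01)
print(round(low, 5), round(high, 3))  # 0.00106 0.988
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Suppose the prior is 0.001 and the evidence is nearly as likely without the interpretation. Then the posterior barely moves. With a moderate prior and evidence that is unlikely otherwise, the same structure of inference yields a posterior near 0.99.&lt;/p&gt;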
&lt;p&gt;&lt;em&gt;&quot;So you're saying that circular reasoning, too, is something that we consider fallacious if our prior belief in the hypothesis is low enough? And recognizing these kinds of fallacies is &lt;a href=&quot;https://en.wikipedia.org/wiki/AI-complete&quot;&gt;AI-complete&lt;/a&gt;, too?&quot;&lt;/em&gt; &lt;em&gt;Dr. Zany asks.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Yup!&quot;, AS-01 replies cheerfully, glad that for once, Dr. Zany gets it without a need to explain things fifteen times.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Damn it. But... what about slippery slope arguments? Dr. Cagliostro claims that if we let minor supervillains stake claims for territory, then we would end up letting henchmen stake claims for territory as well, and eventually we'd give the right to people who didn't even participate in our plans! Surely that must be a fallacy?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;text-decoration: underline;&quot;&gt;&lt;strong&gt;Slippery slope&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Slippery slope arguments are often treated as fallacies, but they might not be. There are cases where the stipulated &quot;slope&quot; &lt;em&gt;is&lt;/em&gt; what would actually (or likely) happen. For instance, take a claim saying &quot;if we allow microbes to be patented, then that will lead to higher life-forms being patented as well&quot;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There are cases in law, for example, in which a legal precedent has historically facilitated subsequent legal change. Lode (1999, pp. 511&amp;#x2013;512) cites the example originally identified by Kimbrell (1993) whereby there is good reason to believe that the issuing of a patent on a transgenic mouse by the U.S. Patent and Trademark Office in the year 1988 is the result of a slippery slope set in motion with the U.S. Supreme court&amp;#x2019;s decision Diamond v. Chakrabarty. This latter decision allowed a patent for an oil-eating microbe, and the subsequent granting of a patent for the mouse would have been unthinkable without the chain started by it.&amp;#xA0; (Hahn &amp;amp; Oaksford, 2007)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So again, our prior beliefs, here ones about the plausibility of the slope, influence whether or not the argument is accepted. But there is also another component that was missing from the previous fallacies. Because slippery slope arguments are about actions, not just beliefs, the principle of &lt;a href=&quot;https://en.wikipedia.org/wiki/Expected_utility&quot;&gt;expected utility&lt;/a&gt; becomes relevant. A slippery slope argument will be stronger (relative to its alternative) if it invokes a more undesirable potential consequence, if that consequence is more probable, and if the expected utility of the alternatives is smaller.&lt;/p&gt;
&lt;p&gt;For instance, suppose for the sake of argument that both increased heroin consumption and increased reggae music consumption are equally likely consequences of cannabis legalization:&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;A. Legalizing cannabis will lead to an increase in heroin consumption.&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;B. Legalizing cannabis will lead to an increase in listening to reggae music.&lt;/p&gt;
&lt;p&gt;Yet A would feel like a stronger argument against the legalization of cannabis than argument B, since increased heroin consumption feels like it would have lower utility. On the other hand, if the outcome is shared, then the stronger argument is the one where the causal link seems more probable:&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;C. Legalizing Internet access would lead to an increase in the number of World of Warcraft addicts.&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;D. Legalizing video rental stores would lead to an increase in the number of World of Warcraft addicts.&lt;/p&gt;
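&lt;p&gt;As a toy illustration, one can score the expected disutility a consequence adds to an action as its probability times its badness. This sketch deliberately ignores the expected utility of the alternatives, which the text also mentions. The function name and all probabilities and utilities are invented for the sake of argument.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def slope_strength(p_consequence, utility_of_consequence):
    """Expected disutility the consequence adds to the action (bigger = stronger argument)."""
    return -p_consequence * utility_of_consequence


# Invented numbers. A vs B: equally likely consequences, very different badness.
heroin = slope_strength(0.1, -100)
reggae = slope_strength(0.1, -1)
# C vs D: the same bad outcome, reached through more or less probable links.
wow_via_internet = slope_strength(0.3, -20)
wow_via_video_stores = slope_strength(0.01, -20)
print(heroin > reggae, wow_via_internet > wow_via_video_stores)  # True True
```
&lt;/code&gt;&lt;/pre&gt;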
&lt;p&gt;&lt;em&gt;&quot;Gah. So a strong slippery slope argument is one where the outcome is very bad, &lt;strong&gt;and&lt;/strong&gt; its probability is high? So the AI would not only need to evaluate probabilities, but expected utilities as well?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;That's right!&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Screw it, this isn't going anywhere. And here I thought that this would be a productive day.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;They can't all be, but we tried our best. Would you like a tuna sandwich as consolation?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&quot;Yes, please.&quot;&lt;br&gt;&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Because this post is already unreasonably long, the above discussion only covers the &lt;em&gt;theoretical&lt;/em&gt; reasons for thinking about fallacies as weak or strong Bayesian arguments. For math, experimental studies, and two other subtypes of the argument from ignorance (besides negative evidence), see:&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;Hahn, U. &amp;amp; Oaksford, M. (2007) The Rationality of Informal Argumentation: A Bayesian Approach to Reasoning Fallacies. &lt;em&gt;Psychological Review,&lt;/em&gt; vol. 114, no. 3, 704-732. &lt;a href=&quot;http://commonsenseatheism.com/wp-content/uploads/2011/10/Hahn-Oaksford-The-rationality-of-informal-argumentation.pdf&quot;&gt;http://commonsenseatheism.com/wp-content/uploads/2011/10/Hahn-Oaksford-The-rationality-of-informal-argumentation.pdf&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;
&lt;a href="http://lesswrong.com/lw/aq2/fallacies_as_weak_bayesian_evidence/#comments"&gt;41 comments&lt;/a&gt;
</description>
</item>
<item>
<title>The Ellsberg paradox and money pumps</title>
<link>http://lesswrong.com/lw/9m3/the_ellsberg_paradox_and_money_pumps/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/9m3/the_ellsberg_paradox_and_money_pumps/</guid>
<pubDate>Sun, 29 Jan 2012 04:34:32 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/fool"&gt;fool&lt;/a&gt;
&amp;bull;
9 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/9m3/the_ellsberg_paradox_and_money_pumps/#comments"&gt;72 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;&lt;strong&gt;Followup to&lt;/strong&gt;: &lt;a href=&quot;/lw/9e4/the_savage_theorem_and_the_ellsberg_paradox&quot;&gt;The Savage theorem and the Ellsberg paradox&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the previous post, I presented a simple version of Savage's theorem, and I introduced the Ellsberg paradox. At the end of the post, I mentioned a strong Bayesian thesis, which can be summarised: &quot;There is always a price to pay for leaving the Bayesian Way.&quot;&lt;sup&gt;1&lt;/sup&gt; But not always, it turns out. I claimed that there was a method that is Ellsberg-paradoxical, therefore non-Bayesian, but can't be money-pumped (or &quot;Dutch booked&quot;). I will present the method in this post.&lt;/p&gt;
&lt;p&gt;I'm afraid this is another long post. There's a short summary of the method at the very end, if you want to skip the &lt;a href=&quot;http://www.urbandictionary.com/define.php?term=jibba%20jabba&quot;&gt;jibba jabba&lt;/a&gt; and get right to the specification. Before trying to money-pump it, I'd suggest reading at least the two highlighted dialogues.&lt;/p&gt;
&lt;h2&gt;Ambiguity aversion&lt;/h2&gt;
&lt;p&gt;To recap the Ellsberg paradox: there's an urn with 30 red balls and 60 other balls that are either green or blue, in unknown proportions. Most people, when asked to choose between betting on red or on green, choose red, but, when asked to choose between betting on red-or-blue or on green-or-blue, choose green-or-blue. For some people this behaviour persists even after due calculation and reflection. This behaviour is non-Bayesian, and is the prototypical example of &lt;a href=&quot;http://en.wikipedia.org/wiki/Ambiguity_aversion&quot;&gt;ambiguity aversion&lt;/a&gt;.&lt;/p&gt;
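&lt;p&gt;Here is a quick check that no single probability assignment produces both Ellsberg choices. For a Bayesian, P(green) is one number between 0 and 2/3, because uncertainty about the urn's composition just averages into that number. The blue term then cancels when comparing red-or-blue with green-or-blue. The Python sketch below spot-checks every composition in steps of one ball. The helper name is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction

RED = Fraction(30, 90)  # 30 red balls out of 90


def paradoxical(green):
    """True if a single prior with P(green) = green gives both Ellsberg preferences."""
    blue = 1 - RED - green
    prefers_red = RED > green               # red over green
    prefers_gb = green + blue > RED + blue  # green-or-blue over red-or-blue
    return prefers_red and prefers_gb


# Scan every possible number of green balls, 0 to 60.
print(any(paradoxical(Fraction(g, 90)) for g in range(61)))  # False
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Preferring red over green requires P(green) below 1/3, while preferring green-or-blue over red-or-blue requires P(green) above 1/3. So the scan can never succeed.&lt;/p&gt;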
&lt;p&gt;There were some major themes that came out in the comments on that post. One theme was that I Fail Technical Writing Forever. I'll try to redeem myself.&lt;/p&gt;
&lt;p&gt;Another theme was that the setup given may be a bit too symmetrical. The Bayesian answer would be indifference, and really, you can break ties however you want. However, the paradoxical preferences are typically &lt;em&gt;strict&lt;/em&gt;, rather than just tie-breaking behaviour. (And when it's not strict, we shouldn't call it ambiguity aversion.) One suggestion was to add or remove a couple of red balls. Speaking for myself, I would still make the paradoxical choices.&lt;/p&gt;
&lt;p&gt;A third theme was that ambiguity aversion might be a good heuristic if betting against someone who may know something you don't. Now, no such opponent was specified, and speaking for myself, I'm not inferring one when I make the paradoxical choices. Still, let me admit that it's not contrived to infer a mischievous experimenter from the Ellsberg setup. &lt;a href=&quot;/lw/5te/a_summary_of_savages_foundations_for_probability/5o6o&quot;&gt;One commentator&lt;/a&gt; puts it better than me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Betting generally includes an adversary who wants you to lose money so they win in. Possibly in psychology experiments [this might not apply] ... But generally, ignoring the possibility of someone wanting to win money off you when they offer you a bet is a bad idea.&lt;/p&gt;
&lt;p&gt;Now betting is supposed to be a metaphor for options with possibly unknown results. In which case sometimes you still need to account for the possibility that the options were made available by an adversary who wants you to choose badly, but less often. And you should also account for the possibility that they were from other people who wanted you to choose well, or that the options were not determined by any intelligent being or process trying to predict your choices, so you don't need to account for an anticorrelation between your choice and the best choice. Except for your own biases.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We can take betting on the Ellsberg urn as a stand-in for various decisions under ambiguous circumstances. Ambiguity aversion can be Bayesian if we assume the right sort of correlation between the options offered and the state of the world, or the right sort of correlation between the choice made and the state of the world. In that case just about anything can be Bayesian. But sometimes the opponent will not have extra information, nor extra power. There might not even be any opponent as such. If we assume there are no such correlations, then ambiguity aversion is non-Bayesian.&lt;/p&gt;
&lt;p&gt;The final theme was: &lt;em&gt;so what&lt;/em&gt;? Ambiguity aversion is just another cognitive bias. One commentator specifically complained that I spent too much time talking about various abstractions and not enough time talking about how ambiguity aversion could be money-pumped. I will fix that now: I claim that ambiguity aversion cannot be money-pumped, and the rest of this post is about my claim.&lt;/p&gt;
&lt;p&gt;I'll start with a bit of name-dropping and some &lt;a href=&quot;http://en.wikipedia.org/wiki/Whig_history&quot;&gt;whig history&lt;/a&gt;, to make myself sound more credible than I really am&lt;sup&gt;2&lt;/sup&gt;. In the last twenty years or so many models of ambiguity averse reasoning have been constructed. Choquet expected utility&lt;sup&gt;3&lt;/sup&gt; and maxmin expected utility&lt;sup&gt;4&lt;/sup&gt; were early proposed models of ambiguity aversion. Later, multiplier preferences&lt;sup&gt;5&lt;/sup&gt; were the result of applying the ideas of &lt;a href=&quot;http://en.wikipedia.org/wiki/Robust_control&quot;&gt;robust control&lt;/a&gt; to macroeconomic models. This results in ambiguity aversion, though it was not explicitly motivated by the Ellsberg paradox. More recently, variational preferences&lt;sup&gt;6&lt;/sup&gt; generalise both multiplier preferences and maxmin expected utility. What I'm going to present is a finitary case of variational preferences, with some of my own amateur mathematical fiddling for rhetorical purposes.&lt;/p&gt;
&lt;h2&gt;Probability intervals&lt;/h2&gt;
&lt;p&gt;The starting idea is simple enough, and may have already occurred to some LW readers. Instead of using a prior probability for events, can we not use an &lt;a href=&quot;http://en.wikipedia.org/wiki/Imprecise_probability&quot;&gt;interval of probabilities&lt;/a&gt;? What should our betting behaviour be for an event with probability 50%, plus or minus 10%?&lt;/p&gt;
&lt;p&gt;There are some different ways of filling in the details. So to be quite clear, I'm not proposing the following as the One True Probability Theory, and I am not claiming that the following is descriptive of many people's behaviour. What follows is just one way of making ambiguity aversion work, and perhaps the simplest way. This makes sense, given my aim: I should just describe a simple method that leaves the Bayesian Way, but does not pay.&lt;/p&gt;
&lt;p&gt;Now, sometimes disjoint ambiguous events together make an event with known probability. Or even a certainty, as in an event and its negation. If we want probability intervals to be additive (and let's say that we do) then what we really want are &lt;em&gt;oriented&lt;/em&gt; intervals. I'll use +- or -+ (pronounced: plus-or-minus, minus-or-plus) to indicate two opposite orientations. So, if P(X) = 1/2 +- 1/10, then P(not X) = 1/2 -+ 1/10, and these add up to 1 exactly.&lt;/p&gt;
&lt;p&gt;Such oriented intervals are equivalent to ordered pairs of numbers. Sometimes it's more helpful to think of them as oriented intervals, but sometimes it's more helpful to think of them as pairs. So 1/2 +- 1/10 is the pair (3/5,2/5). And 1/2 -+ 1/10 is (2/5,3/5), the same numbers in the opposite order. The sum of these is (1,1), which is 1 exactly.&lt;/p&gt;
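&lt;p&gt;One way to represent these oriented intervals in code is as ordered pairs, exactly as described. This minimal Python sketch assumes the convention used here, that centre +- half is stored as (centre + half, centre - half). The class name Oriented and the method pm are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


class Oriented:
    """Oriented probability interval stored as an ordered pair (first, second).

    centre +- half is (centre + half, centre - half);
    centre -+ half is (centre - half, centre + half).
    """

    def __init__(self, first, second):
        self.pair = (F(first), F(second))

    @classmethod
    def pm(cls, centre, half):
        """centre +- half; pass a negative half for the -+ orientation."""
        return cls(F(centre) + F(half), F(centre) - F(half))

    def __add__(self, other):
        # Additive for disjoint events: add the pairs componentwise.
        return Oriented(self.pair[0] + other.pair[0], self.pair[1] + other.pair[1])

    def __repr__(self):
        return f"({self.pair[0]}, {self.pair[1]})"


p_x = Oriented.pm(F(1, 2), F(1, 10))       # 1/2 +- 1/10
p_not_x = Oriented.pm(F(1, 2), -F(1, 10))  # 1/2 -+ 1/10
print(p_x, p_not_x, p_x + p_not_x)         # (3/5, 2/5) (2/5, 3/5) (1, 1)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Addition is componentwise, so P(X) + P(not X) comes out as (1, 1), which is 1 exactly.&lt;/p&gt;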
&lt;p&gt;You may wonder, if we can use ordered pairs, can we use triples, or longer lists? Yes, this method can be made to work with those too. And we can still think in terms of centre, length, and orientation. The orientation can go off in all sorts of directions, instead of just two. But for my purposes, I'll just stick with two.&lt;/p&gt;
&lt;p&gt;You might also ask, can we set P(X) = 1/2 +- 1/2? No, this method just won't handle it. A restriction of this method is that neither of the pair can be 0 or 1, except when they're both 0 or both 1. The way we will be using these intervals, 1/2 +- 1/2 would be the extreme case of ambiguity aversion. 1/2 +- 1/10 represents a lesser amount of ambiguity aversion, a sort of compromise between worst-case and average-case behaviour.&lt;/p&gt;
&lt;p&gt;To decide among bets (having the same two outcomes), compute their probability intervals. Sometimes, the intervals will not overlap. Then it's unambiguous which is more likely, so it's clear what to pick. In general, whether they overlap or not, pick the one with the largest minimum -- though we will see there are three caveats when they do overlap. If P(X) = 1/2 +- 1/10, we would be indifferent between a bet on X and on not X: the minimum is 2/5 in either case. If P(Y) = 1/2 exactly, then we would strictly prefer a bet on Y to a bet on X.&lt;/p&gt;
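&lt;p&gt;Before the caveats, the basic decision rule is to pick the bet whose interval has the largest minimum. Here is a sketch with the bets from this paragraph, using plain tuples for the pairs. The helper names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def pm(centre, half):
    """centre +- half as an ordered pair; a negative half gives the -+ orientation."""
    return (centre + half, centre - half)


def choose(bets):
    """Pick the bet whose probability interval has the largest minimum."""
    return max(bets, key=lambda name: min(bets[name]))


bets = {
    "X": pm(F(1, 2), F(1, 10)),       # 1/2 +- 1/10, minimum 2/5
    "not X": pm(F(1, 2), -F(1, 10)),  # 1/2 -+ 1/10, minimum 2/5
    "Y": pm(F(1, 2), F(0)),           # fair coin, 1/2 exactly
}
print(choose(bets))  # Y
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;X and not X tie at 2/5, so the rule is indifferent between them. The bet on the fair coin Y wins.&lt;/p&gt;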
&lt;p&gt;Which leads to the &lt;strong&gt;first caveat&lt;/strong&gt;: sometimes, given two options, it's strictly better to randomise. Let's suppose Y represents a fair coin. So P(Y) = 1/2 exactly, as we said. But also, Y is independent of X. P(X and Y) = 1/4 +- 1/20, and so on. This means that P((X and not Y) or (Y and not X)) = 1/2 exactly also. So we're indifferent between a bet on X and a bet on not X, but we strictly prefer the randomised bet.&lt;/p&gt;
&lt;p&gt;In general, randomisation will be strictly better if you have two choices with overlapping intervals of opposite orientations. The best randomisation ratio will be the one that gives a bet with zero-length interval.&lt;/p&gt;
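&lt;p&gt;The best randomisation ratio can be found by solving for the weight that makes the two components of the mixed pair equal. This sketch assumes an independent randomiser that picks the first bet with known probability lam. The function names are hypothetical, and the formula only gives a weight between 0 and 1 when the two intervals have opposite orientations.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def mix(a, b, lam):
    """Bet on a with known probability lam, else on b (randomiser independent of the event)."""
    return tuple(lam * x + (1 - lam) * y for x, y in zip(a, b))


def zero_length_ratio(a, b):
    """Weight on a that makes the mixed interval zero-length."""
    da = a[0] - a[1]  # signed length of a
    db = b[0] - b[1]  # signed length of b
    # Solve lam * da + (1 - lam) * db = 0 for lam.
    return db / (db - da)


p_x = (F(3, 5), F(2, 5))      # 1/2 +- 1/10
p_not_x = (F(2, 5), F(3, 5))  # 1/2 -+ 1/10
lam = zero_length_ratio(p_x, p_not_x)
print(lam, mix(p_x, p_not_x, lam))  # 1/2 (Fraction(1, 2), Fraction(1, 2))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For X and not X the ratio is 1/2, i.e. the fair coin from the first caveat, and the mixed bet has probability 1/2 exactly.&lt;/p&gt;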
&lt;p&gt;Now let us reconsider the Ellsberg urn. We did say the urn can be a metaphor for various situations. Generally these situations will not be symmetrical. But, even in symmetrical scenarios, we should still re-think how we apply the &lt;a href=&quot;http://en.wikipedia.org/wiki/Principle_of_indifference&quot;&gt;principle of indifference&lt;/a&gt;. I argue that the underlying idea is really this: if our information has a symmetry, then our decisions should have that same symmetry. If we switch green and blue, our information about the Ellsberg urn doesn't change. The situation is indistinguishable, so we should behave the same way. It follows that we should be indifferent between a bet on green and a bet on blue. Then, for the Bayesian, it follows that P(red) = P(green) = P(blue) = 1/3. Period.&lt;/p&gt;
&lt;p&gt;But for us, there is a degree of freedom, even in this symmetrical situation. We know what the probability of red is, so of course P(red) = 1/3 exactly. But we can set, say&lt;sup&gt;7&lt;/sup&gt;, P(green) = 1/3 +- 1/9, and P(blue) = 1/3 -+ 1/9. So we get P(red or green) = 2/3 +- 1/9, P(red or blue) = 2/3 -+ 1/9, P(green or blue) = 2/3 exactly, and of course P(red or green or blue) = 1 exactly.&lt;/p&gt;
&lt;p&gt;So: red is 1/3 exactly, but the minimum of green is 2/9. (green or blue) is 2/3 exactly, but the minimum of (red or blue) is 5/9. So choose red over green, and (green or blue) over (red or blue). That's the paradoxical behaviour. Note that neither pair of choices offered in the Ellsberg paradox has the type of overlap that favours randomisation.&lt;/p&gt;
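&lt;p&gt;The Ellsberg numbers above can be reproduced with the same pair arithmetic. In this sketch, P(green) = 1/3 +- 1/9 is the illustrative choice from the text, not a uniquely correct value. P(blue) is then whatever makes green-or-blue 2/3 exactly. Helper names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def pm(centre, half):
    """centre +- half as an ordered pair; a negative half gives the -+ orientation."""
    return (centre + half, centre - half)


def add(*pairs):
    """Componentwise sum of probability pairs for disjoint events."""
    return tuple(sum(parts) for parts in zip(*pairs))


red = pm(F(1, 3), F(0))       # known: 1/3 exactly
green = pm(F(1, 3), F(1, 9))  # illustrative choice: 1/3 +- 1/9
blue = pm(F(1, 3), -F(1, 9))  # 1/3 -+ 1/9, so green-or-blue is 2/3 exactly

print(add(red, green, blue) == (1, 1))             # True: certainty
print(min(red), min(green))                        # 1/3 2/9, so prefer red
print(min(add(green, blue)), min(add(red, blue)))  # 2/3 5/9, so prefer green-or-blue
```
&lt;/code&gt;&lt;/pre&gt;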
&lt;p&gt;Once we have a decision procedure for the two-outcome case, then we can tack on any utility function, as I explained in the previous post. The result here is what you would expect: we get oriented expected utility intervals, obtained by multiplying the oriented probability intervals by the utility. When deciding, we pick the one whose interval has the largest minimum. So for example, a bet which pays 15U on red (using U for &quot;utils&quot;, the abstract units of measurement of the utility function) has expected utility 5U exactly. A bet which pays 18U on green has expected utility 6U +- 2U, the minimum is 4U. So pick the bet on red over that.&lt;/p&gt;
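&lt;p&gt;Expected utility intervals are just the probability pairs multiplied by the payoff. Here is a sketch for the two bets in this paragraph. The helper name scale is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def scale(pair, utility):
    """Expected-utility interval of a bet paying the given utils on the event."""
    return tuple(utility * p for p in pair)


red = (F(1, 3), F(1, 3))    # 1/3 exactly
green = (F(4, 9), F(2, 9))  # 1/3 +- 1/9

eu_red = scale(red, 15)     # 5U exactly
eu_green = scale(green, 18) # 6U +- 2U, i.e. (8, 4)
print(min(eu_red), min(eu_green))  # 5 4
```
&lt;/code&gt;&lt;/pre&gt;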
&lt;p&gt;Operationally, probability is associated with the &quot;fair price&quot; at which we are willing to bet. A probability interval indicates that there is no fair price. Instead we have a spread: we buy bets at their low price and sell at their high price. At least, we do that if we have no outstanding bets, or more generally, if the expected utility interval on our outstanding bets has zero-length. The &lt;strong&gt;second caveat&lt;/strong&gt; is that if this interval has length, then it affects our price: we also sell bets of the same orientation at their low price, and buy bets of the opposite orientation at their high price, until the length of this interval is used up. The midpoint of the expected utility interval on our outstanding bets will be irrelevant.&lt;/p&gt;
&lt;p&gt;This can be confusing, so it's time for an analogy.&lt;/p&gt;
&lt;h2&gt;Bootsianism&lt;/h2&gt;
&lt;p&gt;If you are Bayesian and risk-neutral (and if bets pay in &quot;utils&quot; rather than cash, you are risk-neutral by definition) then outstanding bets have no effect on further betting behaviour. However, if you are risk-averse, as is the most common case, then this is no longer true. The more money you've already got on the line, the less willing you will be to bet.&lt;/p&gt;
&lt;p&gt;But besides risk attitude, there could also be interference effects from non-monetary payouts. For example, if you are dealing in boots, then you wouldn't buy a single boot for half the price of a pair, and neither would you sell one of your boots for half the price of a pair. Unless you happened to already have unmatched boots, then you would sell those at a lower price, or buy boots of the opposite orientation at a higher price, until you had no more unmatched boots. If you were otherwise risk-neutral with respect to boots, then your behaviour would not depend on the number of pairs you have, just on the number and orientation of your unmatched boots.&lt;/p&gt;
&lt;p&gt;This closely resembles the non-Bayesian behaviour above. In fact, for the Ellsberg urn, we could just say that a bet on red is worth a pair of boots, a bet on green is worth two left boots, and a bet on blue is worth two right boots. Without saying anything further, it's clear that we would strictly prefer red (a pair) over green (two lefts), but we would also strictly prefer green-or-blue (two pairs) over red-or-blue (one left and three rights). That's the paradoxical behaviour, but you know you can't money-pump boots.&lt;/p&gt;
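&lt;p&gt;The boots analogy can be written down directly: value each bundle by the number of complete pairs it contains. This is only a sketch of the analogy, using a hypothetical (left, right) encoding. It does not claim that boots and oriented intervals are formally identical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def pairs(boots):
    """Number of wearable pairs in a (left, right) count; unmatched boots add nothing."""
    left, right = boots
    return min(left, right)


def combine(*bundles):
    """Pool several (left, right) bundles."""
    return tuple(sum(side) for side in zip(*bundles))


red = (1, 1)    # a pair of boots
green = (2, 0)  # two left boots
blue = (0, 2)   # two right boots

print(pairs(red) > pairs(green))                                # True
print(pairs(combine(green, blue)) > pairs(combine(red, blue)))  # True
```
&lt;/code&gt;&lt;/pre&gt;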
&lt;table bgcolor=&quot;yellow&quot; border=&quot;0&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;blockquote&gt;&lt;strong&gt;A&lt;/strong&gt;: I'll buy that pair of boots for 30 &lt;a href=&quot;http://nethack.wikia.com/wiki/Zorkmid&quot;&gt;zorkmids&lt;/a&gt;. &lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: Okay, here's your pair of boots. &lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: And here's your 30 zorkmids. Thank you. &lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: Thank you. Say, didn't you just buy an identical pair this morning? &lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: Yeah, I did. Then a dingo ate the right one. I've got the left one here. Never worn. &lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: How narratively convenient! How much would you sell it for? &lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: Hmm, how about 10 zorkmids? &lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: Really, 10 zorkmids? So, do you think right boots are more valuable than left boots?&lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: No, of course not. Why?&lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: Arbitrage!&lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: Gesundheit.&lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: Thanks. I'll buy a left boot from you for 10 zorkmids. &lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: Great! Here's your left boot. &lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: And here's your 10 zorkmids. Thank you. &lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: Thank &lt;em&gt;you&lt;/em&gt;! &lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: And I'll buy a right boot from you for 10 zorkmids. &lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: Errrm... Sorry? Why would I agree to that?&lt;br&gt; &lt;strong&gt;B&lt;/strong&gt;: You just sold me a left boot for 10 zorkmids. Well, you yourself said rights aren't more valuable than lefts. So, logically, you should be willing to sell me a right boot for 10 zorkmids.&lt;br&gt; &lt;strong&gt;A&lt;/strong&gt;: What? No. &lt;br&gt;&lt;/blockquote&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Boots' rule&lt;/h2&gt;
&lt;p&gt;So much for the &lt;em&gt;static&lt;/em&gt; case. But what do we do with new information? How do we handle conditional probabilities?&lt;/p&gt;
&lt;p&gt;We still get P(A|B) by dividing P(A and B) by P(B). It will be easier to think in terms of pairs here. So for example P(red) = 1/3 exactly = (1/3,1/3) and P(red or green) = 2/3 +- 1/9 = (7/9,5/9), so P(red|red or green) = (3/7,3/5) = 18/35 -+ 3/35. And similarly P(green|red or green) = (1/3 +- 1/9)/(2/3 +- 1/9) = 17/35 +- 3/35.&lt;/p&gt;
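&lt;p&gt;Here is Boots' rule computed on pairs: divide componentwise, then convert back to centre, half-length and orientation. In this sketch, a positive half-length means +- and a negative one means -+. This matches the convention that 1/2 +- 1/10 is (3/5, 2/5). The helper names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def divide(a, b):
    """Componentwise P(A and B) / P(B) on ordered pairs."""
    return (a[0] / b[0], a[1] / b[1])


def centre_half(pair):
    """Rewrite an ordered pair as (centre, signed half-length)."""
    first, second = pair
    return (first + second) / 2, (first - second) / 2


red = (F(1, 3), F(1, 3))
green = (F(4, 9), F(2, 9))
red_or_green = (F(7, 9), F(5, 9))

print(centre_half(divide(red, red_or_green)))    # (Fraction(18, 35), Fraction(-3, 35))
print(centre_half(divide(green, red_or_green)))  # (Fraction(17, 35), Fraction(3, 35))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first result is 18/35 -+ 3/35 and the second is 17/35 +- 3/35, as in the text.&lt;/p&gt;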
&lt;p&gt;This rule covers the dynamic &lt;em&gt;passive&lt;/em&gt; case, where we update probabilities based on what we observe, before betting. The &lt;strong&gt;third and final caveat&lt;/strong&gt; is in the &lt;em&gt;active&lt;/em&gt; case, when information comes in between bets. Now, we saw that the length and orientation of the interval on expected utility of outstanding bets affects further betting behaviour. There is actually a separate update rule for this quantity. It is about as simple as it gets: &lt;em&gt;do nothing&lt;/em&gt;. The interval can change when we make choices, and its midpoint can shift due to external events, but its length and orientation do not update.&lt;/p&gt;
&lt;p&gt;You might expect the update rule for this quantity to follow from the way the expected utility updates, which follows from the way probability updates. But it has a mind of its own. So even if we are keeping track of our bets, we'd still need to keep track of this extra variable separately.&lt;/p&gt;
&lt;p&gt;Sometimes it may be easier to think in terms of the total expected utility interval of our outstanding bets, but sometimes it may be easier to think of this in terms of having a &quot;virtual&quot; interval that cancels the change in the length and orientation of the &quot;real&quot; expected utility interval. The midpoint of this virtual interval is irrelevant and can be taken to always be zero. So, on update, compute the prior expected utility interval of outstanding bets, subtract the posterior expected utility interval from it, and add this difference to the virtual interval. Reset its midpoint to zero, keeping only the length and orientation.&lt;/p&gt;
&lt;p&gt;That can also be confusing, so let's have another analogy.&lt;/p&gt;
&lt;h2&gt;Yo' mama's so illogical...&lt;/h2&gt;
&lt;p&gt;I recently came across this example by Mark Machina:&lt;/p&gt;
&lt;table bgcolor=&quot;yellow&quot; border=&quot;0&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;blockquote&gt;&lt;strong&gt;M&lt;/strong&gt;: Children, I only have one treat, I can only give it to one of you.&lt;br&gt; &lt;strong&gt;I&lt;/strong&gt;: Me, mama!&lt;br&gt; &lt;strong&gt;J&lt;/strong&gt;: No, give it to me!&lt;br&gt; &lt;strong&gt;M&lt;/strong&gt;: No. Rather than give it to either of you, it's better if I toss a coin. Heads, it goes to Irina, tails, it goes to Joey.&lt;br&gt; ...&lt;br&gt; &lt;strong&gt;M&lt;/strong&gt;: Heads. Irina gets it.&lt;br&gt; &lt;strong&gt;J&lt;/strong&gt;: But mama!&lt;br&gt; &lt;strong&gt;M&lt;/strong&gt;: Fair is fair.&lt;br&gt; &lt;strong&gt;I&lt;/strong&gt;: Yeah Joey!&lt;br&gt; &lt;strong&gt;J&lt;/strong&gt;: But mama, you yourself said it's better to toss a coin than to give it to either of us. So, logically, instead of giving it to Irina you should toss a coin again.&lt;br&gt; &lt;strong&gt;M&lt;/strong&gt;: Nice try, Joey. &lt;br&gt;&lt;/blockquote&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Instead of giving the treat to either child, she strictly prefers to toss a coin and give the treat to the winner. But after the coin is tossed, she strictly prefers to give the treat to the winner rather than toss again.&lt;/p&gt;
&lt;p&gt;This cannot be explained in terms of maximising expected utility, in the typical sense of &quot;utility&quot;. And of course only known probabilities are involved here, so there's no question as to whether her beliefs are probabilistically sophisticated or not. But it could be said that she is still maximising the expected value of an extended objective function. This extended objective function does not just consider who gets a treat, but also considers who &quot;had a fair chance&quot;. She is unfair if she gives the treat to either child outright, but fair if she tosses a coin. That fairness doesn't go away when the result of the coin toss is known.&lt;/p&gt;
&lt;p&gt;Or something like that. There are surely other ways of dissecting the mother's behaviour. But no matter what, it's going to have to take the coin toss into account, even though the coin, in and of itself, has no relevance to the situation.&lt;/p&gt;
&lt;p&gt;Let's go back to the urn. Green and blue have the type of overlap that favours randomisation: P((green and heads) or (blue and tails)) = 1/3 exactly. A bet paying 9U on this event has expected utility of 3U exactly. Let's say we took this bet. Now say the coin comes up heads. We can update the probabilities as per above. The answer is that P(green) = 1/3 +- 1/9 as it was before. That makes sense because it's an independent event: knowing the result of the coin toss gives no information about the urn. The difference is that we now have an outstanding bet that pays 9U if the ball is green. The expected utility would therefore be 3U +- 1U. Except, the expected utility interval was zero-length before the coin was tossed, so it remains zero-length. Equivalently, the virtual interval becomes -+ 1U, so that the effective total is 3U exactly. (In this example, the midpoint of the expected utility interval didn't change either. That's not generally the case.) A bet randomised on a &lt;em&gt;new&lt;/em&gt; coin toss would have expected utility 3U, plus the virtual interval of -+ 1U, for an effective total of 3U -+ 1U. So we would strictly prefer to keep the bet on green rather than re-randomise.&lt;/p&gt;
&lt;p&gt;Let's compare this with a trivial example: let's say we took a bet that pays 9U if the ball drawn from the urn is green. The expected utility of this bet is 3U +- 1U. For some unrelated reason, a coin is tossed, and it comes up heads. The coin also has nothing to do with the urn or my bet. I still have a bet of 9U on green, and its expected utility is still 3U +- 1U.&lt;/p&gt;
&lt;p&gt;But the difference between these two examples is just in the counterfactual: if the coin had come up tails, in the first example I would have had a bet of 9U on blue, and in the second example I would have had a bet of 9U on green. But the coin came up heads, and in both examples I end up with a bet of 9U on green. The virtual interval has some spooky dependency on what &lt;em&gt;could&lt;/em&gt; have happened, just like &quot;had a fair chance&quot;. It is the ghost of a departed bet.&lt;/p&gt;
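&lt;p&gt;The small Python sketch below checks the arithmetic of the two examples above. It is an illustration only. As in the text, it assumes that P(green) is the oriented pair (4/9, 2/9), i.e. 1/3 +- 1/9, that P(blue) is (2/9, 4/9), and that the coin is fair. The helper names &lt;code&gt;scale&lt;/code&gt;, &lt;code&gt;bayes&lt;/code&gt; and &lt;code&gt;virtual_update&lt;/code&gt; are hypothetical, chosen only for this example. Pairs are plain tuples, and each side is handled on its own.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F

# Oriented pairs (first side, second side), assumed as in the text:
# P(green) = 1/3 +- 1/9 = (4/9, 2/9) and P(blue) = 1/3 -+ 1/9 = (2/9, 4/9).
P_GREEN = (F(4, 9), F(2, 9))
P_BLUE = (F(2, 9), F(4, 9))
P_HEADS = (F(1, 2), F(1, 2))  # a fair coin: both sides agree
ZERO = (F(0), F(0))


def add(p, q):
    # Each side is additive, so pairs add side by side.
    return (p[0] + q[0], p[1] + q[1])


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def scale(p, k):
    # Multiply both sides by a number (a payout, or an independent probability).
    return (p[0] * k, p[1] * k)


def bayes(p_joint, p_cond):
    # Each side updates by Bayes' rule on its own: P(A|B) = P(A and B) / P(B).
    return (p_joint[0] / p_cond[0], p_joint[1] / p_cond[1])


def virtual_update(virtual, prior_eu, posterior_eu):
    # Add (prior - posterior) to the virtual interval, then reset its midpoint to zero.
    v = add(virtual, sub(prior_eu, posterior_eu))
    mid = (v[0] + v[1]) / 2
    return (v[0] - mid, v[1] - mid)


# Example 1: 9U on (green and heads) or (blue and tails). The coin is independent
# of the urn, so P(green and heads) is P(green) scaled by P(heads) on each side.
green_heads = scale(P_GREEN, P_HEADS[0])
blue_tails = scale(P_BLUE, 1 - P_HEADS[0])
prior_eu = scale(add(green_heads, blue_tails), 9)        # (3, 3): 3U exactly
# Heads comes up: the bet now pays 9U exactly when the ball is green.
posterior_eu = scale(bayes(green_heads, P_HEADS), 9)      # (4, 2): 3U +- 1U
virtual = virtual_update(ZERO, prior_eu, posterior_eu)    # (-1, 1): -+ 1U
keep = add(posterior_eu, virtual)                         # (3, 3)
rerandomise = add(prior_eu, virtual)                      # (2, 4): 3U -+ 1U

# Example 2: a plain 9U bet on green, and an unrelated coin toss.
plain_eu = scale(P_GREEN, 9)                              # (4, 2)
plain_virtual = virtual_update(ZERO, plain_eu, plain_eu)  # (0, 0): nothing changes

print('keep', min(keep), 'rerandomise', min(rerandomise))  # keep 3 rerandomise 2
print('plain bet', min(add(plain_eu, plain_virtual)))      # plain bet 2
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ranked by the minimum of each interval, keeping the bet scores 3 and re-randomising scores 2, which matches the text. In the trivial example the final bet is also 9U on green, but its virtual interval stays at zero.&lt;/p&gt;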
&lt;p&gt;I expect many on LW are wondering &lt;a href=&quot;http://tvtropes.org/pmwiki/pmwiki.php/Main/WheresTheKaboom&quot;&gt;what happened&lt;/a&gt;. There was supposed to be a proof that anything that isn't Bayesian can be punished. Actually, this threat comes with some hidden assumptions, which I hope these analogies have helped to illustrate. A boot is an example of something which has no fair price, even if a pair of boots has one. A mother with two children and one treat is an example where some counterfactuals are not forgotten. The hidden assumptions fail in our case, just as they can fail in these other contexts where Bayesianism is not at issue. This can be stated more rigorously&lt;sup&gt;8&lt;/sup&gt;, but that is basically how it's possible. Now We Know. &lt;a href=&quot;http://tvtropes.org/pmwiki/pmwiki.php/Main/AndKnowingIsHalfTheBattle&quot;&gt;And Knowing is Half the Battle&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Notes&lt;/h2&gt;
&lt;p&gt;&lt;span&gt; &lt;ol&gt;
&lt;li&gt; Taken almost verbatim from Eliezer Yudkowsky's &lt;a href=&quot;/lw/my/the_allais_paradox/&quot;&gt;post on the Allais paradox&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt; And footnotes pointing to some tangentially relevant journal articles make me sound extra credible. &lt;/li&gt;
&lt;li&gt; For Choquet expected utility see: D. Schmeidler, &lt;em&gt;Subjective probability and expected utility without additivity&lt;/em&gt;, Econometrica 57 (1989) pp 571-587.&lt;/li&gt;
&lt;li&gt; For maxmin expected utility see: I. Gilboa and D. Schmeidler, &lt;em&gt;Maxmin expected utility with a non-unique prior&lt;/em&gt;, J. Math. Econ. 18 (1989) pp 141-153.&lt;/li&gt;
&lt;li&gt; For multiplier preferences see: L.P. Hansen and T.J. Sargent, &lt;em&gt;Robust control and model uncertainty&lt;/em&gt;, Amer. Econ. Rev. 91 (2001) pp 60-66.&lt;/li&gt;
&lt;li&gt; For variational preferences see: F. Maccheroni, M. Marinacci, and A. Rustichini, &lt;em&gt;Dynamic variational preferences&lt;/em&gt;, J. Econ. Theory 128 (2006) pp 4-44.&lt;/li&gt;
&lt;li&gt; Any length between 0 and 1/3 works. But here's where I pulled 1/9 from: a Bayesian might assign exactly 1/61 prior probability to the 61 possible urn compositions, and the result is roughly approximated by the &lt;a href=&quot;http://en.wikipedia.org/wiki/Rule_of_succession&quot;&gt;Laplacian rule of succession&lt;/a&gt;, which prescribes a &lt;a href=&quot;http://en.wikipedia.org/wiki/Pseudocount&quot;&gt;pseudocount&lt;/a&gt; of one green and one blue ball. A similar thing with probability intervals is roughly approximated by using a pseudocount of 3/2 +- 1/2 green and 3/2 -+ 1/2 blue balls. &lt;/li&gt;
&lt;li&gt; To quickly relate this back to &lt;a href=&quot;/lw/9e4/the_savage_theorem_and_the_ellsberg_paradox&quot;&gt;Savage's rules&lt;/a&gt;: rules 1 and 3 guarantee that there's no &lt;em&gt;static&lt;/em&gt; money pump. Rule 2 then is supposed to guarantee that there is no &lt;em&gt;dynamic&lt;/em&gt; money pump. But it is stronger than necessary for that purpose. I claim that this method obeys rules 1, 3, and a weaker version of rule 2, and that it is &lt;em&gt;dynamically consistent&lt;/em&gt;. For dynamic consistency of variational preferences in general, see footnotes above. This method is a special case, for which I wrote up a &lt;a href=&quot;http://sites.google.com/site/dmehkeri/home/dummies.pdf&quot;&gt;simpler proof&lt;/a&gt;. &lt;/li&gt;
&lt;/ol&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Appendix A: method summary&lt;/h2&gt;
&lt;table bgcolor=&quot;yellow&quot; border=&quot;0&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span&gt;
&lt;ul&gt;
&lt;li&gt; Events are assigned a pair of prior probabilities, which can also be thought of as an oriented probability interval. e.g. (3/5,2/5) can also be thought of as 1/2 +- 1/10. &lt;/li&gt;
&lt;li&gt; Neither side of the pair can be 0 or 1, except when they're both 0 or both 1. &lt;/li&gt;
&lt;li&gt; Each side of the pair is additive: if A and B are disjoint, and P(A) = (x,y), and P(B) = (u,v), then P(A or B) = (x+u,y+v). &lt;/li&gt;
&lt;li&gt; Each side of the pair updates by Bayes' rule: if P(A and B) = (x,y), and P(B) = (u,v), then P(A|B) = (x/u,y/v). &lt;/li&gt;
&lt;li&gt; Given a utility function, each bet will then have an expected utility interval: multiply the probability intervals by the utility for each possible outcome.&lt;/li&gt;
&lt;li&gt; There is also a virtual expected utility interval to keep track of. The midpoint of this interval is always zero. &lt;/li&gt;
&lt;li&gt; When updating the virtual expected utility interval, compute the prior expected utility interval of the outstanding bet(s), subtract the posterior expected utility interval from it, and add this difference to the virtual expected utility interval. Throw away the midpoint (reset the midpoint of the interval to zero, keeping just the length and orientation). &lt;/li&gt;
&lt;li&gt; To decide among bets: compute the expected utility intervals of each of them -- including already outstanding bets, and including the virtual expected utility interval. Rank them according to the minimum values of the intervals. &lt;/li&gt;
&lt;li&gt; Implicitly when presented with options we are also presented with the option to randomise among them, and sometimes this is strictly better than any of the pure options. &lt;/li&gt;
&lt;/ul&gt;
&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
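&lt;p&gt;The summary above can be read as a small algorithm. Below is a minimal Python sketch of it. It is not the author's own implementation. The class &lt;code&gt;Pair&lt;/code&gt; and the helpers &lt;code&gt;expected_utility&lt;/code&gt;, &lt;code&gt;update_virtual&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; are hypothetical names. The inputs are assumed from the urn example in the body of the post (P(green) = 1/3 +- 1/9, P(blue) = 1/3 -+ 1/9, a fair coin). The sketch does not check that each side lies between 0 and 1.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from dataclasses import dataclass
from fractions import Fraction as F


@dataclass(frozen=True)
class Pair:
    # An oriented interval stored as its two sides, e.g. (3/5, 2/5) is 1/2 +- 1/10.
    a: F
    b: F

    @property
    def mid(self):
        return (self.a + self.b) / 2

    def __add__(self, other):
        return Pair(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return Pair(self.a - other.a, self.b - other.b)

    def scale(self, k):
        return Pair(self.a * k, self.b * k)

    def given(self, cond):
        # Bayes' rule on each side: self is P(A and B), cond is P(B).
        return Pair(self.a / cond.a, self.b / cond.b)

    def is_allowed_probability(self):
        # Neither side may be 0 or 1 unless both are 0 or both are 1 (no range check here).
        return self.a == self.b or not (self.a in (0, 1) or self.b in (0, 1))


ZERO = Pair(F(0), F(0))


def expected_utility(outcomes):
    # outcomes: (probability Pair, utility) for each mutually exclusive outcome.
    total = ZERO
    for prob, utility in outcomes:
        total = total + prob.scale(utility)
    return total


def update_virtual(virtual, prior_eu, posterior_eu):
    # Add (prior - posterior) to the virtual interval, then reset its midpoint to zero.
    v = virtual + (prior_eu - posterior_eu)
    return Pair(v.a - v.mid, v.b - v.mid)


def score(eu, virtual=ZERO):
    # Rank by the minimum end of the effective (real + virtual) interval.
    total = eu + virtual
    return min(total.a, total.b)


# Assumed inputs, as in the urn example.
green = Pair(F(4, 9), F(2, 9))
blue = Pair(F(2, 9), F(4, 9))
options = {
    'green': expected_utility([(green, 9)]),
    'blue': expected_utility([(blue, 9)]),
    # 50/50 on a fair coin: 9U on (green and heads) or (blue and tails).
    'randomised': expected_utility([(green.scale(F(1, 2)), 9), (blue.scale(F(1, 2)), 9)]),
}
best = max(options, key=lambda name: score(options[name]))
print({name: score(eu) for name, eu in options.items()}, best)

# After heads, the outstanding bet is 9U on green: keep it, or re-randomise?
virtual = update_virtual(ZERO, options['randomised'], options['green'])
print(score(options['green'], virtual), score(options['randomised'], virtual))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the pure bets on green and on blue both score 2, while the randomised bet scores 3. This illustrates the last point of the summary: randomising can be strictly better than any pure option. Once the coin has come up heads, the virtual interval makes keeping the bet on green (score 3) beat re-randomising (score 2).&lt;/p&gt;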
&lt;h2&gt;Appendix B: obligatory image for LW posts on this topic&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;http://sites.google.com/site/dmehkeri/home/AllYourBayes.PNG&quot; alt=&quot;All your Bayes are belong to us&quot;&gt;&lt;/p&gt;&lt;/div&gt;
&lt;a href="http://lesswrong.com/lw/9m3/the_ellsberg_paradox_and_money_pumps/#comments"&gt;72 comments&lt;/a&gt;
</description>
</item>
<item>
<title>The Savage theorem and the Ellsberg paradox</title>
<link>http://lesswrong.com/lw/9e4/the_savage_theorem_and_the_ellsberg_paradox/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/9e4/the_savage_theorem_and_the_ellsberg_paradox/</guid>
<pubDate>Sun, 15 Jan 2012 06:06:53 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/fool"&gt;fool&lt;/a&gt;
&amp;bull;
12 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/9e4/the_savage_theorem_and_the_ellsberg_paradox/#comments"&gt;54 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;&lt;strong&gt;Followup to&lt;/strong&gt;: &lt;a href=&quot;/lw/5te/a_summary_of_savages_foundations_for_probability&quot;&gt;A summary of Savage's foundation for probability and utility.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In 1961, &lt;a href=&quot;http://en.wikipedia.org/wiki/Daniel_Ellsberg&quot;&gt;Daniel Ellsberg&lt;/a&gt;, most famous for leaking the &lt;a href=&quot;http://en.wikipedia.org/wiki/Pentagon_Papers&quot;&gt;Pentagon Papers&lt;/a&gt;, published the decision-theoretic paradox which is now named after him &lt;sup&gt;1&lt;/sup&gt;. It is a cousin to the Allais paradox. They both involve violations of an independence or separability principle. But they go off in different directions: one is a violation of expected utility, while the other is a violation of subjective probability. The Allais paradox has been &lt;a href=&quot;/lw/my/the_allais_paradox/&quot;&gt;discussed on LW before&lt;/a&gt;, but when I do a search it seems that the first discussion of the Ellsberg paradox on LW was my comments on the previous post &lt;sup&gt;2&lt;/sup&gt;. It seems to me that from a Bayesian point of view, the Ellsberg paradox is the greater evil.&lt;/p&gt;
&lt;p&gt;But I should first explain what I mean by a violation of expected utility versus subjective probability, and for that matter, what I mean by Bayesian. I will explain a special case of Savage's representation theorem, which focuses on the subjective probability side only. Then I will describe Ellsberg's paradox. In the next episode, I will give an example of how not to be Bayesian. If I don't get voted off the island at the end of this episode.&lt;/p&gt;
&lt;h2&gt;Rationality and Bayesianism&lt;/h2&gt;
&lt;p&gt;Bayesianism is often taken to involve the maximisation of expected utility with respect to a subjective probability distribution. I would argue this label only sticks to the subjective probability side. But mainly, I wish to make a clear division between the two sides, so I can focus on one.&lt;/p&gt;
&lt;p&gt;Subjective probability and expected utility are certainly related, but they're still independent. You could be perfectly willing and able to assign belief numbers to all possible events as if they were probabilities. That is, your belief assignment obeys all the laws of probability, including Bayes' rule, which is, after all, what the -ism is named for. You could do all that, but still maximise something other than expected utility. In particular, you could combine subjective probabilities with prospect theory, which has also been &lt;a href=&quot;/lw/6kf/prospect_theory_a_framework_for_understanding/&quot;&gt;discussed on LW before&lt;/a&gt;. In that case you may display Allais-paradoxical behaviour but, as we will see, not Ellsberg-paradoxical behaviour. The rationalists might excommunicate you, but it seems to me you should keep your Bayesianist card.&lt;/p&gt;
&lt;p&gt;On the other hand, your behaviour could be incompatible with any subjective probability distribution. But you could still maximise utility with respect to something other than subjective probability. In particular, when faced with known probabilities, you would be maximising expected utility in the normal sense. So you cannot exhibit any Allais-paradoxical behaviour, because the Allais paradox involves only objective lotteries. But you may exhibit, as we will see, Ellsberg-paradoxical behaviour. I would say you are not Bayesian.&lt;/p&gt;
&lt;p&gt;So a non-Bayesian, even the strictest frequentist, can still be an expected utility maximiser, and a perfect Bayesian need not be an expected utility maximiser. What I'm calling Bayesianist is just the idea that we should reason with our subjective beliefs the same way that we reason with objective probabilities. This also has been called having &quot;probabilistically sophisticated&quot; beliefs, if you prefer to avoid the B-word, or don't like the way I'm using it.&lt;/p&gt;
&lt;p&gt;In a lot of what follows, I will bypass utility by only considering two outcomes. Utility functions are only unique up to a constant offset and a positive scale factor. With two outcomes, they evaporate entirely. The question of maximising expected utility with respect to a subjective probability distribution reduces to the question of maximising the probability, according to that distribution, of getting the better of the two outcomes. (And if the two outcomes are equal, there is nothing to maximise.)&lt;/p&gt;
&lt;p&gt;And on the flip side, if we have a decision method for the two-outcome case, Bayesian or otherwise, then we can always tack on a utility function. The idea of utility is just that any intermediate outcome is equivalent to an objective lottery between better and worse outcomes. So if we want, we can use a utility function to reduce a decision problem with any (finite) number of outcomes to a decision problem over the best and worst outcomes in question.&lt;/p&gt;
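&lt;p&gt;Both directions of this reduction can be checked with exact fractions. The sketch below assumes only that winning is better than losing. The bets, utilities and helper names (&lt;code&gt;expected_utility&lt;/code&gt;, &lt;code&gt;lottery_equivalent&lt;/code&gt;, &lt;code&gt;reduce_to_two_outcomes&lt;/code&gt;) are hypothetical and exist only for illustration. The sketch confirms two things. With two outcomes, ranking bets by expected utility is the same as ranking them by the probability of winning. With more outcomes, each intermediate outcome can be replaced by its equivalent lottery over the best and worst outcomes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def expected_utility(p_win, u_win, u_lose):
    # With two outcomes: EU = u_lose + (u_win - u_lose) * P(win).
    return p_win * u_win + (1 - p_win) * u_lose


# Hypothetical bets, identified only by their probability of winning.
bets = {'A': F(1, 3), 'B': F(1, 2), 'C': F(1, 4)}
by_probability = sorted(bets, key=bets.get)

# Any utility with u_win above u_lose gives the same ranking as P(win) alone.
for u_win, u_lose in [(1, 0), (10, 3), (F(7, 2), -5)]:
    by_eu = sorted(bets, key=lambda name: expected_utility(bets[name], u_win, u_lose))
    assert by_eu == by_probability


def lottery_equivalent(u, u_best, u_worst):
    # P(best) in a best/worst lottery that is exactly as good as an outcome of utility u.
    return F(u - u_worst) / (u_best - u_worst)


def reduce_to_two_outcomes(dist, u_best, u_worst):
    # dist: (probability, utility) pairs; swap each outcome for its equivalent lottery.
    return sum(p * lottery_equivalent(u, u_best, u_worst) for p, u in dist)


dist = [(F(1, 2), 10), (F(1, 3), 4), (F(1, 6), 0)]  # hypothetical three-outcome bet
p_best = reduce_to_two_outcomes(dist, 10, 0)
print(by_probability, p_best)  # ['C', 'A', 'B'] 19/30
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The reduced bet has the same expected utility as the original: 19/30 of 10 is 19/3, which equals 1/2 of 10 plus 1/3 of 4.&lt;/p&gt;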
&lt;h2&gt;Savage's representation theorem&lt;/h2&gt;
&lt;p&gt;Let me recap some of the &lt;a href=&quot;/lw/5te/a_summary_of_savages_foundations_for_probability/&quot;&gt;previous post&lt;/a&gt; on Savage's theorem. How might we defend Bayesianism? We could invoke &lt;a href=&quot;http://en.wikipedia.org/wiki/Cox%27s_theorem&quot;&gt;Cox's theorem&lt;/a&gt;. This starts by assuming possible events can be assigned real numbers corresponding to some sort of belief level on someone's part, and that there are certain functions over these numbers corresponding to logical operations. It can be proven that, if someone's belief functions obey some simple rules, then that person acts &lt;em&gt;as if&lt;/em&gt; they were reasoning with subjective probability. Now, while the rules for belief functions are intuitive, the background assumptions are pretty sketchy. It is not at all clear why these mathematical constructs are requirements of rationality.&lt;/p&gt;
&lt;p&gt;One way to justify those constructs is to argue in terms of choices a rational person must make. We imagine someone is presented with choices among various bets on uncertain events. Their level of belief in these events can be gauged by which bets they choose. But if we're going to do that anyway, then, as it turns out, we can just give some simple rules directly about these choices, and bypass the belief functions entirely. This was Leonard Savage's approach &lt;sup&gt;3&lt;/sup&gt;. To quote a comment on the previous post: &quot;This is important because agents in general don't have to use beliefs or goals, but they do all have to choose actions.&quot;&lt;/p&gt;
&lt;p&gt;Savage's approach actually covers both subjective probability and expected utility. The previous post discusses both, whereas I am focusing on the former. This lets me give a shorter exposition, and I think a clearer one.&lt;/p&gt;
&lt;p&gt;We start by assuming some abstract collection of possible bets. We suppose that when you are offered two bets from this collection, you will choose one over the other, or express indifference.&lt;/p&gt;
&lt;p&gt;As discussed, we will only consider two outcomes. So all bets have the same payout; the only difference among them is their winning conditions. What it is that you win is not specified. But it is assumed that, given the choice between winning unconditionally and losing unconditionally, you would choose to win.&lt;/p&gt;
&lt;p&gt;It is assumed that the collection of bets form what is called a &lt;a href=&quot;http://en.wikipedia.org/wiki/Boolean_algebra_%28structure%29&quot;&gt;boolean algebra&lt;/a&gt;. This just means we can consider combinations of bets under boolean operators like &quot;and&quot;, &quot;or&quot;, or &quot;not&quot;. Here I will use brackets to indicate these combinations. (A or B) is a bet that wins under the conditions that make either A win, or B win, or both win. (A but not B) wins whenever A wins but B doesn't. And so on.&lt;/p&gt;
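&lt;p&gt;One concrete model of such an algebra identifies each bet with the set of states in which it wins. The boolean operators then become set operations. This is only an illustration. Savage's setup does not require states at all (see note 4), and the three colour states and helper names below (&lt;code&gt;bet&lt;/code&gt;, &lt;code&gt;either&lt;/code&gt;, &lt;code&gt;both&lt;/code&gt;, &lt;code&gt;but_not&lt;/code&gt;, &lt;code&gt;negate&lt;/code&gt;) are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from itertools import combinations

# A hypothetical concrete model: a bet is the set of draws (states) on which it wins.
STATES = frozenset({'red', 'green', 'blue'})


def bet(*colours):
    return frozenset(colours)


def either(a, b):   # (A or B)
    return a | b


def both(a, b):     # (A and B)
    return a.intersection(b)


def but_not(a, b):  # (A but not B)
    return a - b


def negate(a):      # (not A)
    return STATES - a


# Every subset of STATES is a bet, and the operations never leave this collection.
ALL_BETS = {frozenset(c) for r in range(len(STATES) + 1) for c in combinations(sorted(STATES), r)}
closed = all(
    op(a, b) in ALL_BETS
    for a in ALL_BETS for b in ALL_BETS
    for op in (either, both, but_not)
) and all(negate(a) in ALL_BETS for a in ALL_BETS)

red, blue, green = bet('red'), bet('blue'), bet('green')
print(sorted(either(red, blue)), sorted(but_not(either(red, blue), either(green, blue))))
print(len(ALL_BETS), closed)  # 8 True
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, ((red or blue) but not (green or blue)) is just the bet on red, which is the kind of simplification the Sure-thing principle relies on below.&lt;/p&gt;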
&lt;p&gt;If you are rational, your choices must, it is claimed, obey some simple rules. If they do, it can be proven that you are choosing &lt;em&gt;as if&lt;/em&gt; you had assigned subjective probabilities to bets. Savage's axioms for choosing among bets are &lt;sup&gt;4&lt;/sup&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If you choose A over B, you shall not choose B over A; and, if you do not choose A over B, and do not choose B over C, you shall not choose A over C. &lt;/li&gt;
&lt;li&gt;If you choose A over B, you shall also choose (A but not B) over (B but not A); and conversely, if you choose (A but not B) over (B but not A), you shall also choose A over B. &lt;/li&gt;
&lt;li&gt;You shall not choose A over (A or B).&lt;/li&gt;
&lt;li&gt;If you choose A over B, then you shall be able to specify a finite sequence of bets C&lt;sub&gt;1&lt;/sub&gt;, C&lt;sub&gt;2&lt;/sub&gt;, ..., C&lt;sub&gt;n&lt;/sub&gt;, such that it is guaranteed that one and only one of the C's will win, and such that, for any one of the C's, you shall still choose (A but not C) over (B or C).&lt;/li&gt;
&lt;/ol&gt;
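&lt;p&gt;To make the rules concrete, the sketch below takes a chooser who picks whichever bet has the higher subjective probability. It then checks rules 1 to 3 exhaustively on the eight bets over three colours. The probabilities in &lt;code&gt;PROB&lt;/code&gt; are hypothetical, and only rules 1 to 3 are tested. Rule 4 needs a suitably infinite collection of bets, which a finite example cannot supply.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import operator
from fractions import Fraction as F
from itertools import combinations, product

# A hypothetical subjective probability over three states, for illustration only.
PROB = {'red': F(1, 3), 'green': F(1, 4), 'blue': F(5, 12)}
BETS = [frozenset(c) for r in range(len(PROB) + 1) for c in combinations(sorted(PROB), r)]


def p(bet):
    return sum((PROB[s] for s in bet), F(0))


def chooses(a, b):
    # Choose a over b exactly when a has the higher subjective probability.
    return operator.gt(p(a), p(b))


def rule1_ok():
    # No choice both ways, and 'not A over B, not B over C' rules out A over C.
    for a, b, c in product(BETS, repeat=3):
        if chooses(a, b) and chooses(b, a):
            return False
        if not chooses(a, b) and not chooses(b, c) and chooses(a, c):
            return False
    return True


def rule2_ok():
    # Sure-thing principle: A vs B is judged like (A but not B) vs (B but not A).
    return all(chooses(a, b) == chooses(a - b, b - a) for a, b in product(BETS, repeat=2))


def rule3_ok():
    # Never choose A over (A or B).
    return not any(chooses(a, a | b) for a, b in product(BETS, repeat=2))


print(rule1_ok(), rule2_ok(), rule3_ok())  # True True True
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Rule 2 holds for any such chooser because P(A) - P(B) = P(A but not B) - P(B but not A): the part where both win cancels.&lt;/p&gt;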
&lt;p&gt;Rule 1 is a coherence requirement on rational choice. It requires your preferences to be a &lt;a href=&quot;http://en.wikipedia.org/wiki/Strict_weak_ordering#Total_preorders&quot;&gt;total pre-order&lt;/a&gt;. One objection to Cox's theorem is that levels of belief could be incomparable. This objection does not apply to rule 1 in this context because, as we discussed above, we're talking about choices of bets, not beliefs. Faced with choices, we choose. A rational person's choices must be non-circular.&lt;/p&gt;
&lt;p&gt;Rule 2 is an independence requirement. It demands that when you compare two bets, you ignore the possibility that they could both win. In those circumstances you would be indifferent between the two anyway. The only possibilities relevant to the comparison are those where one bet wins and the other doesn't. So you ought to compare A to B the same way you compare (A but not B) to (B but not A). Savage called this rule the Sure-thing principle.&lt;/p&gt;
&lt;p&gt;Rule 3 is a dominance requirement on rational choice. It demands that you not choose something that cannot do better under any circumstance: whenever A would win, so would (A or B). Note that you might judge (B but not A) to be impossible a priori. So, you might legitimately express indifference between A and (A or B). We can only say it is never legitimate to choose A over (A or B).&lt;/p&gt;
&lt;p&gt;Rule 4 is the most complicated. Luckily it's not going to be relevant to the Ellsberg paradox. Call it Mostly Harmless and forget this bit if you want.&lt;/p&gt;
&lt;p&gt;What rule 4 says is that if you choose A over B, you must be willing to pay a premium for your choice. Since there are only two outcomes in this context, the premium is paid in terms of other bets. Rule 4 demands that you give a finite list of &lt;a href=&quot;http://en.wikipedia.org/wiki/Mutually_exclusive_events&quot;&gt;mutually exclusive&lt;/a&gt; and &lt;a href=&quot;http://en.wikipedia.org/wiki/Collectively_exhaustive_events&quot;&gt;exhaustive&lt;/a&gt; events, and still be willing to choose A over B if we take any event on your list, cut it from A, and paste it to B. You can list as many events as you need to, but it must be a finite list.&lt;/p&gt;
&lt;p&gt;For example, if you thought A was much more likely than B, you might pull out a die, and list the 6 possible outcomes of one roll. You would also be willing to choose (A but not a roll of 1) over (B or a roll of 1), (A but not a roll of 2) over (B or a roll of 2), and so on. If not, you might list the 36 possible outcomes of two consecutive rolls, and be willing to choose (A but not two rolls of 1) over (B or two rolls of 1), and so on. You could go to any finite number of rolls.&lt;/p&gt;
&lt;p&gt;In fact rule 4 is pretty liberal: it doesn't even demand that every event on your list be equiprobable, or even independent of the A and B in question. It just demands that the events be mutually exclusive and exhaustive. If you are not willing to specify &lt;em&gt;some&lt;/em&gt; such list of events, then you ought to express indifference between A and B.&lt;/p&gt;
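&lt;p&gt;The die example can be made numeric for a chooser who happens to use probabilities. The sketch asks how many rolls are needed before cutting any single roll sequence C from A and pasting it onto B still leaves A preferred. Several things here are assumptions for illustration. The probabilities of A and B are made up. The rolls are taken to be independent of A and B (rule 4 itself does not require this). The names &lt;code&gt;rule4_holds&lt;/code&gt; and &lt;code&gt;rolls_needed&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import operator
from fractions import Fraction as F


def rule4_holds(p_a, p_b, n_rolls):
    # Partition: the 6**n_rolls possible sequences of n_rolls fair die rolls.
    # Assumption: the rolls are independent of A and B, so every cell C looks the same.
    p_c = F(1, 6 ** n_rolls)
    p_a_but_not_c = p_a * (1 - p_c)   # independence of A and C
    p_b_or_c = p_b + p_c - p_b * p_c  # inclusion-exclusion
    return operator.gt(p_a_but_not_c, p_b_or_c)


def rolls_needed(p_a, p_b, max_rolls=20):
    # Smallest number of rolls whose partition still leaves A strictly preferred.
    for n in range(1, max_rolls + 1):
        if rule4_holds(p_a, p_b, n):
            return n
    return None  # none found up to max_rolls


print(rolls_needed(F(1, 3), F(1, 4)))     # 2
print(rolls_needed(F(1, 3), F(33, 100)))  # 4
print(rolls_needed(F(1, 3), F(1, 3)))     # None
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The closer A and B are, the finer the partition has to be. When they are equally likely, no finite list works, which is the case where rule 4 asks you to be indifferent.&lt;/p&gt;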
&lt;p&gt;If you obey rules 1-3, that is sufficient for us to construct a sort of qualitative subjective probability out of your choices. It might not be quantitative: for one thing, there could be &lt;a href=&quot;http://en.wikipedia.org/wiki/Infinitessimal&quot;&gt;infinitesimally&lt;/a&gt; likely beliefs. For another, there might be more than one way to assign numbers to beliefs. Rule 4 takes care of these things. If you obey rule 4 also, then we can assign a subjective probability to every possible bet, prove that you choose among bets &lt;em&gt;as if&lt;/em&gt; you were using those probabilities, and also prove that it is the only probability assignment that matches your choices. And, on the flip side, if you are choosing among bets based on a subjective probability assignment, then it is easy to prove you obey rules 1-3, as well as rule 4 if the collection of bets is suitably infinite, for example if a fair die is available to bet on.&lt;/p&gt;
&lt;p&gt;Savage's theorem is impressive. The background assumptions involve just the concept of choice, and no numbers at all. There are only a few simple rules. Even rule 4 isn't really all that hard to understand and accept. A subjective probability distribution appears seemingly out of nowhere. In the full version, a utility function appears out of nowhere too. This theorem has been called the crowning glory of decision theory.&lt;/p&gt;
&lt;h2&gt;The Ellsberg paradox&lt;/h2&gt;
&lt;p&gt;Let's imagine there is an urn containing 90 balls. 30 of them are red, and the other 60 are either green or blue, in unknown proportion. We will draw a ball from the urn at random. Let us bet on the colour of this ball. As above, all bets have the same payout. To be specific, let's say you get pie if you win, and a &lt;a href=&quot;http://en.wikipedia.org/wiki/Boot_to_the_head&quot;&gt;boot to the head&lt;/a&gt; if you lose. The first question is: do you prefer to bet that the colour will be red, or that it will be green? The second question is: do you prefer to bet that it will be (red or blue), or that it will be (green or blue)?&lt;/p&gt;
&lt;p&gt;The most common response&lt;sup&gt;5&lt;/sup&gt; is to choose red over green, and (green or blue) over (red or blue). And that's all there is to it. Paradox! &lt;sup&gt;6&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;
&lt;table cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; border=&quot;0&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td rowspan=&quot;2&quot;&gt;&amp;#xA0;&lt;/td&gt;
&lt;td bgcolor=&quot;red&quot; align=&quot;center&quot;&gt;&lt;span style=&quot;font-size: xx-small;&quot;&gt;30&lt;/span&gt;&lt;/td&gt;
&lt;td colspan=&quot;2&quot; align=&quot;center&quot; bgcolor=&quot;cyan&quot;&gt;&lt;span style=&quot;font-size: xx-small;&quot;&gt;60&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td bgcolor=&quot;red&quot; align=&quot;center&quot;&gt;&lt;span style=&quot;font-size: xx-small;&quot;&gt;&lt;strong&gt;Red&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td bgcolor=&quot;green&quot; align=&quot;center&quot;&gt;&lt;span style=&quot;font-size: xx-small;&quot;&gt;&lt;strong&gt;Green&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td bgcolor=&quot;blue&quot; align=&quot;center&quot;&gt;&lt;span style=&quot;font-size: xx-small;&quot;&gt;&lt;strong&gt;Blue&lt;/strong&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;5&quot;&gt;
&lt;hr&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;A&lt;/th&gt;
&lt;td bgcolor=&quot;red&quot; align=&quot;center&quot;&gt;pie&lt;/td&gt;
&lt;td bgcolor=&quot;green&quot; align=&quot;center&quot;&gt;BOOT&lt;/td&gt;
&lt;td bgcolor=&quot;blue&quot; align=&quot;center&quot;&gt;BOOT&lt;/td&gt;
&lt;td rowspan=&quot;2&quot;&gt;&amp;#xA0;&lt;/td&gt;
&lt;td rowspan=&quot;2&quot;&gt;&lt;span&gt;&lt;em&gt;A is preferred to B&lt;/em&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;B&lt;/th&gt;
&lt;td bgcolor=&quot;red&quot; align=&quot;center&quot;&gt;BOOT&lt;/td&gt;
&lt;td bgcolor=&quot;green&quot; align=&quot;center&quot;&gt;pie&lt;/td&gt;
&lt;td bgcolor=&quot;blue&quot; align=&quot;center&quot;&gt;BOOT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;5&quot;&gt;
&lt;hr&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;C&lt;/th&gt;
&lt;td bgcolor=&quot;red&quot; align=&quot;center&quot;&gt;pie&lt;/td&gt;
&lt;td bgcolor=&quot;green&quot; align=&quot;center&quot;&gt;BOOT&lt;/td&gt;
&lt;td bgcolor=&quot;blue&quot; align=&quot;center&quot;&gt;pie&lt;/td&gt;
&lt;td rowspan=&quot;2&quot;&gt;&amp;#xA0;&lt;/td&gt;
&lt;td rowspan=&quot;2&quot;&gt;&lt;span&gt;&lt;em&gt;D is preferred to C&lt;/em&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th&gt;D&lt;/th&gt;
&lt;td bgcolor=&quot;red&quot; align=&quot;center&quot;&gt;BOOT&lt;/td&gt;
&lt;td bgcolor=&quot;green&quot; align=&quot;center&quot;&gt;pie&lt;/td&gt;
&lt;td bgcolor=&quot;blue&quot; align=&quot;center&quot;&gt;pie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td colspan=&quot;5&quot;&gt;&amp;#xA0;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;span&gt;&lt;em&gt;Paradox!&lt;/em&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/p&gt;
&lt;p&gt;If choices were based solely on an assignment of subjective probability, then, because the three colours are mutually exclusive, P(red or blue) = P(red) + P(blue), and P(green or blue) = P(green) + P(blue). So, since P(red) &amp;gt; P(green), it follows that P(red or blue) &amp;gt; P(green or blue); but instead we have P(red or blue) &amp;lt; P(green or blue).&lt;/p&gt;
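&lt;p&gt;The same argument can be run as a brute-force check. The sketch sweeps a grid of possible values for a single subjective P(green), from 0 to 2/3. It keeps P(red) = 1/3 and sets P(blue) = 2/3 - P(green), then records which of the two Ellsberg choices each prior would make. The grid is a finite illustration; the algebra in the paragraph above is the actual proof. Both comparisons reduce to P(green) against 1/3, in opposite directions. The function name &lt;code&gt;choices_under_prior&lt;/code&gt; is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import operator
from collections import Counter
from fractions import Fraction as F


def choices_under_prior(p_green):
    # A single subjective prior: P(red) = 1/3 is given, and P(blue) = 2/3 - P(green).
    p_red = F(1, 3)
    p_blue = F(2, 3) - p_green
    red_over_green = operator.gt(p_red, p_green)
    green_blue_over_red_blue = operator.gt(p_green + p_blue, p_red + p_blue)
    return red_over_green, green_blue_over_red_blue


# P(green) from 0 to 2/3 in steps of 1/600 (400/600 is 2/3).
grid = [F(k, 600) for k in range(401)]
patterns = Counter(choices_under_prior(pg) for pg in grid)
both_choices = [pg for pg in grid if all(choices_under_prior(pg))]
print(dict(patterns))
print(both_choices)  # []: no prior on the grid yields both Ellsberg choices
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every prior below 1/3 picks red over green but (red or blue) over (green or blue). Every prior above 1/3 does the reverse. At exactly 1/3, neither choice is strict.&lt;/p&gt;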
&lt;p&gt;Knowing Savage's representation theorem, we expect to get a formal contradiction from the 4 rules above plus the 2 expressed choices. Something has to give, so we'd like to know which rules are really involved. As it turns out, the rule at issue is rule 2, the Sure-thing principle. It says we shall compare (red or blue) to (green or blue) the same way as we compare red to green.&lt;/p&gt;
&lt;p&gt;This behaviour has been called &lt;a href=&quot;http://en.wikipedia.org/wiki/Ambiguity_aversion&quot;&gt;ambiguity aversion&lt;/a&gt;. Now, perhaps this is just a cognitive bias. It wouldn't be the first time that people behave a certain way, but the analysis of their decisions shows a clear error. And indeed, when explained, some people do repent of their sins against Bayes. They change their choices to obey rule 2. But others don't. To quote Ellsberg:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;...after rethinking all their 'offending' decisions in light of [Savage's] axioms, a number of people who are not only sophisticated but reasonable decide that they wish to persist in their choices. This includes people who previously felt a 'first order commitment' to the axioms, many of them surprised and some dismayed to find that they wished, in these situations, to violate the Sure-thing Principle. Since this group included L.J. Savage, when last tested by me (I have been reluctant to try him again), it seems to deserve respectful consideration.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I include myself in the group that thinks rule 2 is what should be dropped. But I don't have any dramatic (de-)conversion story to tell. I was somewhat surprised, but not at all dismayed, and I can't say I felt much if any prior commitment to the rules. And as to whether I'm sophisticated or reasonable, well never mind! Even if there are a number of other people who are all of the above, and even if Savage himself may have been one of them for a while, I do realise that smart people can be Just Plain Wrong. So I'd better have something more to say for myself.&lt;/p&gt;
&lt;p&gt;Well, red obviously has a probability of 1/3. Our best guess is to apply the &lt;a href=&quot;http://en.wikipedia.org/wiki/Principle_of_indifference&quot;&gt;principle of indifference&lt;/a&gt; and also assign probability 1/3 to each of green and blue. But our best guess is not necessarily a good guess. The probabilities we assign to red, and to (green or blue), are objective. We're guessing the probability of green, and of (red or blue). It seems wise to take this difference into account when choosing what to bet on, doesn't it? And surely it will be all the wiser when dealing with real-life, non-symmetrical situations where we can't even appeal to the principle of indifference.&lt;/p&gt;
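&lt;p&gt;One textbook way to take this difference into account is the maxmin rule of Gilboa and Schmeidler: judge each bet by its worst-case win probability over a set of possible distributions. To be clear, this is a swapped-in illustration. It is not the method promised for the next episode. Treating every composition from 0 to 60 green balls as possible is an assumption, and the helper names &lt;code&gt;win_probability&lt;/code&gt; and &lt;code&gt;worst_case&lt;/code&gt; are hypothetical. The sketch shows that such a rule reproduces both Ellsberg choices.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from fractions import Fraction as F


def win_probability(bet, n_green):
    # Objective chance that a bet (a set of colours) wins if the urn has n_green green balls.
    counts = {'red': 30, 'green': n_green, 'blue': 60 - n_green}
    return F(sum(counts[c] for c in bet), 90)


def worst_case(bet):
    # Maxmin rule (swapped in for illustration): score a bet by its lowest
    # win probability over all 61 possible compositions of the 60 green/blue balls.
    return min(win_probability(bet, g) for g in range(61))


bets = {
    'red': {'red'},
    'green': {'green'},
    'red or blue': {'red', 'blue'},
    'green or blue': {'green', 'blue'},
}
scores = {name: worst_case(b) for name, b in bets.items()}
first = max(['red', 'green'], key=scores.get)
second = max(['red or blue', 'green or blue'], key=scores.get)
print(scores['red'], scores['green'], scores['red or blue'], scores['green or blue'])  # 1/3 0 1/3 2/3
print((first, second))  # ('red', 'green or blue')
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The bets whose probabilities are objective (red, and green or blue) score the same in every composition. The guessed ones (green, and red or blue) are marked down to their worst case, and that is exactly the pattern of choices in the table above.&lt;/p&gt;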
&lt;p&gt;Or maybe I'm just some fool talking &lt;a href=&quot;http://www.urbandictionary.com/define.php?term=jibba%20jabba&quot;&gt;jibba jabba&lt;/a&gt;. Against this sort of talk, the &lt;a href=&quot;/lw/my/the_allais_paradox/&quot;&gt;LW post on the Allais paradox&lt;/a&gt; presents a version of Howard Raiffa's dynamic inconsistency argument. This makes no references to internal thought processes, it is a purely &lt;em&gt;external&lt;/em&gt; argument about the decisions themselves. As stated in that post, &quot;There is always a price to pay for leaving the Bayesian Way.&quot; &lt;sup&gt;7&lt;/sup&gt; This is expanded upon in &lt;a href=&quot;/lw/mt/beautiful_probability/&quot;&gt;an earlier post&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Sometimes you must seek an approximation; often, indeed. This doesn't mean that probability theory has ceased to apply, any more than your inability to calculate the aerodynamics of a 747 on an atom-by-atom basis implies that the 747 is not made out of atoms. Whatever approximation you use, it works to the extent that it approximates the ideal Bayesian calculation - and fails to the extent that it departs.&lt;/p&gt;
&lt;p&gt;Bayesianism's coherence and uniqueness proofs cut both ways ... anything that is not Bayesian must fail one of the coherency tests. This, in turn, opens you to punishments like Dutch-booking (accepting combinations of bets that are sure losses, or rejecting combinations of bets that are sure gains).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now even if you believe this about the Allais paradox, I've argued that this doesn't really have much to do with Bayesianism one way or the other. The Ellsberg paradox is what actually strays from the Path. So, does God also punish ambiguity aversion?&lt;/p&gt;
&lt;p&gt;Tune in next time&lt;sup&gt;8&lt;/sup&gt;, when I present a two-outcome decision method that obeys rules 1, 3, and 4, and even a weaker form of rule 2. But it exhibits ambiguity aversion, in gross violation of the original rule 2, so that it's not even approximately Bayesian. I will try to present it in a way that advocates for its internal cognitive merit. But the main thing &lt;sup&gt;9&lt;/sup&gt; is that, externally, it is dynamically consistent. We do not get booked, by the Dutch or any other nationality.&lt;/p&gt;
&lt;h2&gt;Notes&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt; Ellsberg's original paper is: &lt;em&gt;Risk, ambiguity, and the Savage axioms&lt;/em&gt;, Quarterly Journal of Economics 75 (1961), pp. 643-669.&lt;/li&gt;
&lt;li&gt; Some discussion followed, in which I did rather poorly. Actually I had to admit defeat. Twice. But, as they say: fool me once, shame on me; fool me twice, won't get fooled again! &lt;/li&gt;
&lt;li&gt;Savage presents his theorem in his book: &lt;em&gt;The Foundations of Statistics&lt;/em&gt;, Wiley, New York, 1954. &lt;/li&gt;
&lt;li&gt; To compare with Savage's setup: for the two-outcome case, we deal directly with &quot;actions&quot; or equivalently &quot;events&quot;, here called &quot;bets&quot;. We can dispense with &quot;states&quot;; in particular we don't have to demand that the collection of bets be &lt;a href=&quot;http://en.wikipedia.org/wiki/Sigma_algebra&quot;&gt;countably complete&lt;/a&gt;, or even a power-set algebra of states, just that it be some Boolean algebra. Savage's axioms of course also have a descriptive interpretation, but it is their normativity that is at issue here, so I state them as &quot;you shall&quot;. Rules 1-3 are his P1-P3, and 4 is P6. P4 and P7 are irrelevant in the two-outcome case. P5 is included in the background assumption that you would choose to win. I do not call this normative, because the payoff wasn't specified.&lt;/li&gt;
&lt;li&gt; Ellsberg originally proposed this just as a thought experiment, and canvassed various victims for their thoughts under what he called &quot;absolutely non-experimental conditions&quot;. He used $100 and $0 instead of pie and a boot to the head. Which is dull, of course, but it shouldn't make a difference&lt;sup&gt;10&lt;/sup&gt;. The experiment has since been repeated under more experimental conditions. The experimenters also invariably opt for the more boring cash payouts.&lt;/li&gt;
&lt;li&gt;Some people will say this isn't &quot;really&quot; a paradox. Meh.&lt;/li&gt;
&lt;li&gt;Actually, I inserted &quot;to pay&quot;. It wasn't in the original post. But it should have been.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.google.com/open?id=0B4xY6UslgZW2YWI1ZjZjODUtMjM2NC00MDI3LWJlMTktYmI3ZDQ2ZWZkODgz&quot;&gt;Sneak preview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;As &lt;a href=&quot;http://en.wikipedia.org/wiki/Forrest_gump&quot;&gt;a great decision theorist&lt;/a&gt; once said, &quot;Stupid is as stupid does.&quot; &lt;/li&gt;
&lt;li&gt;...or should it? Savage's rule P4 demands that it shall not. And the method I have in mind obeys this rule. But it turns out this is another rule that God won't enforce. And that's yet another post, if I get to it at all. &lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;a href="http://lesswrong.com/lw/9e4/the_savage_theorem_and_the_ellsberg_paradox/#comments"&gt;54 comments&lt;/a&gt;
</description>
</item>
<item>
<title>(Subjective Bayesianism vs. Frequentism) VS. Formalism </title>
<link>http://lesswrong.com/lw/8k9/subjective_bayesianism_vs_frequentism_vs_formalism/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/8k9/subjective_bayesianism_vs_frequentism_vs_formalism/</guid>
<pubDate>Sat, 26 Nov 2011 16:05:41 +1100</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/potato"&gt;potato&lt;/a&gt;
&amp;bull;
27 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/8k9/subjective_bayesianism_vs_frequentism_vs_formalism/#comments"&gt;106 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;One of the core aims of the philosophy of probability is to explain the relationship between frequency and probability. The frequentist proposes identity as the relationship. This use of identity is highly dubious. We know how to check for identity between numbers, and even how to check for the weaker copula relation between particular objects; but how would we test the identity of frequency and probability? It is not immediately obvious that there is some simple value out there which is modeled by probability, in the way that position and mass are values modeled by Newton's Principia. You can actually check whether density × volume = mass by taking separate measurements of mass, density and volume, but what would you measure to check a frequency against a probability?&lt;/p&gt;
&lt;p&gt;Frequentist philosophy does have its appeal: we would like to say that if a bag has 100 balls in it, only 1 of which is white, then the probability of drawing the white ball is 1/100, and that if we take a non-white ball out, the probability of drawing the white ball is now 1/99. Frequentism would make the philosophical justification of that inference trivial. But of course, anything a frequentist can do, a Bayesian can do (better). I mean that literally: &lt;a href=&quot;/lw/21c/frequentist_magic_vs_bayesian_magic/&quot;&gt;it's the stronger magic&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A subjective Bayesian, more or less, says that frequencies are related to probabilities because when you learn a frequency you thereby learn a fact about the world, and one must update one's degrees of belief on every available fact. The subjective Bayesian actually uses the copula in another strange way:&lt;/p&gt;
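&lt;p&gt;The bag example can be checked in both senses at once: as exact arithmetic on the contents of the bag, and as a long-run frequency of simulated draws. This is a minimal Python sketch that assumes each draw is uniformly random. The helper names, trial count and seed are illustrative choices and play no part in the argument.&lt;/p&gt;

```python
import random
from fractions import Fraction


def chance_of_white(total_balls, white_balls=1):
    """Exact chance of drawing a white ball on one uniform draw."""
    return Fraction(white_balls, total_balls)


def simulated_frequency(total_balls, trials=100_000, seed=0):
    """Fraction of simulated uniform draws that pick the single white ball."""
    rng = random.Random(seed)
    # Label the white ball 0; the non-white balls are 1 .. total_balls - 1.
    hits = sum(rng.randrange(total_balls) == 0 for _ in range(trials))
    return hits / trials


before = chance_of_white(100)
after = chance_of_white(99)          # one non-white ball has been removed
print(before, after)                 # 1/100 1/99
print(simulated_frequency(100), simulated_frequency(99))
```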
&lt;blockquote&gt;
&lt;p&gt;Probability is subjective degree of belief.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;and subjective Bayesians also claim:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Probabilities are not in the world, they are in your mind.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These two statements are brilliantly championed in &lt;a href=&quot;/lw/s6/probability_is_subjectively_objective/&quot;&gt;Probability is Subjectively Objective&lt;/a&gt;. But ultimately, the formalism I would like to suggest denies both of these statements. Formalists do not ontologically commit themselves to probabilities, just as they do not say that numbers exist. Hence we don't locate probabilities in the mind or anywhere else; we commit ourselves only to number theory and probability theory. Mathematical theories are simply repeatable processes which construct certain sequences of squiggles called &quot;theorems&quot; by changing the squiggles of other theorems according to certain rules called &quot;inferences&quot;. Inferences always take as input certain sequences of squiggles called premises, and output a sequence of squiggles called the conclusion. The only thing an inference ever does is add squiggles to a theorem, take away squiggles from a theorem, or both. It turns out that these squiggle sequences, combined with inferences, can talk about almost anything, certainly any computable thing. The formalist does not need to commit ontologically to numbers to assert &quot;There is a prime greater than 10000,&quot; even though &quot;There is x such that&quot; is a flat assertion of existence, because for the formalist &quot;There is a prime greater than 10000&quot; simply means that number theory contains a theorem which is interpreted as &quot;there is a prime greater than 10000.&quot; When you state a mathematical fact in English, you are interpreting a theorem from a formal theory. If all of the theorems of the theory are true under your suggested interpretation, then whatever system or mechanism your interpretation talks about is said to be modeled by the theory.&lt;/p&gt;
&lt;p&gt;So, what relation between frequency and probability does formalism propose? Theorems of probability may be interpreted as true statements about frequencies, when you assign certain squiggles certain words and assert the resulting natural-language sentence. Or, for short, we can say: &quot;Probability theory models frequency.&quot; It is trivial to show that Kolmogorov's axioms model frequency, since they also model fractions; probability theory is an algebra, after all. More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degree of belief given new information. This is somewhat harder to show. &lt;a href=&quot;/lw/3cp/dutch_books_and_decision_theory_an_introduction/&quot;&gt;Dutch-book arguments&lt;/a&gt; do nicely at providing at least some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here. If Bayesian probability theory really does model rational belief, as many believe it does, then that is likely the most interesting thing we will ever be able to model with probability. But probability theory also models spatial measurement, so why not add the position that probability &lt;strong&gt;is&lt;/strong&gt; volume to the debating lines of the philosophy of probability?&lt;/p&gt;
&lt;p&gt;Why are frequentism's and subjective Bayesianism's misuses of the copula not as obvious as &lt;em&gt;volumeism's&lt;/em&gt;? Because what the Bayesian and the frequentist are really arguing about is statistical methodology; they've just disguised the argument as an argument about &lt;em&gt;what probability is.&lt;/em&gt; Your interpretation of probability theory will determine how you model uncertainty, and hence your statistical methodology. Volumeism cannot handle uncertainty in any obvious way; the Bayesian and frequentist interpretations of probability theory, however, imply two radically different ways of handling uncertainty.&lt;/p&gt;
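&lt;p&gt;To make the squiggle picture concrete, here is a Python sketch of a toy formal system. The system was chosen for this note and is not the author's example. It is Hofstadter's MIU system, which has the single axiom MI and four rewriting rules that play the role of inferences. A &quot;theorem&quot; is any string reachable from the axiom by applying the rules, and the program never needs to know what, if anything, the strings are about. The function names are hypothetical.&lt;/p&gt;

```python
from collections import deque


def successors(s):
    """Every string obtainable from s by one application of an MIU rule."""
    out = set()
    if s.endswith("I"):              # Rule I:   xI    -> xIU
        out.add(s + "U")
    if s.startswith("M"):            # Rule II:  Mx    -> Mxx
        out.add("M" + s[1:] * 2)
    for i in range(len(s) - 2):      # Rule III: xIIIy -> xUy
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):      # Rule IV:  xUUy  -> xy
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def theorems(axiom="MI", max_steps=4):
    """Breadth-first search: every string derivable from the axiom in at most
    max_steps inferences, mapped to the length of its shortest derivation."""
    depth = {axiom: 0}
    queue = deque([axiom])
    while queue:
        s = queue.popleft()
        if depth[s] == max_steps:
            continue
        for t in successors(s):
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


found = theorems(max_steps=3)
print("MUI" in found, "MU" in found)   # True False
```

&lt;p&gt;Bounding the search depth keeps the example finite. The full set of MIU theorems is infinite and famously does not contain MU, though a bounded search like this one cannot establish that on its own.&lt;/p&gt;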
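&lt;p&gt;The claim that probability theory models frequency can be spot-checked mechanically. This is a minimal Python sketch that assumes a finite sample space. Relative frequencies from any finite record of trials are non-negative, give the whole space frequency 1, and add up over disjoint events, and these are the finite versions of Kolmogorov's axioms. The die rolls are made-up data and the helper names are hypothetical.&lt;/p&gt;

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def relative_frequencies(observations):
    """Map each observed outcome to the fraction of observations equal to it."""
    counts = Counter(observations)
    n = len(observations)
    return {outcome: Fraction(c, n) for outcome, c in counts.items()}


def frequency_of(freqs, event):
    """Relative frequency of an event, i.e. of a set of outcomes."""
    return sum((freqs.get(o, Fraction(0)) for o in event), Fraction(0))


rolls = [1, 3, 3, 6, 2, 6, 6, 4, 1, 5]   # made-up die rolls, for illustration only
freqs = relative_frequencies(rolls)
space = {1, 2, 3, 4, 5, 6}
events = [set(c) for r in range(len(space) + 1) for c in combinations(sorted(space), r)]

# Non-negativity, and the whole sample space gets frequency 1.
assert all(frequency_of(freqs, e) >= 0 for e in events)
assert frequency_of(freqs, space) == 1
# Finite additivity: frequencies of disjoint events add.
for a in events:
    for b in events:
        if a.isdisjoint(b):
            assert frequency_of(freqs, a | b) == frequency_of(freqs, a) + frequency_of(freqs, b)

print(frequency_of(freqs, {2, 4, 6}))    # 1/2, the frequency of an even roll
```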
&lt;p&gt;The easiest way to understand the philosophical dispute between the frequentist and the subjective Bayesian is to look at the classic biased coin:&lt;/p&gt;
&lt;p style=&quot;padding-left: 30px;&quot;&gt;A subjective Bayesian and a frequentist are at a bar, and the bartender (being rather bored) tells the two that he has a biased coin and asks them, &quot;What is the probability that the coin will come up heads on the first flip?&quot; The frequentist says that for the coin to be biased means for it not to have a 50% chance of coming up heads, so all we know is that its probability is not equal to 50%. The Bayesian says: &quot;Any evidence I have for it coming up heads is also evidence for it coming up tails, since I know nothing about one outcome that doesn't hold for its negation, and the only value which represents that symmetry is 50%.&quot;&lt;/p&gt;
&lt;p&gt;I ask you: what is the difference between these two and the poor souls engaged in endless debate over realism about sound at the beginning of &lt;a href=&quot;/lw/i3/making_beliefs_pay_rent_in_anticipated_experiences/&quot;&gt;Making Beliefs Pay Rent&lt;/a&gt;?&lt;/p&gt;
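&lt;p&gt;The Bayesian's symmetry argument can be written out as a small calculation. This sketch rests on an assumption the parable leaves open: the coin has some bias p other than 1/2, and our prior over p is symmetric, giving p and 1 - p equal weight. Under that assumption, the credence for heads on the first flip is the prior mean of p. That mean is exactly 1/2, even though no coin in the prior's support has p = 1/2. The particular biases listed are hypothetical.&lt;/p&gt;

```python
from fractions import Fraction


def first_flip_credence(prior):
    """P(heads on the first flip) = sum over biases p of P(bias = p) * p."""
    assert sum(prior.values()) == 1, "prior weights must sum to 1"
    return sum((weight * p for p, weight in prior.items()), Fraction(0))


def symmetric_prior(biases):
    """Hypothetical prior: equal weight on each listed bias p and on its mirror 1 - p."""
    support = set(biases) | {1 - p for p in biases}
    return {p: Fraction(1, len(support)) for p in support}


prior = symmetric_prior([Fraction(1, 10), Fraction(3, 10), Fraction(2, 5)])
assert Fraction(1, 2) not in prior        # every coin in the support is biased
print(first_flip_credence(prior))         # 1/2
```

&lt;p&gt;With an asymmetric prior, for example one concentrated on p = 7/10, the same function returns 7/10. The 50% answer therefore comes from the symmetry of the background information, not from any coin being fair.&lt;/p&gt;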
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If a tree falls in a forest and no one hears it, does it make a sound? One says, &quot;Yes it does, for it makes vibrations in the air.&quot; Another says, &quot;No it does not, for there is no auditory processing in any brain.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One is answering the question &quot;Are there pressure waves in the air if we aren't around?&quot;; the other is answering &quot;Are there auditory experiences if we are not around?&quot; The problem is that &quot;sound&quot; is being used to stand for both &quot;auditory experience&quot; and &quot;pressure waves through air&quot;. They are both giving the right answers to their respective questions. But they are failing to &lt;a href=&quot;/lw/nv/replace_the_symbol_with_the_substance/&quot;&gt;Replace the Symbol with the Substance&lt;/a&gt;, and &lt;a href=&quot;/lw/oc/variable_question_fallacies/&quot;&gt;they're using one word with two different meanings in different places&lt;/a&gt;. In exactly the same way, &quot;probability&quot; is being used to stand for both &quot;frequency of occurrence&quot; and &quot;rational degree of belief&quot; in the dispute between the Bayesian and the frequentist. The correct answer to the question &quot;If the coin is flipped an infinite number of times, how often would we expect it to land heads?&quot; is &quot;All we know is that it wouldn't be 50%,&quot; because that is what it means for the coin to be biased. The correct answer to the question &quot;What is the optimal degree of belief that we should assign to the first flip landing heads?&quot; is &quot;Precisely 50%,&quot; because of the symmetrical evidential support the two outcomes get from our background information. How we should actually model the situation as statisticians depends on our goal. But remember that Bayesianism is the stronger magic, and the only contender for perfection in the competition.&lt;/p&gt;
&lt;p&gt;For us formalists, probabilities are not anywhere. Technically, we do not even believe in probability; we believe only in probability theory. The only coherent uses of &quot;probability&quot; in natural language are purely syncategorematic. We should be careful when we colloquially use &quot;probability&quot; as a noun or verb, and be clear about what we mean by this word play. Probability theory models many things, including degree of belief and frequency. Whatever we may learn about rationality, frequency, measure, or any of the other mechanisms that probability models, through the interpretation of probability theorems, we learn because probability theory is &lt;em&gt;isomorphic&lt;/em&gt; to those mechanisms. When you use the copula as the frequentist or the subjective Bayesian does, it becomes hard to notice that there is no contradiction in probability theory modeling both frequency and degree of belief. If we use &quot;is&quot; instead of &quot;models&quot;, then, since frequency is clearly not degree of belief, if probability is belief it cannot be frequency. But although frequency is not degree of belief, frequency does model degree of belief, so if probability models frequency, it must also model degree of belief.&lt;/p&gt;&lt;/div&gt;
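&lt;p&gt;A simulation can pull the two questions apart. This is a minimal Python sketch under stated assumptions. The bias is drawn from a hypothetical prior that is symmetric about 1/2 and excludes values near 1/2, and the trial counts and seed are arbitrary. Flipping one coin many times yields a long-run frequency near that coin's own bias, never near 50%. Repeating the whole bar scenario with a fresh coin each time, the first flip lands heads about half the time, and that rate is what a 50% credence is calibrated against.&lt;/p&gt;

```python
import random


def draw_bias(rng):
    """Hypothetical symmetric prior: uniform on [0, 1] with the band (0.4, 0.6)
    removed, so every coin it produces really is biased."""
    while True:
        p = rng.random()
        if abs(p - 0.5) >= 0.1:
            return p


def long_run_frequency(p, flips, rng):
    """Question 1: how often does one particular coin land heads over many flips?"""
    heads = sum(p > rng.random() for _ in range(flips))   # heads with probability p
    return heads / flips


def first_flip_heads_rate(bars, rng):
    """Question 2: across many bars, each with a fresh coin drawn from the prior,
    how often is the first flip heads?"""
    heads = sum(draw_bias(rng) > rng.random() for _ in range(bars))
    return heads / bars


rng = random.Random(42)
coin = draw_bias(rng)
freq = long_run_frequency(coin, 100_000, rng)
print(f"bias {coin:.3f}, long-run frequency {freq:.3f}")                   # near the bias, not 0.5
print(f"first-flip heads rate {first_flip_heads_rate(100_000, rng):.3f}")  # near 0.5
```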
&lt;a href="http://lesswrong.com/lw/8k9/subjective_bayesianism_vs_frequentism_vs_formalism/#comments"&gt;106 comments&lt;/a&gt;
</description>
</item>
<item>
<title>MSF Theory: Another Explanation of Subjectively Objective Probability</title>
<link>http://lesswrong.com/lw/6w8/msf_theory_another_explanation_of_subjectively/</link>
<guid isPermaLink="true">http://lesswrong.com/lw/6w8/msf_theory_another_explanation_of_subjectively/</guid>
<pubDate>Sun, 31 Jul 2011 05:46:56 +1000</pubDate>
<description>
Submitted by &lt;a href="http://lesswrong.com/user/potato"&gt;potato&lt;/a&gt;
&amp;bull;
13 votes
&amp;bull;
&lt;a href="http://lesswrong.com/lw/6w8/msf_theory_another_explanation_of_subjectively/#comments"&gt;11 comments&lt;/a&gt;
&lt;div&gt;&lt;p&gt;Before I read &lt;a href=&quot;/lw/oj/probability_is_in_the_mind/&quot; target=&quot;_blank&quot;&gt;Probability is in the Mind&lt;/a&gt; and &lt;a href=&quot;/lw/s6/probability_is_subjectively_objective/&quot; target=&quot;_blank&quot;&gt;Probability is Subjectively Objective&lt;/a&gt;, I was a realist about probabilities; I was a frequentist. After I read them, I was just confused. I couldn't understand how a mind could accurately say that the probability of drawing a heart from a standard deck of playing cards was not 25%. It wasn't until I tried to explain the contrast between my view and the subjective view in a comment on &lt;a href=&quot;/lw/s6/probability_is_subjectively_objective&quot; target=&quot;_blank&quot;&gt;Probability is Subjectively Objective&lt;/a&gt; that I realized I had been a subjective Bayesian all along. So, if you've read &lt;a href=&quot;/lw/oj/probability_is_in_the_mind&quot; target=&quot;_blank&quot;&gt;Probability is in the Mind&lt;/a&gt; and &lt;a href=&quot;/lw/s6/probability_is_subjectively_objective&quot; target=&quot;_blank&quot;&gt;Probability is Subjectively Objective&lt;/a&gt; but still feel a little confused, I hope this will help.&lt;/p&gt;
&lt;p&gt;I should mention that I'm not sure that EY would agree with my view of probability, but the view to be presented agrees with EY's view on at least these propositions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Probability is always in a mind, not in the world.&lt;/li&gt;
&lt;li&gt;The probability that an agent should ascribe to a proposition is directly related to that agent's knowledge of the world.&lt;/li&gt;
&lt;li&gt;There is only one &lt;em&gt;correct&lt;/em&gt; probability to assign to a proposition given your partial knowledge of the world.&lt;/li&gt;
&lt;li&gt;If there is no uncertainty, there is no probability. &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Any position that holds these propositions is a non-realist, subjective view of probability.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Imagine a pre-shuffled deck of playing cards and two agents (they don't have to be humans) named &quot;Johnny&quot; and &quot;Sally&quot;, who are betting 1 dollar each on the suit of the top card. As everyone knows, 1/4 of the cards in a deck of playing cards are hearts. We will name this belief F&lt;sub&gt;1&lt;/sub&gt;; F&lt;sub&gt;1&lt;/sub&gt; stands for &quot;1/4 of the cards in the deck are hearts.&quot; Johnny and Sally both believe F&lt;sub&gt;1&lt;/sub&gt;. F&lt;sub&gt;1&lt;/sub&gt; is all that Johnny knows about the deck of cards, but Sally knows a little more about this deck: she also knows that 8 of the top 10 cards are hearts. Let F&lt;sub&gt;2&lt;/sub&gt; stand for &quot;8 of the top 10 cards are hearts.&quot; Sally believes F&lt;sub&gt;2&lt;/sub&gt;. Johnny doesn't know whether or not F&lt;sub&gt;2&lt;/sub&gt; is true. F&lt;sub&gt;1&lt;/sub&gt; and F&lt;sub&gt;2&lt;/sub&gt; are beliefs about the deck of cards, and each is either true or false.&lt;/p&gt;
&lt;p&gt;So Sally bets that the top card is a heart and Johnny bets against her, i.e., she puts her money on &quot;The top card is a heart.&quot; being true; he puts his money on &quot;~The top card is a heart.&quot; being true. After they make their bets, one could imagine Johnny making fun of Sally; he might say something like: &quot;Are you nuts? You know, I have a 75% chance of winning. 1/4 of the cards are hearts; you can't argue with that!&quot; Sally might reply: &quot;Don't forget that the probability you assign to '~The top card is a heart.' depends on what you know about the deck. I think you would agree with me that there is an 80% chance that 'The top card is a heart' if you knew just a bit more about the state of the deck.&quot;&lt;/p&gt;
&lt;p&gt;To be undecided about a proposition is to not know which &lt;em&gt;possible&lt;/em&gt; world you are in: am I in the possible world where that proposition is true, or in the one where it is false? Both Johnny and Sally are undecided about &quot;The top card is a heart.&quot;; their model of the world &lt;em&gt;splits&lt;/em&gt; at that point of representation. Their knowledge is consistent with being in a possible world where the top card is a heart, or in a possible world where the top card is not a heart. The more statements they decide on, the smaller the configuration space of possible worlds they think they might find themselves in. Deciding on a proposition takes a chunk out of that configuration space, and the content of that proposition determines the shape of the eliminated chunk. Sally's and Johnny's beliefs constrain their respective expected experiences, but not all the way to a point. The trick when constraining one's space of &lt;em&gt;viable&lt;/em&gt; worlds is to make sure that &lt;em&gt;the real world&lt;/em&gt; is among the possible worlds that satisfy your beliefs. Sally has the upper hand, because her space of viably possible worlds is smaller than Johnny's: there are many more ways to arrange a standard deck of playing cards that satisfy F&lt;sub&gt;1&lt;/sub&gt; than there are ways that satisfy both F&lt;sub&gt;1&lt;/sub&gt; and F&lt;sub&gt;2&lt;/sub&gt;. To be clear, we don't need to believe that possible worlds actually exist to accept this view of belief; we just need to believe that any agent capable of being undecided about a proposition is also capable of imagining alternative ways the world could consistently turn out to be, i.e., capable of imagining possible worlds.&lt;/p&gt;
&lt;p&gt;For convenience, we will say that a possible world W is viable for an agent A if and only if W satisfies A's background knowledge of decided propositions, i.e., A thinks that W might be the world it finds itself in.&lt;/p&gt;
&lt;p&gt;Of the &lt;em&gt;possible&lt;/em&gt; worlds that satisfy F&lt;sub&gt;1&lt;/sub&gt;, i.e., of the possible worlds where &quot;1/4 of the cards are hearts&quot; is true, 3/4 also satisfy &quot;~The top card is a heart.&quot; Since Johnny holds that F&lt;sub&gt;1&lt;/sub&gt;, and since he has no further information that might put stronger restrictions on his space of viable worlds, he ascribes a 75% probability to &quot;~The top card is a heart.&quot; Sally, however, holds that F&lt;sub&gt;2&lt;/sub&gt; as well as F&lt;sub&gt;1&lt;/sub&gt;. She knows that of the possible worlds that satisfy F&lt;sub&gt;1&lt;/sub&gt;, only 1/4 satisfy &quot;The top card is a heart.&quot; But she holds a proposition that constrains her space of viably possible worlds even further, namely F&lt;sub&gt;2&lt;/sub&gt;. Most of the possible worlds that satisfy F&lt;sub&gt;1&lt;/sub&gt; are eliminated as viable worlds if we also hold that F&lt;sub&gt;2&lt;/sub&gt;, because most of the possible worlds that satisfy F&lt;sub&gt;1&lt;/sub&gt; don't satisfy F&lt;sub&gt;2&lt;/sub&gt;. Of the possible worlds that satisfy F&lt;sub&gt;2&lt;/sub&gt;, exactly 80% satisfy &quot;The top card is a heart.&quot; So, duh, Sally assigns an 80% probability to &quot;The top card is a heart.&quot; They give that proposition different probabilities, and they are both right in assigning their respective probabilities; they don't disagree about how to assign probabilities, they just have different resources for doing so in this case. P(~The top card is a heart | F&lt;sub&gt;1&lt;/sub&gt;) really is 75%, and P(The top card is a heart | F&lt;sub&gt;2&lt;/sub&gt;) really is 80%.&lt;/p&gt;
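&lt;p&gt;A deck small enough to enumerate lets us watch the space of viable worlds shrink. The parameters of this Python sketch are assumed toy values and do not come from the post. The deck has 8 cards, 2 of them hearts, and a world is one ordering of the cards. The analogue of F&lt;sub&gt;1&lt;/sub&gt; (1/4 of the cards are hearts) holds in every ordering. The stand-in for F&lt;sub&gt;2&lt;/sub&gt; is hypothetical: both hearts are among the top 3 cards. Adding that belief removes most of the worlds and changes the fraction in which the top card is a heart.&lt;/p&gt;

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy deck: 2 hearts and 6 other cards, all distinguishable.
DECK = ["H1", "H2", "S1", "S2", "C1", "C2", "D1", "D2"]


def is_heart(card):
    return card.startswith("H")


def top_card_heart(world):
    return is_heart(world[0])


def toy_f2(world):
    """Hypothetical stand-in for F2: both hearts are among the top 3 cards."""
    return sum(is_heart(card) for card in world[:3]) == 2


def share_top_heart(viable):
    """Fraction of the viable worlds in which the top card is a heart."""
    return Fraction(sum(top_card_heart(w) for w in viable), len(viable))


worlds = list(permutations(DECK))                # each ordering is one possible world
viable_johnny = worlds                           # the F1 analogue rules nothing out
viable_sally = [w for w in worlds if toy_f2(w)]  # the extra belief eliminates worlds

print("Johnny", len(viable_johnny), share_top_heart(viable_johnny))  # Johnny 40320 1/4
print("Sally", len(viable_sally), share_top_heart(viable_sally))     # Sally 4320 2/3
```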
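&lt;p&gt;For the real 52-card deck, both fractions can be computed exactly by counting. This Python sketch assumes that F&lt;sub&gt;2&lt;/sub&gt; means exactly 8 hearts in the top 10. It tracks only which positions hold hearts. Each heart/non-heart pattern corresponds to the same number of full orderings (13! × 39!), so fractions of patterns equal fractions of orderings.&lt;/p&gt;

```python
from fractions import Fraction
from math import comb

# Johnny's viable worlds (F1): any placement of the 13 hearts among 52 positions.
f1_worlds = comb(52, 13)
f1_top_heart = comb(51, 12)              # position 1 is a heart; 12 more among the other 51
p_johnny_not_heart = 1 - Fraction(f1_top_heart, f1_worlds)

# Sally's viable worlds (F1 and F2): exactly 8 hearts in the top 10,
# so the remaining 5 hearts lie among the other 42 positions.
f2_worlds = comb(10, 8) * comb(42, 5)
f2_top_heart = comb(9, 7) * comb(42, 5)  # position 1 is a heart; 7 more among positions 2-10
p_sally_heart = Fraction(f2_top_heart, f2_worlds)

print(p_johnny_not_heart, p_sally_heart)  # 3/4 4/5
```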
&lt;p&gt;This setup makes it clear (to me at least) that the right probability to assign to a proposition depends on what you know. The more you know, i.e., the more you constrain the space of worlds you think you might be in, the more useful the probability you assign. The probability that an agent should ascribe to a proposition is directly related to that agent's knowledge of the world.&lt;/p&gt;
&lt;p&gt;This setup also makes it easy to see how an agent can be wrong about the probability it assigns to a proposition given its background knowledge. Imagine a third agent, named &quot;Billy&quot;, that has the same information as Sally but says that there's a 99% chance of &quot;The top card is a heart.&quot; Billy doesn't have any information that further constrains the possible worlds he thinks he might find himself in; he's just wrong about the fraction of possible worlds satisfying F&lt;sub&gt;2&lt;/sub&gt; that also satisfy &quot;The top card is a heart.&quot; Of all the possible worlds that satisfy F&lt;sub&gt;2&lt;/sub&gt;, exactly 80% satisfy &quot;The top card is a heart&quot;, no more, no less. There is only one &lt;em&gt;correct&lt;/em&gt; probability to assign to a proposition given your partial knowledge.&lt;/p&gt;
&lt;p&gt;The last benefit of this way of talking I'll mention is that it makes probability's dependence on ignorance clear. We can imagine another agent that knows the truth value of every proposition; let's call him &quot;FSM&quot;. There is only one possible world that satisfies all of FSM's background knowledge: the only viable world for FSM is &lt;em&gt;the real&lt;/em&gt; world. Of the possible worlds that satisfy FSM's background knowledge, either all of them satisfy &quot;The top card is a heart.&quot; or none of them do, since there is only one viable world for FSM. So the only probabilities FSM can assign to &quot;The top card is a heart.&quot; are 1 or 0. In fact, those are the only probabilities FSM can assign to any proposition. If there is no uncertainty, there is no probability.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;world knows&lt;/em&gt; whether or not any given proposition is true (assuming determinism). The world itself is never uncertain; only the parts of the world that we call agents can be uncertain. Hence, probability is always in a mind, not in the world. The probabilities that the &lt;em&gt;universe assigns to a proposition&lt;/em&gt; are always 1 or 0, for the same reasons that FSM only assigns a 1 or 0, and 1 and 0 &lt;em&gt;aren't really probabilities&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In conclusion, I'll risk the hypothesis that, where 0&amp;#x2264;x&amp;#x2264;1, &quot;P(a|b)=x&quot; is true if and only if a fraction x of the possible worlds that satisfy &quot;b&quot; also satisfy &quot;a&quot;. Probabilities are propositional attitudes, and the probability value (or range of values) you assign to a proposition represents the fraction of the possible worlds you find viable that satisfy that proposition. You may be wrong about the value of that fraction, and as a result you may be wrong about the probability you assign.&lt;/p&gt;
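&lt;p&gt;The hypothesis can be stated as a small function over an explicit, finite list of possible worlds. This minimal Python sketch assumes finitely many equally weighted worlds, since the post does not say how worlds should be weighted. Propositions are represented as predicates on worlds. &lt;code&gt;msf_probability&lt;/code&gt; and the three-card example are hypothetical. The example also covers the FSM case: when the condition pins down a single world, the only values that can come out are 0 and 1.&lt;/p&gt;

```python
from fractions import Fraction
from itertools import permutations


def msf_probability(worlds, a, b):
    """Hypothetical helper: P(a|b) as the fraction of b-worlds that also satisfy a."""
    viable = [w for w in worlds if b(w)]
    if not viable:
        raise ValueError("no possible world satisfies b")
    return Fraction(sum(1 for w in viable if a(w)), len(viable))


# Possible worlds: every ordering of a three-card deck containing one heart.
worlds = list(permutations(["heart", "spade", "club"]))


def top_is_heart(w):
    return w[0] == "heart"


def anything(w):
    return True


def spade_on_bottom(w):
    return w[2] == "spade"


def fsm_knowledge(w):
    # FSM knows the actual ordering (assumed here), so exactly one world is viable.
    return w == ("club", "heart", "spade")


print(msf_probability(worlds, top_is_heart, anything))         # 1/3
print(msf_probability(worlds, top_is_heart, spade_on_bottom))  # 1/2
print(msf_probability(worlds, top_is_heart, fsm_knowledge))    # 0
```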
&lt;p&gt;We may call the position summarized by the hypothesis above &quot;Modal Satisfaction Frequency theory&quot;, or &quot;MSF theory&quot;.&lt;/p&gt;&lt;/div&gt;
&lt;a href="http://lesswrong.com/lw/6w8/msf_theory_another_explanation_of_subjectively/#comments"&gt;11 comments&lt;/a&gt;
</description>
</item>
</channel>
</rss>