Thanks for presenting my take on Sleeping Beauty. Your generalization beyond my assumption that Beauty's observations on Monday/Tuesday are independent and low-probability is interesting.
I'm not as dismissive as you are of betting arguments. You're right, of course, that a betting argument for something having some probability could be disputed by someone who doesn't accept your ideas of decision theory. But since typically lots of people will agree with your ideas of decision theory, it may be persuasive to some.
Now, I have to...
rising ways.
Here, you dropped this from the last bullet point at the end :)
A very clear walkthrough of full non-indexical conditioning. Thanks! I think there's still a big glaring warning sign that this could be wrong, which is the mismatch with frequency (and, by extension, betting). Probability is logically prior to frequency estimation, but that doesn't mean I think they're decoupled. If your "probability" has zero application because your decision theory uses "likeliness weights" calculated an entirely different way, I...
I wrote up a response to this, but I thought it was also worthwhile writing a comment that directly responds to the argument about whether we can update on a random bit of information.
@travisrm89 wrote:
How can receiving a random bit cause Beauty to update her probability, as in the case where Beauty is an AI? If Beauty already knows that she will update her probability no matter what bit she receives, then shouldn't she already update her probability before receiving the bit?
Ksvanhorn responds by pointing out that this assumes that the probabilities a...
Insofar as I understand, you endorse betting on 1:2 odds regardless of whether you believe the probability is 1/3 or 1/2 (i.e., regardless of whether you have received lots of random information) because of functional decision theory.
But in the case where you receive lots of random information you assign 1/3 probability to the coin ending up heads. If you then use FDT it looks like there is 2/3 probability that you will do the bet twice with the outcome tails; and 1/3 probability that you will do the bet once with the outcome heads. Therefore, you should b...
Why do you say that there is no "now", "today", or "here" in classical logic? Classical logic is just a system of logic based on terms and predicates. There is no reason that "now", "today", and "here" can't be terms in the logic. Now presumably you meant to say that such words cause a statement to have different meanings depending on who speaks it. But why is this a problem?
This is a very enlightening post. But something doesn't seem right. How can receiving a random bit cause Beauty to update her probability, as in the case where Beauty is an AI? If Beauty already knows that she will update her probability no matter what bit she receives, then shouldn't she already update her probability before receiving the bit?
You mischaracterize what Elga does. He never directly formulates the state M1, where Beauty is awake. Instead, he formulates two states that are derived from information being added to M1. I'll call them M2A (Beauty learns the outcome is Tails) and M2B (Beauty learns that it is Monday). While he may not do it as formally as you want, he works backwards to show that three of the four components of a proper description of state M1 must have the same probability. What he skips over is identifying the fourth component (whose probability is now zero).
Wha...
Interesting, but I disagree. I fully agree that the problem is ambiguous in that it doesn't define what the actual proposition is. I think different assumptions can lead to saying 1/3 or 1/2, but with deconstruction the answer can be shown to always be 1/2. I don't think anything in between is reasonable, and I don't think any information is gained by waking up (which has a prior of 1.0, so no surprise value).
Probability is in the map, not the territory. It matters a lot what is actually being predicted, which is what the "betting" approa...
It seems really odd to do the latter, and I think more motivation is needed for it.
This old post of mine may help. The short version is that if you do probability with "centered propositions" then the resulting probabilities can't be used in expected utility maximization.
(To be fair, I don’t have a better alternative in mind.)
I think the logical next step from Neal's concept of "full non-indexical conditioning" (where updating on one's experiences means taking all possible worlds, assigning 0 probability to those not containing "a version of me which has received this data as well as all of the prior data I have received", then renormalizing the sum of the rest to 1) is to not update; in other words, use UDT. The motivation here is that from a decision-making perspective, the assigning 0 / renormalizing step either does nothing (if your decision has no consequences in the worlds that you'd assign 0 probability to) or is actively bad (if your decision does have consequences in those possible worlds, due to logical correlation between you and something/someone in one of those worlds). (UDT also has a bunch of other motivations if this one seems insufficient by ...
This paper starts out with a misrepresentation. "As a reminder, this is the Sleeping Beauty problem:"... and then it proceeds to describe the problem as Adam Elga modified it to enable his thirder solution. The actual problem that Elga presented was:
...Some researchers are going to put you to sleep. During the two days[1] that your sleep will last, they will briefly wake you up either once or twice, depending on the toss of a fair coin (Heads: once; Tails: twice). After each waking, they will put you back to sleep with a drug that makes you forget that wak
I think this post is fairly wrongheaded.
First, your math seems to be wrong.
Your numerator is (1/2)⋅p(y), which looks like Pr(H | M) ⋅ Pr(X2 | H, M).
Your denominator is (1/2)⋅p(y) + (1/2)⋅p(y)(2−q(y)), which looks like
Pr(H | M) ⋅ Pr(X2 | H, M) + Pr(¬H | M) ⋅ Pr(X2 | ¬H, M), which is Pr(X2 | M).
By Bayes' rule, Pr(H | M) ⋅ Pr(X2 | H, M) / Pr(X2 | M) = Pr(H | X2, M), which is not the quantity you claimed to compute, Pr(H | X2). Unless you have some other derivation, or a good reason why you omitted M in your calculations, this isn't really "solving" anything.
Second,...
You point out that Elga's analysis is based on an unproven assertion: that “it is Monday” and “it is Tuesday” are legitimate propositions. As far as I know, there is no definition of what can, or cannot, be used as a proposition. In other words, your analysis is based on the equally unproven assertion that they are not valid. Can we remove the need to decide?
As it stands now, I can't accept this solution, simply because it doesn't inform the right decision.
Imagine you were Beauty and q(y) was 1, and you were offered that bet. What odds would you take?
Our models exist to serve our actions. There is no such thing as a good model that informs the wrong action. Probability must add up to winning.
Or am I interpreting this wrong, and is there some practical reason why taking 1/2 odds actually does win in the q(y) = 1 case?
Beauty's physiological state (heart rate, blood glucose level, etc.) will not be identical, and will affect her thoughts at least slightly. Treating these and other differences as random,
Not all of the differences are random, though. Sleeping Beauty will always have aged by one day if awakened on Monday, and by two days if awakened on Tuesday, and even that much aging has distinguishable consequences. Now, I'm not at all familiar with the math involved, but it seems like this solution hinges on "everything" being random. If not everything is random, does this solution still work?
[This is Part 1. See also Part 2.]
Introduction
The Sleeping Beauty problem has been debated ad nauseam since Elga's original paper [Elga2000], yet no consensus has emerged on its solution. I believe this confusion is due to the following errors of analysis:

1. Failure to properly apply probability theory.
2. Failure to construct legitimate propositions for analysis.
3. Failure to include all relevant information.
The only analysis I have found that avoids all of these errors is in Radford Neal's underappreciated technical report on anthropic reasoning [Neal2007]. In this note I'll discuss how both “thirder” and “halfer” arguments exhibit one or more of the above errors, how Neal's analysis avoids them, and how the conclusions change when we alter the scenario in various ways.
As a reminder, this is the Sleeping Beauty problem:
The question is this: when awakened during the experiment, what probability should Beauty give that the coin in step 3 lands Heads?
“Halfers” argue that the answer is 1/2, and “thirders” argue that the answer is 1/3. I will argue that any answer between 1/2 and 1/3 may be obtained, depending on details not specified in the problem description; but under reasonable assumptions the answer is slightly more than 1/3. Furthermore,
The standard framework for solving probability problems
There are actually three separate “Heads” probabilities that arise in this problem:

1. p1: Beauty's probability for Heads on Sunday night, before the experiment begins.
2. p2: Beauty's probability for Heads when she is awakened during the experiment, on Monday or Tuesday.
3. p3: Beauty's probability for Heads after the experiment is over.
There is agreement that p1=p3=1/2, but disagreement as to whether p2=1/2 or p2=1/3. What does probability theory tell us about how to approach this problem? The pi are all epistemic probabilities, and they are all probabilities for the same proposition—coin lands “Heads”—so any difference can only be due to different information possessed by Beauty in the three cases. The proper procedure for answering the question is then the following:

1. Construct a probability model M expressing the background information common to all three cases.
2. Identify the additional information Xi that Beauty possesses in case i, over and above M.
3. Compute pi by conditioning on Xi within the model M.
Since Beauty does not forget anything she knows on Sunday, we can take M to express everything she knows on Sunday, and X1 to be null (no additional information).
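Written out explicitly (the bookkeeping below is mine, with X2 and X3 being the additional information in the second and third cases, defined later), the three quantities the framework asks for are:

```latex
\[
p_1 = \Pr(H \mid M), \qquad
p_2 = \Pr(H \mid X_2 \wedge M), \qquad
p_3 = \Pr(H \mid X_3 \wedge M).
\]
```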
Failure to properly apply probability theory
With the exception of Neal, thirders do not follow the above process. Instead they posit one model M1 for the first and third cases, which they then toss out in favor of an entirely new model M2 for the second case. This is a fundamental error.
To be specific, M1 is something like this:

H ∼ Bernoulli(1/2)
WM = true
WT = ¬H
where WM means that Beauty wakes on Monday, WT means that Beauty wakes on Tuesday, and Bernoulli(p) is the distribution on {false,true} that assigns probability p to true. X3 would be Beauty's experiences and observations from the last time she awakened, and this is implicitly assumed to be irrelevant to whether H is true, so that

p3 = Pr(H | X3 ∧ M1) = Pr(H | M1) = 1/2.
Thirders then usually end up positing an M2 that is equivalent to the following:

(HM, TM, TT) ∼ Categorical(1/3, 1/3, 1/3)
H ↔ HM
The first line above means that HM, TM, and TT are mutually exclusive, each having probability 1/3.
M2 is not derived from M1 via conditioning on any new information X2; instead thirders construct an argument for it de novo. For example, Elga's original paper [Elga2000] posits that, if the coin lands Tails, Beauty is told it is Monday just before she is put to sleep again, and declares by fiat that her probability for H at this point should be 1/2; he then argues backwards from there as to what her probability for H had to have been prior to being told it is Monday.
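A compressed sketch of that backwards argument, using the HM, TM, TT notation above (the details are Elga's; the compression into three lines is mine):

```latex
\begin{align*}
\Pr(HM \mid HM \lor TM) = \tfrac12 &\;\Longrightarrow\; \Pr(HM) = \Pr(TM) \\
\Pr(TM \mid TM \lor TT) = \tfrac12 \ \text{(indifference between Monday and Tuesday, given Tails)} &\;\Longrightarrow\; \Pr(TM) = \Pr(TT) \\
\Pr(HM) + \Pr(TM) + \Pr(TT) = 1 &\;\Longrightarrow\; \Pr(HM) = \Pr(TM) = \Pr(TT) = \tfrac13
\end{align*}
```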
A red herring: betting arguments
Some thirders also employ betting arguments. Suppose that on each Monday/Tuesday awakening Beauty is offered a bet in which she wins $2 if the coin lands Tails and loses $3 if it lands Heads. Her expected gain is positive ($0.50) if she accepts the bets, since she has two awakenings if the coin lands Tails, yielding $4 in total, but will have only one awakening and lose only $3 if it lands Heads. Therefore she should accept the bet; but if she uses a probability of 1/2 for Heads on each awakening she computes a negative expected gain (-$0.50) and will reject the bet.
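For concreteness, here is a minimal simulation sketch of that bet; the payoff numbers are the ones above, and the rest of the setup is my own illustration rather than part of anyone's argument:

```python
import random

# Minimal sketch: Beauty accepts the bet at every awakening.
# Payoff per awakening: +$2 if the coin landed Tails, -$3 if it landed Heads.
trials = 100_000
total_gain = 0
for _ in range(trials):
    heads = random.random() < 0.5
    awakenings = 1 if heads else 2          # one awakening on Heads, two on Tails
    total_gain += awakenings * (-3 if heads else 2)

print(total_gain / trials)                  # roughly +0.50 per run of the experiment
# Per-awakening accounting with Pr(Heads) = 1/2 instead gives 0.5*2 - 0.5*3 = -0.50,
# which is the discrepancy the betting argument trades on.
```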
One can argue that Beauty is using the wrong decision procedure in the above argument, but there is a more fundamental point to be made: probability theory is logically prior to decision theory. That is, probability theory can be developed, discussed, and justified [Cox1946, VanHorn2003, VanHorn2017] entirely without reference to decision theory, but the concepts of decision theory rely on probability theory. If our probabilistic model yields p as Beauty's probability of Heads, and plugging this probability into our decision theory yields suboptimal results evaluated against that same model, then this is a problem with the decision theory; perhaps a more comprehensive theory is required [Yudkowsky&Soares2017].
Failure to construct legitimate propositions for analysis
Another serious error in many discussions of this problem is the use of supposedly mutually exclusive “propositions” that are neither mutually exclusive nor actually legitimate propositions. HM, TM, and TT can be written as

HM ≡ H ∧ “it is Monday”
TM ≡ ¬H ∧ “it is Monday”
TT ≡ ¬H ∧ “it is Tuesday”
These are not truly mutually exclusive because, if not H, then Beauty will awaken on both Monday and Tuesday. Furthermore, the supposed propositions “it is Monday” and “it is Tuesday” are not even legitimate propositions. Epistemic probability theory is an extension of classical propositional logic [Cox1946, VanHorn2003, VanHorn2017], and applies only to entities that are legitimate propositions under the classical propositional logic—but there is no “now,” “today,” or “here” in classical logic.
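To spell that out in the WM/WT notation introduced earlier (the translation of “it is Monday/Tuesday” into awakening events is mine):

```latex
\[
\lnot H \;\Longrightarrow\; WM \wedge WT
\;\Longrightarrow\; (\lnot H \wedge WM) \wedge (\lnot H \wedge WT),
\]
```

so under this reading TM and TT both hold whenever the coin lands Tails, and the three “propositions” cannot form a partition.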
Both Elga's paper and much of the other literature on the Sleeping Beauty problem discuss the idea of “centered possible worlds,” each of which is equipped with a designated individual and time, and corresponding “centered propositions”—which are not propositions of classical logic. To properly reason about “centered propositions” one would need to either translate them into propositions of classical logic, or develop an alternative logic of centered propositions; yet none of these authors propose such an alternative logic. Even were they to propose such an alternative logic, they would then need to re-derive an alternative probability theory that is the appropriate extension of the alternative propositional logic.
However, it is doubtful that this alternative logic is necessary or desirable. Time and location are important concepts in physics, but physics uses standard mathematics based on classical logic that has no notion of “now”. Instead, formulas are explicitly parameterized by time and location. Before we go off inventing new logics, perhaps we should see if standard logic will do the job, as it has done for all of science to date.
Failure to include all relevant information
Lewis [Lewis2001] argues for a probability of p2=1/2, since Sunday's probability of Heads is p1=1/2, and upon awakening on Monday or Tuesday Beauty has no additional information—she knew that she would experience such an awakening regardless of how the coin lands. That is, Lewis bases his analysis on M1, and assumes that X2 contains no information relevant to the question:

p2 = Pr(H | X2 ∧ M1) = Pr(H | M1) = 1/2.
Lewis's logic is correct, but his assumption that X2 contains no information of relevance is wrong. Surprisingly, Beauty's stream of experiences after awakening is relevant information, even given that her prior distribution for what she may experience upon awakening has no dependence on the day or the coin toss. Neal discusses this point in his analysis of the Sleeping Beauty problem. He introduces the concept of “full non-indexical conditioning,” which roughly means that we condition on everything, even stuff that seems irrelevant, because often our intuition is not that good at identifying what is and is not actually relevant in a probabilistic analysis. Neal writes,
Bayes' Rule, applied with equal prior probabilities for Heads and Tails, then yields a posterior probability for Tails that is twice that of Heads; that is, the posterior probability of Heads is 1/3.
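Since that conclusion is counterintuitive, here is a small simulation sketch of full non-indexical conditioning under one simple set of assumptions: the experience stream at each awakening is drawn independently and uniformly from N possibilities. The construction (and the choice of N) is mine, not Neal's:

```python
import random

# Sketch: each awakening produces an "experience stream" drawn uniformly from N
# possibilities, independently across days. Condition on "some awakening has
# stream y" and count how often the coin was Heads among the surviving runs.
N = 100                      # size of the experience space; p(y) = 1/N is small
y = 0                        # the particular stream Beauty actually observes
heads_given_y = 0
y_occurred = 0
for _ in range(1_000_000):
    heads = random.random() < 0.5
    days = 1 if heads else 2
    streams = [random.randrange(N) for _ in range(days)]
    if y in streams:                        # a version of Beauty experienced y
        y_occurred += 1
        heads_given_y += heads

print(heads_given_y / y_occurred)           # close to 1/(3 - 1/N), i.e. slightly above 1/3
```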
Defining the model
Verbal arguments are always suspect when it comes to probability puzzles, so let's actually do the math. Our first step is to extend M1 to model M as follows: add variables yM and yT for the stream of experiences Beauty has during her Monday awakening and her Tuesday awakening (if there is one), write p(y) for the probability that a given awakening produces the particular stream y, and write q(y) for the probability that the Tuesday stream is also y given that the Monday stream is y. The information X2 that Beauty gains upon awakening with experiences y is then the disjunction

X2 = (WM ∧ yM = y) ∨ (WT ∧ yT = y),

that is, y occurs on at least one of her awakenings.
This is the crucial point. Usually we apply Bayes' Rule by conditioning on the information that some variable has some specific value, or that its value lies in some specific range. It is very unusual to condition on this sort of disjunction (OR) of possibilities, where we know the value but not which specific variable has that value. This novelty may explain why the Sleeping Beauty problem has proven so difficult to analyze.
Analysis
The prior for H is even odds:

Pr(H) = Pr(¬H) = 1/2.
The likelihoods are

Pr(X2 | H) = Pr(yM = y | H) = p(y)
and

Pr(X2 | ¬H) = Pr(yM = y ∨ yT = y | ¬H) = p(y) + p(y) − p(y)q(y) = p(y)(2 − q(y)).
Applying Bayes' Rule we obtain

p2 = Pr(H | X2) = (1/2)p(y) / [(1/2)p(y) + (1/2)p(y)(2 − q(y))] = 1/(3 − q(y)).
Now let's consider various possibilities:

- If Beauty's Monday and Tuesday experiences are guaranteed to differ, then q(y) = 0 and p2 = 1/3, the thirder answer.
- If her Monday and Tuesday experiences are guaranteed to be identical, then q(y) = 1 and p2 = 1/2, the halfer answer.
- Realistically, Beauty's physiological state (heart rate, blood glucose level, etc.) will not be identical, and will affect her thoughts at least slightly. Treating these and other differences as random, q(y) is very small but nonzero, and p2 is slightly more than 1/3.
In fact, any value for p2 between 1/2 and 1/3 is possible. Let q=3−1/p, for any desired p in the range 1/3≤p≤1/2. If the experimenters set things up so that, with probability q, Beauty's Monday and Tuesday experiences are identical, and with probability 1−q they are different, then p2=p.
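A quick numerical check of that parameterization, using the formula p2 = 1/(3 − q(y)) obtained above (the check itself is mine):

```python
# p = 1/(3 - q) and q = 3 - 1/p invert each other on the stated range.
for q in (0.0, 0.25, 0.5, 1.0):
    p = 1 / (3 - q)
    print(f"q = {q:.2f}  ->  p2 = {p:.4f}  ->  3 - 1/p2 = {3 - 1/p:.2f}")
# q = 0 reproduces the thirder answer 1/3; q = 1 reproduces the halfer answer 1/2.
```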
Conclusion
There are three important lessons here:

1. Follow the standard process of probability theory: construct a single model expressing the background information, then obtain each case's probability by conditioning on that case's additional information, rather than inventing a new model for each case.
2. Use only legitimate propositions of classical logic in the analysis; “centered propositions” such as “it is Monday” are not among them.
3. Condition on all the information available, even details that seem irrelevant; intuition is a poor guide to what actually matters in a probabilistic analysis.
References
[Cox1946] R. T. Cox, 1946. "Probability, frequency, and reasonable expectation," American Journal of Physics 14, pp. 1-13.
[Elga2000] A. Elga, 2000. "Self-locating belief and the Sleeping Beauty problem," Analysis 60, pp. 143-147.
[Lewis2001] D. Lewis, 2001. "Sleeping Beauty: reply to Elga," Analysis 61, pp. 171-176.
[Neal2007] R. M. Neal, 2007. "Puzzles of anthropic reasoning resolved using full non-indexical conditioning," Technical Report No. 0607, Dept. of Statistics, University of Toronto. Online at https://arxiv.org/abs/math/0608592.
[VanHorn2003] K. S. Van Horn, 2003. "Constructing a logic of plausible inference: a guide to Cox's Theorem," International Journal of Approximate Reasoning 34, no. 1, pp. 3-24. Online at https://www.sciencedirect.com/science/article/pii/S0888613X03000513
[VanHorn2017] K. S. Van Horn, 2017. "From propositional logic to plausible reasoning: a uniqueness theorem," International Journal of Approximate Reasoning 88, pp. 309-332. Online at https://www.sciencedirect.com/science/article/pii/S0888613X16302249, preprint at https://arxiv.org/abs/1706.05261.
[Yudkowsky&Soares2017] E. Yudkowsky and N. Soares, 2017. "Functional decision theory: a new theory of instrumental rationality," https://arxiv.org/abs/1710.05060.