This is interesting, and I'd like to understand exactly how the updating goes at each step. I'm not totally sure myself, which is why I'm asking the question about what your approach implies.
Remember Beauty now has to update on two things: the bias of the coin (the fraction p of times it would fall Tails in many throws) and whether it actually fell Tails in the particular throw. So she has to maintain a subjective distribution over the pair (p, outcome), where the outcome is Heads or Tails.
Step 1: Assuming an "ignorant" prior (no information about p except that it is between 0 and 1), she has a distribution P[p = r & Tails] = r, P[p = r & Heads] = 1 - r for all values of r between 0 and 1 (read these as densities in r). This gives P[Tails] = 1/2 by integration.
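As a sanity check, here is a minimal numeric sketch (midpoint integration in Python; the densities are taken straight from the step above, and the grid size is arbitrary) confirming that integrating the joint density gives P[Tails] = 1/2:

```python
# Step 1 prior: joint densities over (p = r, outcome), as given in the text:
# f(r, Tails) = r and f(r, Heads) = 1 - r for r in [0, 1].
N = 100_000
dr = 1.0 / N
rs = [(i + 0.5) * dr for i in range(N)]      # midpoints of [0, 1]
p_tails = sum(r * dr for r in rs)            # integrate f(r, Tails) over r
p_heads = sum((1 - r) * dr for r in rs)      # integrate f(r, Heads) over r
print(round(p_tails, 4), round(p_heads, 4))  # both ≈ 0.5
```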
Step 2: On awakening, does she update her distribution of p, or of the probability of Tails given that p=r? Or does she do both?
It seems paradoxical that the mere fact of waking up would cause her to update either of these. But she has to update something to allow her to now set P[Tails] = 2/3. I'm not sure exactly how she should do it, so your views on that would be helpful.
One approach is to use relative frequency again. Assume the experiment is now run multiple times, but with different coins each time, and the coins are chosen from a huge pile of coins having all biases between zero and one in "equal numbers". (I'm not sure this makes sense, partly because p is a continuous variable, and we'll need to approximate it by a discrete variable to get the pile to have equal numbers; but mainly because the whole approach seems contrived. However, I will close my eyes and calculate!)
The fraction of awakenings after throwing a coin with bias p becomes proportional to 1 + p. So after normalization, the distribution of p on awakening should shift to (2/3)(1 + p). Then, given that a coin with bias p is thrown, the fraction of awakenings after Tails is 2p / (1 + p), so the joint distribution after awakening is P[p = r & Tails] = (4/3)r, and P[p = r & Heads] = (2/3)(1 - r), which when integrating again gives P[Tails] = 2/3.
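The Step 2 algebra can be checked numerically. This is just a sketch of the calculation described above, using the same densities:

```python
# Step 2: update the uniform prior over p on the evidence of awakening.
N = 100_000
dr = 1.0 / N
rs = [(i + 0.5) * dr for i in range(N)]
# Awakenings per run are proportional to 1 + r, so the density of p on
# awakening shifts to (2/3)(1 + r) after normalization.
post = [(2 / 3) * (1 + r) for r in rs]
# Split each slice between Tails-awakenings and Heads-awakenings:
joint_tails = [d * 2 * r / (1 + r) for d, r in zip(post, rs)]    # = (4/3) r
joint_heads = [d * (1 - r) / (1 + r) for d, r in zip(post, rs)]  # = (2/3)(1 - r)
p_tails = sum(d * dr for d in joint_tails)
print(round(p_tails, 3))  # ≈ 0.667
```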
Step 3: When Beauty learns it is Monday, what happens then? Well, her evidence (call it "E") is that "I have been told that it is Monday today" (or "This awakening of Beauty is on Monday", if you want to ignore the possible complication of untruthful reports). Notice the indexical terms.
Continuing with the relative frequency approach (shut up and calculate again!), Beauty should set P[E|p = r] = 1/(1 + r), since if a coin with bias r is thrown repeatedly, that is the fraction of all Beauty awakenings which will learn that "today is Monday". So the evidence E should indeed shift Beauty's distribution on p towards lower values of p (since they assign higher probability to the evidence E). However, all this shift does is reverse the previous upward shift at Step 2.
More formally, we have P[E & p = r] proportional to 1/(1 + r) × (1 + r), and the factors cancel out, so that P[E & p = r] is constant in r. Hence P[p = r | E] is also constant in r, and we are back to the uniform distribution over p. Filling in the distribution in the other variable, we get P[Tails | E & p = r] = r. Again look at relative frequencies: if a coin with bias r is thrown repeatedly, then among the Monday-woken Beauties, a fraction r of them will be woken after Tails. So we are back to the original joint distribution P[p = r & Tails] = r, P[p = r & Heads] = 1 - r, and again P[Tails] = 1/2 by integration.
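The cancellation can also be checked numerically; this is a sketch under the same relative-frequency assumptions as the steps above:

```python
# Step 3: condition the post-awakening density (2/3)(1 + r) on E ("today is Monday").
N = 100_000
dr = 1.0 / N
rs = [(i + 0.5) * dr for i in range(N)]
awake = [(2 / 3) * (1 + r) for r in rs]          # density of p after Step 2
lik_E = [1 / (1 + r) for r in rs]                # P[E | p = r]
unnorm = [d * l for d, l in zip(awake, lik_E)]   # constant 2/3 in r: the shifts cancel
Z = sum(u * dr for u in unnorm)
post = [u / Z for u in unnorm]                   # back to the uniform density over p
p_tails = sum(d * r * dr for d, r in zip(post, rs))  # P[Tails | E & p = r] = r
print(round(p_tails, 3))  # ≈ 0.5
```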
After all that work, the effect of Step 2 is very like applying an SIA shift (Bias to Tails is deemed more likely, because that results in more Beautiful experiences) and the effect of Step 3 is then like applying an SSA shift (Heads-bias is more likely, because that makes it more probable that a randomly-selected Beautiful experience is a Monday-experience). The results cancel out. Churning through the trillion-Beauty case will give the same effect, but with bigger shifts in each direction; however they still cancel out.
The application to the Doomsday Argument is that (as is usual given the application of SIA and SSA together) there is no net shift towards "Doom" (low probability of expanding, colonizing the Galaxy with a trillion trillion people and so on). This is how I think it should go.
However, as I noted in my previous comments, there is still a "Presumptuous Philosopher" effect when Beauty wakes up, and it is really hard to justify this if the relative frequencies of different coin weights don't actually exist. You could consider for instance that Beauty has different physical theories about p: one of those theories implies that p = 1/2 while another implies that p = 9/10. (This sounds pretty implausible for a coin, but if the coin-flip is replaced by some poorly-understood randomization source like a decaying Higgs boson, then it seems more plausible.) Also, for the sake of argument, both theories imply infinite multiverses, so that there are just as many Beautiful awakenings - infinitely many - in each case.
How can Beauty justify believing the second theory more, simply because she has just woken up, when she didn't believe it before going to sleep? That does sound really Presumptuous!
A final point is that SIA tends to cause problems when there is a possibility of an infinite multiverse, and - as I've posted elsewhere - it doesn't actually counter SSA in those cases, so we are still left with the Doomsday Argument. It's a bit like refusing to shift towards "Tails" at Step 2 (there will be infinitely many Beauty awakenings for any value of p, so why shift? SIA doesn't tell us to), but then shifting to "Heads" after Step 3 (if there is a coin bias towards Heads then most of the Beauty-awakenings are on Monday, so SSA cares, and let's shift). In the trillion-Beauty case, there's a very big "Heads" shift but without the compensating "Tails" shift.
If your approach can recover the sorts of shift that happen under SIA+SSA, but without postulating either, that is a bonus, since it means we don't have to worry about how to apply SIA in the infinite case.
So what does Bayes' theorem tell us about the Sleeping Beauty case?
It says that P(B|AC) = P(B|C) * P(A|BC)/P(A|C). In this case C is Sleeping Beauty's information before she wakes up, which conditions all the probabilities, of course. A is the "anthropic information" of waking up and learning that what used to be "AND" things are now mutually exclusive things. B is the coin landing tails.
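The identity itself is easy to verify on a toy joint distribution; the numbers below are purely illustrative, not drawn from the problem:

```python
# Verify P(B|A) = P(B) * P(A|B) / P(A) on an arbitrary toy joint distribution.
# (C, the background information, is implicit in every term.)
P = {('a', 'b'): 0.2, ('a', '~b'): 0.1, ('~a', 'b'): 0.3, ('~a', '~b'): 0.4}
P_A = P[('a', 'b')] + P[('a', '~b')]
P_B = P[('a', 'b')] + P[('~a', 'b')]
P_B_given_A = P[('a', 'b')] / P_A              # direct computation
P_A_given_B = P[('a', 'b')] / P_B
bayes = P_B * P_A_given_B / P_A                # via Bayes' theorem
print(round(P_B_given_A, 4), round(bayes, 4))  # both 0.6667
```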
Bayes' theorem actually appears to break down here, if we use the simple interpretation of P(A) as "the probability she wakes up." Because...
Introduction
An anthropic problem is one where the very fact of your existence tells you something. "I woke up this morning, therefore the earth did not get eaten by Galactus while I slumbered." Applying your existence to certainties like that is simple - if an event would have stopped you from existing, your existence tells you that it hasn't happened. If something would only kill you 99% of the time, though, you have to use probability instead of deductive logic. Usually, it's pretty clear what to do. You simply apply Bayes' rule: the probability of the world getting eaten by Galactus last night is equal to the prior probability of Galactus-consumption, times the probability of me waking up given that the world got eaten by Galactus, divided by the probability that I wake up at all. More exotic situations also show up under the umbrella of "anthropics," such as getting duplicated or forgetting which person you are. Even if you've been duplicated, you can still assign probabilities. If there are a hundred copies of you in a hundred-room hotel and you don't know which one you are, don't bet too much that you're in room number 68.
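The Galactus update can be made concrete with a quick sketch; every number here is invented for illustration:

```python
# Bayes' rule for "I woke up, so was the world eaten?" All numbers are made up.
prior_eaten = 1e-6       # prior probability the world was eaten overnight
p_wake_if_eaten = 0.01   # Galactus only kills you 99% of the time
p_wake_if_not = 1.0
# P(wake) = sum over hypotheses of prior * likelihood:
p_wake = prior_eaten * p_wake_if_eaten + (1 - prior_eaten) * p_wake_if_not
posterior_eaten = prior_eaten * p_wake_if_eaten / p_wake
print(posterior_eaten)   # ≈ 1e-8: waking up makes Galactus-consumption ~100x less likely
```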
But this last sort of problem is harder, since it's not just a straightforward application of Bayes' rule. You have to determine the probability just from the information in the problem. Thinking in terms of information and symmetries is a useful problem-solving tool for getting probabilities in anthropic problems, which are simple enough to use it and confusing enough to need it. So first we'll cover what I mean by thinking in terms of information, and then we'll use this to solve a confusing-type anthropic problem.
Parable of the coin
Eliezer has already written about what probability is in Probability is in the Mind. I will revisit it anyhow, using a similar example from Probability Theory: The Logic of Science.
It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there's a 0.5 probability of heads and a 0.5 probability of tails. You draw the coin forth, flip it, and slap it down. What is the probability that when you take your hand away, you see heads?
Well, you performed a fair coin flip, so the chance of heads is 0.5. What's the problem? Well, imagine the coin's perspective. When you say "heads, 0.5," that doesn't mean the coin has half of heads up and half of tails up: the coin is already how it's going to be, sitting pressed under your hand. And it's already how it is with probability 1, not 0.5. If the coin is already tails, how can you be correct when you say that it's heads with probability 0.5? If something is already determined, how can it still have the property of randomness?
The key idea is that the randomness isn't in the coin, it's in your map of the coin. The coin can be tails all it dang likes, but if you don't know that, you shouldn't be expected to take it into account. The probability isn't a physical property of the coin, nor is it a property of flipping the coin - after all, your probability was still 0.5 when the truth was sitting right there under your hand. The probability is determined by the information you have about flipping the coin.
Assigning probabilities to things tells you about the map, not the territory. It's like a machine that eats information and spits out probabilities, with those probabilities uniquely determined by the information that went in. Thinking about problems in terms of information, then, is about treating probabilities as the best possible answers for people with incomplete information. Probability isn't in the coin, so don't even bother thinking about the coin too much - think about the person and what they know.
When trying to get probabilities from information, you're going to end up using symmetry a lot. Because information uniquely specifies probability, if you have identical information about two things, then you should assign them equal probability. For example, if someone switched the labels "heads" and "tails" in a fair coin flip, you couldn't tell that it had been done - you never had any different information about heads as opposed to tails. This symmetry means you should give heads and tails equal probability. Because heads and tails are mutually exclusive (they don't overlap) and exhaustive (there can't be anything else), the probabilities have to add to 1 (which is all the probability there is), so you give each of them probability 0.5.
Brief note on useless information
Real-world problems, even when they have symmetry, often start you off with a lot more information than "it could be heads or tails." If we're flipping a real-world coin there's the temperature to consider, and the humidity, and the time of day, and the flipper's gender, and that sort of thing. If you're an ordinary human, you are allowed to call this stuff extraneous junk. Sometimes, this extra information could theoretically be correlated with the outcome - maybe the humidity really matters somehow, or the time of day. But if you don't know how it's correlated, you have at least a de facto symmetry. Throwing away useless information is a key step in doing anything useful.
Sleeping Beauty
So thinking with information means assigning probabilities based on what people know, rather than treating probabilities as properties of objects. To actually apply this, we'll use as our example the sleeping beauty problem:
Sleeping Beauty is put to sleep, and a fair coin is tossed. If the coin lands heads, Sleeping Beauty is only asked for her guess once (on Monday), while if the coin lands tails she is asked for her guess twice (on Monday and Tuesday), but her memory is erased in between so she has the same memories each time.
When trying to answer for Sleeping Beauty, many people reason as follows: It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there's a 0.5 probability of heads and a 0.5 probability of tails. So since the probability of tails is 0.5, Beauty should say "0.5," Q.E.D. Readers may notice that this argument is all about the coin, not about what Beauty knows. This violation of good practice may help explain why it is dead wrong.
Thinking with information: some warmups
To collect the ingredients of the solution, I'm going to first go through some similar-looking problems.
In the Sleeping Beauty problem, she has to choose between three options - let's call them {H, Monday}, {T, Monday}, and {T, Tuesday}. So let's start with a very simple problem involving three options: the three-sided die. Just like for the fair coin, you know that the sides of the die are mutually exclusive and exhaustive, and you don't know anything else that would be correlated with one side showing up more than another. Sure, the sides have different labels, but the labels are extraneous junk as far as probability is concerned. Mutually exclusive and exhaustive means the probabilities have to add up to one, and the symmetry of your information about the sides means you should give them the same probabilities, so they each get probability 1/3.
Next, what should Sleeping Beauty believe before the experiment begins? Beforehand, her information looks like this: she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails.
One good way to think of this last piece of information is as a special "AND" structure containing {T, Monday} and {T, Tuesday}, like in the picture to the right. What it means is that since the things that are "AND" happen together, the other probabilities won't change if we merge them into a single option, which I shall call {T, Both}. Now we have two options, {H, Monday} and {T, Both}, which are both exhaustive and mutually exclusive. This looks an awful lot like the fair coin, with probabilities of 0.5.
But can we leave it at that? Why shouldn't two days be worth twice as much probability as one day, for instance? Well, it turns out we can leave it at that, because we have now run out of information from the original problem. We used that there were three options, we used that they were exhaustive, we used that two of them always happened together, and we used that the remaining two were mutually exclusive. That's all, and so that's where we should leave it - any more and we'd be making up information not in the problem, which is bad.
So to decompress, before the experiment begins Beauty assigns probability 0.5 to the coin landing heads and being woken up on Monday, probability 0.5 to the coin landing tails and being woken up on Monday, and probability 0.5 to the coin landing tails and being woken up on Tuesday. This adds up to 1.5, but that's okay since these things aren't all mutually exclusive.
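A quick frequency sketch makes this concrete (a Python simulation; the run count and seed are arbitrary):

```python
import random

# Before the experiment: frequency of each (coin, day) event across many runs.
random.seed(1)
trials = 100_000
counts = {'H & Monday': 0, 'T & Monday': 0, 'T & Tuesday': 0}
for _ in range(trials):
    tails = random.random() < 0.5
    if tails:
        counts['T & Monday'] += 1   # woken on Monday...
        counts['T & Tuesday'] += 1  # ...and again on Tuesday, in the same run
    else:
        counts['H & Monday'] += 1
for event, n in counts.items():
    print(event, round(n / trials, 3))  # each ≈ 0.5; they sum to ≈ 1.5
```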
This new problem looks sort of familiar: flip a fair coin twice, but if the first flip lands heads, the second flip stays hidden. You have three options, {H, H}, {T, H} and {T, T}, and these options are mutually exclusive and exhaustive. So does that mean it's the same set of information as the three-sided die? Not quite. Similar to the "AND" previously, my drawing for this problem has an "OR" between {T, H} and {T, T}, representing additional information.
I'd like to add a note here about my jargon. "AND" makes total sense. One thing happens and another thing happens. "OR," however, doesn't make so much sense, because things that are mutually exclusive are already "or" by default - one thing happens or another thing happens. What it really means is that {H, H} has a symmetry with the sum of {T, H} and {T, T} (that is, {T, H} "OR" {T, T}). The "OR" can also be thought of as information about {H, H} instead - it contains what could have been both the {H, H} and {H, T} events, so there's a four-way symmetry in the problem, it's just been relabeled.
When we had the "AND" structure, we merged the two options together to get {tails, both}. For "OR," we can do a slightly different operation and replace {T, H} "OR" {T, T} by their sum, {T, either}. Now the options become {H, H} and {T, either}, which are mutually exclusive and exhaustive, which gets us back to the fair coin. Then, because {T, H} and {T, T} have a symmetry between them, you split the probability from {T, either} evenly to get probabilities of 0.5, 0.25, and 0.25.
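Reading the problem as two fair flips where a first-flip heads hides the second flip (my reconstruction of the setup, since the drawing isn't reproduced here), a simulation recovers the 0.5 / 0.25 / 0.25 split:

```python
import random

# Two fair flips; outcomes with a heads first flip are merged into one option.
random.seed(2)
trials = 100_000
counts = {'{H, H}': 0, '{T, H}': 0, '{T, T}': 0}
for _ in range(trials):
    first = random.choice('HT')
    second = random.choice('HT')
    if first == 'H':
        counts['{H, H}'] += 1   # merged option: the second flip is hidden
    elif second == 'H':
        counts['{T, H}'] += 1
    else:
        counts['{T, T}'] += 1
for option, n in counts.items():
    print(option, round(n / trials, 2))  # ≈ 0.5, 0.25, 0.25
```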
Okay, for real now
Okay, so now what do things look like once the experiment has started? In English, now she knows that she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails, went to sleep, and now she's been woken up.
This might not seem that different from before, but the "anthropic information" that Beauty is currently one of the people in the experiment changes the formal picture a lot. Before, the three options were not mutually exclusive, because she was thinking about the future. But now {H, Monday}, {T, Monday}, and {T, Tuesday} are both exhaustive and mutually exclusive, because only one can be the case in the present. From the coin flip, she still knows that anything with heads is mutually exclusive with anything with tails. But once two things are mutually exclusive you can't make them any more mutually exclusive.
But the "AND" information! What happens to that? Well, that was based on things always happening together, and we just got information that those things are mutually exclusive, so there's no more "AND." It's possible to slip up here and reason that since there used to be some structure there, and now they're mutually exclusive, it's one or the other, therefore there must be "OR" information. At least the confusion in my terminology reflects an easy confusion to have, but this "OR" relationship isn't the same as mutual exclusivity. It's a specific piece of information that wasn't in the problem before the experiment, and wasn't part of the anthropic information (that was just mutual exclusivity). So Monday and Tuesday are "or" (mutually exclusive), but not "OR" (can be added up to use another symmetry).
And so this anthropic requirement of mutual exclusivity turns out to make redundant or render null a big chunk of the previous information, which is strange. You end up left with three mutually exclusive, exhaustive options, with no particular asymmetry. This is the three-sided die information, and so each of {H, Monday}, {T, Monday}, and {T, Tuesday} should get probability 1/3. So when asked for P(tails), Beauty should answer 2/3.
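The 2/3 answer also has a simple frequency reading, sketched below: over many runs of the experiment, two out of every three awakenings (on average) follow tails.

```python
import random

# Count awakenings across many runs: tails runs produce two, heads runs one.
random.seed(3)
tails_awakenings = 0
total_awakenings = 0
for _ in range(100_000):
    tails = random.random() < 0.5
    wakings = 2 if tails else 1  # tails: Monday and Tuesday; heads: Monday only
    total_awakenings += wakings
    if tails:
        tails_awakenings += wakings
print(round(tails_awakenings / total_awakenings, 3))  # ≈ 0.667
```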
"SSA" and "SIA"
When assigning prior probabilities in anthropic problems, there are two main "easy" ways to assign probabilities, and these methods go by the acronyms "SSA" and "SIA." "SSA" is stated like this [1]:
All other things equal, an observer should reason as if they are randomly selected from the set of all actually existent observers (past, present and future) in their reference class.
For example, if you wanted the prior probability that you lived in Sweden, you might ask "what proportion of human beings have lived in Sweden?"
On the other hand, "SIA" looks like this [2]:
All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.
Now the question becomes "what proportion of possible observers live in Sweden?" and suddenly it seems awfully improbable that anyone could live in Sweden.
The astute reader will notice that these two "assumptions" correspond to two different sets of starting information. If you want a quick exercise, figure out what those two sets of information are now. I'll wait for you in the next paragraph.
Hi again. The information assumed for SSA is pretty straightforward. You are supposed to reason as if you know that you're an actually existent observer, in some "reference class." So an example set of information would be "I exist/existed/will exist and am a human." Compared to that, SIA seems to barely assume any information at all - all you get to start with is "I am a possible observer." Because "existent observers in a reference class" are a subset of possible observers, you can transform SIA into SSA by adding on more information, e.g. "I exist and am a human." And then if you want to represent a more complicated problem, you have to add extra information on top of that, like "I live in 2012" or "I have two X chromosomes."
Trouble only sneaks in if you start to see these acronyms as mysterious probability generators rather than sets of starting information to build on. So don't do that.
Closing remarks
When faced with straightforward problems, you usually don't need to use this knowledge of where probability comes from. It's just rigorous and interesting, like knowing how to do integration as a Riemann sum. But whenever you run into foundational or even particularly confusing problems, it's good to remember that probability is about making the best use you can of incomplete information. If not, you run the risk of a few silly failure modes, or even (gasp) frequentism.
I recently read an academic paper [3] that used the idea that in a multiverse, there will be some universe where a thrown coin comes up heads every time, and so the people in that universe will have very strange ideas about how coins work. Therefore, this actual academic paper argued, since reasoning with probability can lead people to be wrong, it cannot be applied to anything like a multiverse.
My response is: what have you got that works better? In this post we worked through assigning probabilities by using all of our information. If you deviate from that, you're either throwing information away or making it up. Incomplete information lets you down sometimes, that's why it's called incomplete. But that doesn't license you to throw away information or make it up, out of some sort of dissatisfaction with reality. The truth is out there. But the probabilities are in here.