Imagine you have an indestructible machine that restores one copy of you from backup every 200 years. In this scenario, it seems you should anticipate waking up with equal probability between now and the end of time. But this is inconsistent with how probability is assigned to discrete outcomes: an assignment of equal probability to every awakening either diverges (if the common value is positive) or sums to zero, and as the length of the experiment approaches infinity (ignoring real-world cosmology for the moment), the subjective probability of every individual outcome (finding yourself at t=1, finding yourself at t=2, etc.) approaches 0.
But why should we ignore real-world cosmology, even temporarily, when that completely changes the nature of the problem? The universe probably isn't going to go on literally forever, so there'll be a huge but finite number of chances to wake up. I think it's well-established that using normal probability math on infinite sets can yield paradoxical results if the set isn't constructed strictly as the limit of a finite process; this is just one example of that.
Edit: From E. T. Jaynes's Probability Theory: The Logic of Science:
Infinite set paradoxing has become a morbid infection that is today spreading in a way that threatens the very life of probability theory, and requires immediate surgical removal. In our system, after this surgery, such paradoxes are avoided automatically; they cannot arise from correct application of our basic rules, because those rules admit only finite sets and infinite sets that arise as well-defined and well-behaved limits of finite sets. The paradoxing was caused by (1) jumping directly into an infinite set without specifying any limiting process to define its properties; and then (2) asking questions whose answers depend on how the limit was approached.
For example, the question: “What is the probability that an integer is even?” can have any answer we please in (0, 1), depending on what limiting process is to define the “set of all integers” (just as a conditionally convergent series can be made to converge to any number we please, depending on the order in which we arrange the terms).
In our view, an infinite set cannot be said to possess any “existence” and mathematical properties at all—at least, in probability theory—until we have specified the limiting process that is to generate it from a finite set.
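For concreteness, here is a minimal sketch (my own illustrative orderings, not Jaynes's) of how the "probability that an integer is even" depends on the limiting process: count the fraction of evens among the first n terms of two different enumerations of the non-negative integers.

```python
from itertools import count, islice

def natural_order():
    # 0, 1, 2, 3, ...  the fraction of evens among the first n terms tends to 1/2
    return count()

def two_odds_per_even():
    # 0, 1, 3, 2, 5, 7, 4, 9, 11, ...  the fraction of evens tends to 1/3
    evens, odds = count(0, 2), count(1, 2)
    while True:
        yield next(evens)
        yield next(odds)
        yield next(odds)

def even_fraction(enumeration, n):
    return sum(1 for k in islice(enumeration, n) if k % 2 == 0) / n

print(even_fraction(natural_order(), 300_000))      # ~0.5
print(even_fraction(two_odds_per_even(), 300_000))  # ~0.3333
```

Both enumerations eventually list every non-negative integer; only the order, i.e. the limiting process, differs.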
I really dislike that quote from Probability Theory. A more rigorous version of his statement is: "You can't ask questions about probabilities unless you have a probability space," or even, "Your intuition about probability sucks, so you shouldn't expect a question to have an answer unless you can mathematically formalize it." You can construct a probability space in any number of ways, of which taking a limit of finite probability spaces is just the oldest and most primitive (and there are questions about limits of finite probability spaces which also cause our intuitions to explode, because not all subsets of (0, 1), say, are necessarily measurable; see for example the gnomes + axiom of choice paradox).
An internet search for the gnomes example turned up the parent comment as the first hit. Any help?
I've seen the problem in various places, but I guess it's not easily googled. It goes like this:
An infinite collection of gnomes is assembled, say for simplicity one for each integer 0, 1, 2, .... Each gnome is given a hat, whose color they cannot see. Say for simplicity that the colors of the gnomes' hats are real numbers in the interval [0, 1) (certainly there is a continuum of possible colors---the problem would work no matter how many or how few colors there were). Each gnome is able to see the colors of every other gnome's hat, but not his own. After looking around, each gnome guesses the color of his own hat.
The gnomes are allowed to agree on a strategy beforehand. The counterintuitive part is that we allow the gnomes to choose any strategy at all, not just one which they could physically implement. So, in particular, they can all agree on an infinite lookup table which has no finite description and use this lookup table.
The question is: can the gnomes choose a strategy such that all but finitely many gnomes guess the color of their hat correctly?
A simple probabilistic argument shows they cannot. Indeed, if there were only one gnome, and his hat color was chosen randomly, then clearly he can't know the color of his hat with probability >0 (he's trying to guess a real number, and any particular guess is correct with probability exactly 0. This is not where the problem is). Clearly, if every gnome is given an independent uniformly random hat color, then each gnome can only guess their own hat color with probability 0. Indeed, suppose for a contradiction that a gnome were able to guess their own hat color with probability >0 after seeing the other gnomes' hats. Then he could also predict his own hat color with probability >0 even without this auxiliary information---just make up a bunch of fictional gnomes, assign them independent uniformly random hat colors, and use this fictional auxiliary information to make a guess.
Now we can apply linearity of expectation (even if there is some correlation between the accuracy of the gnomes' guesses) to conclude that the expected number of gnomes who guess correctly is zero. (This ISN'T where the flaw is. Summing up a countably infinite number of things is perfectly legal.) Since the number of correct guesses is a nonnegative quantity with zero expectation, with probability 1 zero gnomes guess correctly. So if there are infinitely many gnomes, then with probability 1 infinitely many of them guess incorrectly. In particular, there can be no way that all but finitely many gnomes are guaranteed to guess correctly!
On the other hand, the axiom of choice implies that the gnomes do have a strategy that guarantees that all but finitely many gnomes guess correctly. Say that two assignments of hats are equivalent if only finitely many gnomes get different hat colors in the two assignments. Note that this is transitive, reflexive, and symmetric, so it divides up the space of all possible hat assignments into equivalence classes. Moreover, after seeing the hat colors of all of the other gnomes, each gnome knows which equivalence class they are in. So before the game, the gnomes agree on one representative from each equivalence class using the axiom of choice. Then in the game, each gnome figures out which equivalence class the true hat assignment is in and guesses that hat color which he received in the representative assignment from that equivalence class. By construction, the truth differs from the representative of the equivalence class of the truth only in finitely many places, so only finitely many gnomes are incorrect.
Which of these two arguments is wrong? The impossibility argument is wrong, just at the point where I inserted "clearly" (this is a common trend in proofs: if you want to find the error, look for "clearly"...). This argument assumed a false dichotomy: either a gnome guesses his hat correctly with probability >0, or a gnome guesses his hat correctly with probability =0. As it turns out, there is no probability that the gnome guesses his hat correctly. Some events just don't have probabilities if you believe the axiom of choice. It's very much analogous to the way that Banach-Tarski contradicts our intuition about volume in Euclidean space---if you believe the axiom of choice, you have to accept that some things just don't have a definite volume.
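As a sanity check on the probabilistic half of the argument, here is a minimal Monte Carlo sketch with finitely many gnomes and two arbitrary implementable strategies (the axiom-of-choice strategy has no finite description, so it cannot be simulated): with continuous hat colors, essentially no gnome ever guesses exactly right.

```python
import random

def run_trial(num_gnomes=1000, strategy="mean"):
    """Give each gnome an independent Uniform[0, 1) hat colour; each gnome
    guesses from the other gnomes' colours.  Count exact correct guesses."""
    hats = [random.random() for _ in range(num_gnomes)]
    total = sum(hats)
    correct = 0
    for i, own in enumerate(hats):
        if strategy == "mean":
            guess = (total - own) / (num_gnomes - 1)   # mean of the others' hats
        else:
            guess = max(hats[:i] + hats[i + 1:])       # max of the others' hats
        correct += (guess == own)
    return correct

print(run_trial(strategy="mean"))  # 0: an exact match essentially never happens
print(run_trial(strategy="max"))   # 0
```

The finite case is entirely well-behaved; it is only the jump to infinitely many gnomes plus a choice function that produces the clash described above.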
It seems like you are demonstrating Jaynes' point with the very argument you are using to criticize the quotation from him. Construct this argument using a well-defined limiting process over strictly finite spaces (if you can), point out the paradox, and then you will have made your point.
They don't all get the same hat, if that's what you mean. They receive arbitrary hats---some pairs might be the same color, or not. In the probability distribution I claimed was hard, all of the hat colors are different with probability 1.
This post seems like it should be linked to the myriad previous discussions of Sleeping Beauty. And perhaps point out more clearly just what clever thing you think you've done here.
If I'm forked, I expect to continue my experience as either the original or the copy with a probability of 1/2 -- whatever that means.
Apologies if this is tangential to your point, but this is incorrect, which might be causing confusion. Say that the term 'shokwave' represents the algorithm that represents my personal identity, plus all my subjective experiences. Say also that '+o' represents the subjective experience of emerging from the forking event as the original, and '+c' represents the subjective experience of emerging from the forking event as the copy.
We can see that it would be an error for 'shokwave' to expect to become 'shokwave+o' with pr 0.5 and 'shokwave+c' with pr 0.5. Both the original and the copy are 'shokwave' in every sense - by definition. Therefore, conditional on the forking event working as expected with pr 1, 'shokwave' should expect to become 'shokwave+o' with pr 1 and also should expect to become 'shokwave+c' with pr 1.
That might seem a little strange, that 'shokwave' expects two mutually exclusive events both with pr 1. After all, 'shokwave+o' is obviously not 'shokwave+c', so they must be exclusive? That's where our intuitions go wrong. The definition of our forking event is that both 'shokwave+o' and 'shokwave+c' are 'shokwave', so the two events aren't exclusive.
I think the 0.5 probability is correct, if we're talking about frequency of particular experiences. (Of course, that's not always what we're talking about, as in the Sleeping Beauty non-paradox, but in a situation like this, where there's no decision contingent on your probability estimate and you immediately know the outcome, presumably you mainly just want to know how much any given experience should surprise you.) If we assume that the duplication process is instantaneous and non-destructive, then, at the moment the duplication takes place, you know that there will soon be twice as many agents with a mind-state identical to yours, and that 50% of them will subjectively experience what feels like suddenly teleporting (to wherever the duplicator is constructing the copy).
Doesn't there have to be some point at which we consider what happens after these strange types of events? Information is only information if it causes some real change. I'm confused about this, not making a point yet.
What if we make a change in the Sleeping Beauty protocol such that, instead of more time passing in the event of tails, Beauty is merely put to sleep and wakes up twice on the same day for tails, and once for heads? As far as I can tell from the hundreds of comments, nothing in the 1/3-versus-1/2 debate is changed by this new protocol. Now, after the experiment, the day is the same no matter which way the coin flipped. If they don't tell Beauty which way the coin flipped, what should be her expectation, from then on throughout her life, that the coin landed on heads? Clearly the answer to that question is 1/2. So why should Beauty answer differently after she's walked out of the testing facility than she did during her last interview?
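A quick simulation of this variant (just a sketch) shows that the two familiar numbers are nothing more than two different counting rules applied to the same process, which is why the per-experiment answer of 1/2 seems natural once Beauty has walked out:

```python
import random

def simulate(trials=100_000):
    awakenings_heads = awakenings_tails = 0
    experiments_heads = 0
    for _ in range(trials):
        heads = random.random() < 0.5
        if heads:
            awakenings_heads += 1       # one awakening
            experiments_heads += 1
        else:
            awakenings_tails += 2       # two awakenings, both on the same day
    per_awakening = awakenings_heads / (awakenings_heads + awakenings_tails)
    per_experiment = experiments_heads / trials
    print(f"P(heads) counted per awakening:  {per_awakening:.3f}")   # ~1/3
    print(f"P(heads) counted per experiment: {per_experiment:.3f}")  # ~1/2

simulate()
```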
I think there's a way in which our intuitions about objective time and subjective time are confusing these questions. In the original protocol, after Beauty leaves the facility and before she checks a calendar, it seems like maybe she should still be thinking she was in the two-day protocol. But as soon as she looks at a calendar, her knowledge state collapses to the correct conclusion: "If it's Monday, it was heads; if it's Tuesday, it was tails. Now I'm going to find some pancakes!"
So, I guess the conclusion I'm coming to is that during the experiment, unless there is a pay-off structure, Beauty is in an entangled state with the coin flip that is not susceptible to analysis, and so any expectations she has are meaningless in relation to reality, except as might interest psychologists. This is the case whether we go with the original protocol or the 1 day vs 2 half-days protocol I suggested.
I haven't had a chance to thoroughly analyze the specific situations offered by LucidFox, but kudos and up-vote for making me think of something new!
My instinct is actually that my subjective anticipation for the experiences of every future copy psychologically continuous with myself is 1... for all of them. If I know I am going to be copied 100 times, with one new copy activated every day for the next 100 days, how could the 100th copy of me possibly be surprised to learn it was the 100th copy? It has no new information- it knew going into the experiment that there would be some psychologically continuous copy released 100 days from then- that it is that copy isn't new information at all. I have no way to mathematize this intuition, but I strongly suspect using a [0,1] density for anticipating multiple copies is just wrong (which doesn't mean it won't work for a large fraction of cases).
The way I'm thinking about this is "I shuffle a deck of playing cards until it is ordered randomly, then turn over one card."
I get the 4 of ♥. Am I surprised? In one sense, yes- I couldn't have told you that I would have drawn that card with anything more than 2% certainty. In another sense, no- I expected that card was as likely as any other card. If I drew the 14 of ♥, then I would be surprised, because my model ascribed 0 probability to that event.
And so it seems like the normal measure of surprise is "can I reject the null hypothesis that this occurred by chance?". If Sleeping Beauty only has experiences consistent with her model of what will happen, why be surprised, even if those experiences individually have very tiny probabilities of occurring?
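One way to make that notion of surprise concrete: under the standard-deck model every card carries the same surprisal, so no particular draw rejects the model, whereas a "14 of ♥" has probability 0 under the model and unbounded surprisal.

```python
import math

p_card = 1 / 52
print(f"{-math.log2(p_card):.2f} bits")  # ~5.70 bits, the same for every card in the deck

# A "14 of hearts" has probability 0 under the standard-deck model, so its
# surprisal is unbounded: that is the kind of draw that rejects the model itself.
```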
That isn't quite what I'm talking about. Obviously I shouldn't be surprised when a discrete event with a probability as high as any other discrete event occurs. This isn't an anthropic issue at all. But I think we can agree that if I buy a lottery ticket and win I should be surprised. But if I copy myself a million times and all copies wake up with lottery tickets (no other tickets having been issued) I don't think I can even be surprised to be the copy with the winning ticket- since I could tell you at the outset there would be a copy of me holding that ticket.
But if I copy myself a million times and all copies wake up with lottery tickets (no other tickets having been issued) I don't think I can even be surprised to be the copy with the winning ticket- since I could tell you at the outset there would be a copy of me holding that ticket.
I'm not seeing the relevant difference between this and a million unrelated people holding lottery tickets. Am I missing something?
So obviously your person doesn't magically transfer from the current copy of you to future copies of you. Rather, those future persons are you because they are psychologically continuous with the present you. Now when you make multiple copies of yourself it isn't right to say that just one of them will be you. You may never experience both of them, but from the perspective of each copy you are their past. So when all million copies of you wake up, all of them will feel like they are the next stage of you. All of them will be right. Given that you know there will be a future stage of you that will win the lottery, how can that copy (the copy that is the future stage of you that has won the lottery) be surprised? The copy has, in its past, a memory of being told that there would be exactly one winning copy, psychologically continuous with his past self. Of course, the winning copy will have some kind of self-awareness "Oh, I'm that copy" but of course it has a memory of expecting exactly that from the copy that won the lottery.
I may need to be providing a more extensive philosophical context about personal identity for this to make sense, I'm not sure.
I don't think personal identity is a mathematical equivalence relation. Specifically, it's not symmetric: "I'm the same person you met yesterday" actually needs to read "I was the same person you met yesterday"; "I will be the same person tomorrow" is a prediction that may fail (even assuming I survive that long). This yields failures of transitivity: "Y is the same person as X" and "Z is the same person as X" doesn't get you "Y is the same person as Z".
Given that you know there will be a future stage of you that will win the lottery how can that copy (the copy that is the future stage of you that has won the lottery) be surprised?
It's not the ancestor--he who is certain to have a descendant that wins the lottery--who wins the lottery, it's that one descendant of him who wins it, and not his other one(s). Once a descendant realizes he is just one of the many copies, he then becomes uncertain whether he is the one who will win the lottery, so will be surprised when he learns whether he is. I think the interesting questions here are
1) Consider the epistemic state of the ancestor. He believes he is certain to win the lottery. There is an argument that he's justified in believing this.
2) Now consider the epistemic state of a descendant, immediately after discovering that he is one of several duplicates, but before he learns anything about which one. There is some sense in which his (the descendant's) uncertainty about whether he (the descendant) will win the lottery has changed from what it was in 1). Aside: in a Bayesian framework, this means having received some information, some evidence on which to update. But the only plausible candidate in sight is the knowledge that he is now just one particular one of the duplicates, not the ancestor anymore (e.g., because he has just awoken from the procedure). But of course, he knew that was going to happen with certainty before, so some deny that he learns anything at all. This seems directly analogous to Sleeping Beauty's predicament.
3) Descendant now learns whether he's the one who's won the lottery. Descendant could not have claimed that with certainty before, so he definitely does receive new information, and updates accordingly (all of them do). There is some sense in which the information received at this point exactly cancels out the information(?) in 2).
A couple points:
Of course, Bayesians can't revise certain knowledge, so the standard analysis gets stuck on square 1. But I don't see that the story changes in any significant way if we substitute "reasonable certainty(epsilon)" throughout, so I'm happy to stipulate if necessary.
Bayesians have a problem with de se information: "I am here now". The standard framework on which Bayes' Theorem holds deals with de re information. De se and de dicto statements have to be converted into de re statements before they can be processed as evidence. This has to be done via various calibrations that adequately disambiguate possibilities and interpret contexts and occasions: who am I, what time is it, and where am I. This process is often taken for granted, because it usually happens transparently and without error. Except when it doesn't.
I may need to be providing a more extensive philosophical context about personal identity for this to make sense, I'm not sure.
I hope you do.
With respect to the descendant "changing their mind" on the probability of winning the lottery: when the descendant says "I will win the lottery", perhaps that is a different statement from when the ancestor says "I will win the lottery". For the ancestor, "I" includes all the ancestor's descendants. For descendant X, "I" refers to only X (and their descendants, if any). Hence the sense that there is an update occurring is an illusion; the quotation is the same, the referent is not. There need be no information transferred.
There need be no information transferred.
I didn't quite follow this. From where to where?
But anyway, yes, that's correct that the referents of the two claims aren't the same. This could stand some further clarification as to why. In fact, Descendant's claim makes a direct reference to the individual who uttered it at the moment it's uttered, but Ancestor's claim is not about himself in the same way. As you say, he's attempting to refer to all of his descendants, and on that basis claim identity with whichever particular one of them happens to win the lottery, or not, as the case may be. (As I note above, this is not your usual equivalence relation.) This is an opaque context, and Ancestor's claim fails to refer to a particular individual (and not just because that individual exists only in the future). He can only make a conditional statement: given that X is whoever it is that will win the lottery (or not), the probability that that person will win the lottery (or not) is trivial. He lacks something that allows him to refer to Descendant outside the scope of the quantifier. Descendant does not lack this; he has what Ancestor did not have-- the wherewithal to refer to himself as a definite individual, because he is that individual at the time of the reference.
But a puzzle remains. On this account, Ancestor has no credence that Descendant will win the lottery, because he doesn't have the means to correctly formulate the proposition in which he is to assert a credence, except from inside the scope of a universal quantifier. Descendant does have the means, can formulate the proposition (a de se proposition), and can now assert a credence in it based on his understanding of his situation with respect to the facts he knows. And the puzzle is, Descendant's epistemic state is certainly different from Ancestor's, but it seems it didn't happen through Bayesian updating. Meanwhile, there is an event that Descendant witnessed that served to narrow the set of possible worlds he situates himself in (namely, that he is now numerically distinct from any of the other descendants), but, so the argument goes, this doesn't count as any kind of evidence of anything. It seems to me the basis for requiring diachronic consistency is in trouble.
On further reflection, both Ancestor and each Descendant can consider the proposition P(X) = "X is a descendant & X is a lottery winner". Given the setup, Ancestor can quantify over X, and assign probability 1/N to each instance. That's how the statement {"I" will win the lottery with probability 1} is to be read, in conjunction with a particular analysis of personal identity that warrants it. This would be the same proposition each descendant considers, and also assigns probability 1/N to. On this way of looking at it, both Ancestor and each descendant are in the same epistemic state, with respect to the question of who will win the lottery.
Ok, so far so good. This same way of looking at things, and the prediction about probability of descendants, is a way of looking at the Sleeping Beauty problem I tried to explain some months ago, and from what I can see is an argument for why Beauty is able to assert on Sunday evening what the credence of her future selves should be upon awakening (which is different from her own credence on Sunday evening), and therefore has no reason to change it when she later awakens on various occasions. It didn't seem to get much traction then, probably because it was also mixed in with arguments about expected frequencies.
There need be no information transferred.
I didn't quite follow this. From where to where?
I meant from anywhere to the descendant. Perhaps that wasn't the best wording.
I don't think that's relevant. If a copy would not be surprised to learn that it is the winning copy, does that mean it would be surprised to learn that it is not the winning copy? Or is it sensible that the lower-probability event be the higher-surprise event?
Of course, the winning copy will have some kind of self-awareness "Oh, I'm that copy" but of course it has a memory of expecting exactly that from the copy that won the lottery.
I think this is where your view breaks down. Each individual should be unsurprised that some individual will win. But each individual should be as surprised that they are the lucky winning copy as a normal person would be surprised that they are the lucky person winning a normal lottery. All you've done is reduced the interpersonal distance between the different lottery players, not changed the underlying probabilities- and so while the level of surprise may decrease on other issues (like predicting what the winnings will be used for), it shouldn't decrease on the location of the winner.
That may make my view clearer: if you word it as "this one shares a cell with the winning ticket" rather than "I won the lottery," then personhood and identity isn't an issue beyond the physical aspect.
I'm thinking about how those models work together.
Like, if I win the lottery, I should think "Could I be confused? Hallucinating? Deceived? Just lucky?" Seems like, if I win the lottery vs. my clones, I should think the same things. The case that that's what surprise is about is strong.
In the case of the regular lottery before the draw I do expect someone to win, but that future person will almost certainly not contain any of my memories. When I do win I can point back to my past self that bought the ticket and think "What were the chances he would win?" But in the case of the copies when I point back to myself and ask "What were the chance he would win?", the answer is clearly "about 1".
Along the lines of Will's 'Hallucinating? Deceived?' stream of thought, we can consider the case where your knowledge that you won the lottery comes from a test which has a 1% chance of false positives every time it is applied. Obviously you would strongly doubt that you were in fact the winner of the lottery even when the test tells you that you are. Hallucinating is far less likely than 1%, deception depends on the circumstance and details, and it may at least be worth a quick reality check to verify that you are not dreaming. This is the sense of 'surprise' that Will is considering equivalent to a standard lottery win.
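For concreteness, with made-up numbers (a one-in-a-million lottery, and assuming the test always flags the true winner), Bayes' rule gives the size of that doubt:

```python
prior_win = 1e-6          # illustrative one-in-a-million lottery
p_pos_given_win = 1.0     # assume the test always flags the true winner
p_pos_given_lose = 0.01   # the stated 1% false-positive rate

p_pos = p_pos_given_win * prior_win + p_pos_given_lose * (1 - prior_win)
p_win_given_pos = p_pos_given_win * prior_win / p_pos
print(f"P(actually won | test says you won) = {p_win_given_pos:.6f}")  # ~0.0001
```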
This is not me disagreeing with your position - I upvoted each of your last few comments here. I totally agree with you that you should not be surprised that you win. Before the cloning you were totally expecting to win the lottery. It is only after the cloning and before being told whether 'you' won the lottery that you were not expecting the boon.
I don't consider the whole subjective experience thing to be particularly confusing. It just requires careful expression of what 'you' is being discussed and precise description of the conditional probabilities being considered.
If I copied you 100 times, then asked each copy, "Which copy are you?" could you answer? If not, then how can you not gain information when told that you are the 100th copy?
If I copied you 100 times, then asked each copy, "Which copy are you?" could you answer? If not, then how can you not gain information when told that you are the 100th copy?
This is still relying on a certain degree of ambiguity. If we are talking about "asking each copy" then to be clear we had best write "how can that copy not gain information when". Each individual copy gains information about identity but there is a 'you' that does not.
That was ambiguously said, yes. How about this?
The information you-0 start out with that "you will become the 100th copy" is distinct from the information you-100 (or for that matter, you-1 through you-99) gains about identity. It is a lot like the information "someone will win the lottery."
In a sense you-0 should assign probability 1 to being told "You are the 100th copy." In another sense you-0 should assign probability 1/100. This is not a philosophical matter, but a matter of language. We could reproduce the same "paradox" by holding a 10-person lottery between 10 LessWrong users, and asking "What is the probability a LessWrong user wins the lottery?" Here the ambiguity is between "any user" (which happens with probability 1) and "any given particular user" (which happens with probability 1/10).
I think there is room to ask about two probabilities here. If there is something in the future that can only be done by you-42, it will certainly get done, so in this case the probability that you will be the 42nd copy is 1. If I ask each you-1 through you-100 to value a $100 bet that it is the 42nd copy, Dutch Book style, then each should pay $1 for the bet, so in this case we're looking at the 1/100 probability.
You don't have to call these events anything like "You have a 42nd copy" and "You are the 42nd copy". I believe this is a natural description. But in any case, what matters is that there are plainly two distinct probabilities here, and it matters which you use.
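A small sketch of the two probabilities being pulled apart, modelling self-location as a uniform draw over the copies (that modelling choice is exactly what is at issue here, so treat it as an assumption):

```python
import random

TRIALS = 100_000
NUM_COPIES = 100

some_copy_is_42 = this_copy_is_42 = 0
for _ in range(TRIALS):
    copies = range(1, NUM_COPIES + 1)
    some_copy_is_42 += (42 in copies)   # a 42nd copy always exists
    me = random.choice(copies)          # model self-location as a uniform draw
    this_copy_is_42 += (me == 42)

print(f"P(a 42nd copy exists) ~ {some_copy_is_42 / TRIALS:.2f}")    # 1.00
print(f"P(I am the 42nd copy) ~ {this_copy_is_42 / TRIALS:.3f}")    # ~0.01
print(f"Fair price for the $100 bet ~ ${100 * this_copy_is_42 / TRIALS:.2f}")  # ~$1
```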
Even assuming this argument is correct, Sleeping Beauty doesn't know exactly what she'll experience when she wakes up, and the awakenings won't be identical. For example, she has a 50% chance of waking up facing right if she wakes up once, but a 75% chance if she wakes up twice. If you include everything she experiences, she's pretty much exactly twice as likely to get any given experience if she wakes up twice.
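The arithmetic behind the facing-right figures, and behind the "twice as likely" claim for any very specific experience with small probability p (since 1 - (1-p)^2 ≈ 2p):

```python
def p_at_least_once(p, awakenings):
    # Probability the experience occurs in at least one of the awakenings,
    # treating the awakenings as independent.
    return 1 - (1 - p) ** awakenings

print(p_at_least_once(0.5, 1))     # 0.5   -- one awakening
print(p_at_least_once(0.5, 2))     # 0.75  -- two awakenings
p = 1e-6                           # some very specific, detailed experience
print(p_at_least_once(p, 2) / p)   # ~2.0: "pretty much exactly twice as likely"
```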
However, in the Sleeping Beauty experiment, Beauty gains no new information from waking up at any time, in any outcome.
She gains the information that she is sleeping beauty waking up. This is actually very unlikely. Only a tiny fraction of life moments are sleeping beauty waking up.
There are twice as many such moments if she gets woken up twice, and presumably about the same total number of life moments either way. As such, this information doubles the odds that she woke up twice.
Of course, if sleeping beauty were the only person in the world ever, and only lived for this experiment, then the probability would be 50:50. If there were only a few other life moments, it would be somewhere in between.
In short, there is a difference between knowing that there exists a sleeping beauty waking up, and knowing you are a sleeping beauty waking up.
A good example of this is that a given planet has about a one in 5000 chance of having as little cosmic radiation as Earth. The fact that a planet like that has intelligent life just tells you that cosmic radiation isn't necessary. The fact that it's our planet tells us that it's actively harmful.
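A toy model of that distinction, with made-up numbers chosen only for illustration (a baseline chance of life, and a life probability that falls linearly with radiation when radiation is "harmful"): under either hypothesis some low-radiation life-bearing planet almost certainly exists, but the chance that our own life-bearing planet is low-radiation is higher only if radiation is harmful.

```python
LOW = 1 / 5000       # fraction of planets with Earth-like low radiation (from the comment)
N_PLANETS = 10**9    # made-up number of planets
BASE = 1e-4          # made-up baseline chance of life

def stats(harmful):
    # Radiation level r is uniform on (0, 1); discretise finely and sum.
    steps = 500_000
    p_life = p_life_and_low = 0.0
    for i in range(steps):
        r = (i + 0.5) / steps
        p_life_r = BASE * (1 - r) if harmful else BASE * 0.5  # same total life either way
        p_life += p_life_r / steps
        if r < LOW:
            p_life_and_low += p_life_r / steps
    p_low_given_life = p_life_and_low / p_life           # "our" life-bearing planet is low-rad
    p_some = 1 - (1 - p_life_and_low) ** N_PLANETS       # some low-rad life planet exists
    return p_low_given_life, p_some

for harmful in (False, True):
    ours, some = stats(harmful)
    print(f"harmful={harmful}: P(our planet is low-radiation) = {ours:.1e}, "
          f"P(some low-radiation life planet exists) = {some:.3f}")
```

The existence claim comes out near-certain under both hypotheses, so it carries almost no evidence; the indexical fact about our own planet is roughly twice as likely under the "harmful" model, which is the sense in which it favours that model.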
However, in the Sleeping Beauty experiment, Beauty gains no new information from waking up at any time, in any outcome.
She gains the information that she is sleeping beauty waking up. This is actually very unlikely. Only a tiny fraction of life moments are sleeping beauty waking up.
But that does not constitute any new information relative to her state of knowledge before the experiment started. She already knew in advance what she'd be experiencing.
She didn't know she was sleeping beauty waking up when she went to sleep. She did when she woke up. Thus, she had different information.
She knew she would become sleeping beauty waking up, but that's entirely different information. The prior is orders of magnitude higher. She'd spend years being sleeping beauty going to wake up in that experiment, but only one or two days doing so.
In what way is "I will be Sleeping-Beauty-waking-up" different information than "I am Sleeping-Beauty-waking-up"? I cannot see any difference other than verb tense, and since one is spoken before the other, that only constitutes a difference in the frame of reference in which each one is spoken, not a difference in propositional content; they're saying the same thing about the same event. (I think this is something like a variable question fallacy.)
The verb tense is important. One only works from a single frame of reference. The other works from several.
The probability of a given frame of reference is 1/number of frames of reference.
In my planet example, from any frame of reference you can say that there is a planet with intelligent life that has less cosmic radiation than all but 1/5000 of them, but only on planets like that can you say that that's true of that planet.
The first is true unless cosmic radiation is necessary. The second is more likely to be true if cosmic radiation is harmful than if it isn't.
Do we have enough information to deduce that cosmic radiation is harmful, or, for that matter, that planets are helpful? They use the same basic argument.
These make different predictions, so it isn't the variable question fallacy.
Before I move on to a summation post listing the various raised thought experiments and paradoxes related to mind copying, I would like to draw attention to a particular point regarding the notion of "subjective probability".
In my earlier discussion post on the subjective experience of a forked person, I compared the scenario where one copy is awakened in the future to the Sleeping Beauty thought experiment. And really, it describes any such process, because there will inevitably be a time gap, however short, between the time of fork and the copy's subjective awakening: no copy mechanism can be instant.
In the traditional Sleeping Beauty scenario, there are two parties: Beauty and the Experimenter. The Experimenter has access to a sleep-inducing drug that also resets Beauty's memory to the state at t=0. Suppose Beauty is put to sleep at t=0, and then a fair coin is tossed. If the coin comes heads, Beauty is woken up at t=1, permanently. If the coin comes tails, Beauty is woken up at t=1, questioned, memory-wiped, and then woken up again at t=2, this time permanently.
In this experiment, intuitively, Beauty's subjective anticipation of the coin coming tails, without access to any information other than the conditions of the experiment, should be 2/3. I won't be arguing here whether this particular answer is right or wrong: the discussion has been raised many times before, and on Less Wrong as well. I'd like to point out one property of the experiment that differentiates it from other probability-related tasks: erasure of information, which renders the whole experiment a non-experiment.
In Bayesian theory, the (prior) probability of an outcome is the measure of our anticipation of it to the best of our knowledge. Bayesians think of experiments as a way to get new information, and update their probabilities based on the information gained. However, in the Sleeping Beauty experiment, Beauty gains no new information from waking up at any time, in any outcome. She has the exact same mind-state at any point of awakening that she had at t=0, and is for all intents and purposes the exact same person at any such point. As such, we can ask Beauty, "If we perform the experiment, what is your anticipation of waking up in the branch where the coin landed tails?", and she can give the same answer without actually performing the experiment.
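For reference, the per-awakening frequency behind the 2/3 answer can be checked with a short simulation; this only counts awakenings and takes no stand on whether that count is the right thing for Beauty to report:

```python
import random

def per_awakening_tails_frequency(trials=100_000):
    tails_awakenings = total_awakenings = 0
    for _ in range(trials):
        tails = random.random() < 0.5
        awakenings = 2 if tails else 1     # tails: woken at t=1 and t=2; heads: only t=1
        total_awakenings += awakenings
        if tails:
            tails_awakenings += awakenings
    return tails_awakenings / total_awakenings

print(per_awakening_tails_frequency())  # ~0.667
```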
So how does it map to the mind-copying problem? In a very straightforward way.
Let's modify the experiment this way: at t=0, Beauty's state is backed up. Let's suppose that she is then allowed to live her normal life, but the time-slices are large enough that she dies within the course of a single round. (Say, she has a normal human lifespan and the time between successive iterations is 200 years.) However, at t=1, a copy of Beauty is created in the state at which the original was at t=0, a coin is tossed, and if and only if it comes tails, another copy is created at t=2.
If Beauty knows the condition of this experiment, no matter what answer she would give in the classic formulation of the problem, I don't expect it to change here. The two formulations are, as far as I can see, equivalent.
However, in both cases, from the Experimenter's point of view, the branching points are independent events, which allows us to construct scenarios that question the straightforward interpretation of "subjective probability". And for this, I refer to the last experiment in my earlier post.
Imagine you have an indestructible machine that restores one copy of you from backup every 200 years. In this scenario, it seems you should anticipate waking up with equal probability between now and the end of time. But this is inconsistent with how probability is assigned to discrete outcomes: an assignment of equal probability to every awakening either diverges (if the common value is positive) or sums to zero, and as the length of the experiment approaches infinity (ignoring real-world cosmology for the moment), the subjective probability of every individual outcome (finding yourself at t=1, finding yourself at t=2, etc.) approaches 0. The equivalent classic formulation is a setup where the Experimenter is programmed to wake Beauty after every time-slice and unconditionally put her back to sleep.
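To make the divergence concrete, here is a minimal finite-truncation sketch: with N possible wake-up times a uniform distribution is well-defined, but each outcome's probability 1/N vanishes as N grows, while any fixed positive probability assigned to every one of infinitely many outcomes cannot be normalized.

```python
# Uniform over N possible wake-up times: each outcome gets 1/N, which -> 0 as N grows.
for n in (10, 10_000, 10_000_000):
    print(f"N = {n:>10,}: P(waking at any particular t) = {1 / n:.1e}, total = {n * (1 / n):.1f}")

# Conversely, a fixed positive probability per outcome cannot be normalized:
p = 1e-9
print(f"With p = {p} per wake-up, the total already exceeds 1 after {1 / p:.0e} outcomes "
      f"and keeps growing without bound.")
```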
This is not the only possible "diverging Sleeping Beauty" problem. Suppose that the probability that Beauty's permanent awakening comes after t=1 is 1/2 (as in the classic experiment), that it comes after t=2 is 1/3, after t=3 is 1/4, and so on. In this case, while it is almost certain that she will eventually wake up permanently (in the same sense that it is "almost certain" that a fair random number generator will eventually output any given value), the expected time of her permanent awakening is still infinite: it is 1 + 1/2 + 1/3 + 1/4 + ..., and the harmonic series diverges.
In the case of a converging series of probabilities of remaining asleep -- for example, if a fresh coin toss at each iteration decides whether Beauty is put back to sleep, so that the series is 1/2 + 1/4 + 1/8 + ... = 1 -- Beauty can give a subjective expected value: the average time at which she expects to be woken up permanently.
In a general case, let Ei be the event "the experiment continues at stage i" (that is, Beauty is not permanently awakened at stage i, or in the alternate formulation, more copies are created beyond that point). Then, if we extrapolate the notion of "subjective probability" that leads us to the answer 2/3 in the classic formulation, the definition is meaningful if and only if the series of objective probabilities ∑i=1..∞ P(Ei) converges -- it doesn't have to converge to 1, we'll just need to renormalize the calculations otherwise. Which, given that the randomizing events are independent, simply doesn't have to happen.
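Here is a small numerical sketch of that condition, reading P(Ei) as the unconditional probability that the experiment is still running at stage i: the coin-toss case gives P(Ei) = 2^-i, whose partial sums converge, while the divergent example above behaves like P(Ei) = 1/(i+1), whose partial sums grow without bound, leaving nothing to renormalize.

```python
def partial_sum(p, n_terms):
    return sum(p(i) for i in range(1, n_terms + 1))

geometric = lambda i: 0.5 ** i        # fresh coin toss at each stage: P(Ei) = 2^-i
harmonic  = lambda i: 1.0 / (i + 1)   # the divergent example: P(Ei) = 1/(i+1)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>6}: sum of geometric P(Ei) = {partial_sum(geometric, n):.4f}, "
          f"sum of harmonic P(Ei) = {partial_sum(harmonic, n):.2f}")
# The geometric partial sums approach 1, so normalized subjective weights exist;
# the harmonic partial sums grow without bound, so there is nothing to normalize.
```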
Even if we reformulate the experiment in terms of decision theory, it's not clear how it will help us. If the bet is "win 1 utilon if you get your iteration number right", the probability of winning it in a divergent case is 0 at any given iteration. And yet, if all cases are perfectly symmetric information-wise, so that you make the same decision over and over again, you'll eventually get the answer right, with exactly one of you winning the bet, no matter what your "decision function" is - even if it's simply something like "return 42;". Even a stopped clock is right sometimes, in this case once.
It would be tempting, seeing this, to discard the notion of "subjective anticipation" altogether as ill-defined. But that seems to me like tossing out the Born probabilities just because we go from Copenhagen to MWI. If I'm forked, I expect to continue my experience as either the original or the copy with a probability of 1/2 -- whatever that means. If I'm asked to participate in the classic Sleeping Beauty experiment, and to observe the once-flipped coin at every point I wake up, I will expect to see tails with a probability of 2/3 -- again, whatever that means.
The situations described here have a very specific set of conditions. We're dealing with complete information erasure, which prevents any kind of Bayesian update and in fact makes the situation completely symmetric from the decision agent's perspective. We're also dealing with an anticipation all the way into infinity, which cannot occur in practice due to the finite lifespan of the universe. And yet, I'm not sure what to do with the apparent need to update my anticipations for times arbitrarily far into the future, for an arbitrarily large number of copies, for outcomes with an arbitrarily high degree of causal removal from my current state, which may fail to occur, before the sequence of events that can lead to them is even put into motion.