Morendil comments on Updating, part 1: When can you change your mind? The binary model - Less Wrong
Unlike Jack, I'm pessimistic about your proposal. I've already changed my mind not once but twice.
The interesting aspect is that this doesn't feel like I'm vacillating. I have gone from relying on a vague and unreliable intuition in favor of 1/3 qualified with "it depends", to being moderately certain that 1/2 was unambiguously correct, to having worked out how I was allocating all of the probability mass in the original problem and getting back 1/3 as the answer that I cannot help but think is correct. That, plus the meta-observation that no-one, including people I've asked directly (including yourself), has a rebuttal to my construction of the table, is leaving me with a higher degree of confidence than I previously had in 1/3.
It now feels as if I'm justified to ignore pretty much any argument which is "merely" a verbal appeal to one intuition or the other. Either my formalization corresponds to the problem as verbally stated or it doesn't; either my math is correct or it isn't. "Here I stand, I can no other" - at least until someone shows me my mistake.
So I think I figured this whole thing out. Are people familiar with the type-token distinction and the ambiguities that result from it? If I have five copies of the book Catcher in the Rye and you ask me how many books I have, there is an ambiguity. I could say one or five. One refers to the type: "Catcher in the Rye is a coming of age novel" is a sentence about the type. Five refers to the number of tokens: "I tossed Catcher in the Rye onto the bookshelf" is a sentence about the token. The distinction is ubiquitous and leads to occasional confusion, enough that the subject is at the top of my Less Wrong to-do list. The type-token distinction becomes an issue whenever we introduce identical copies, and the distinction dominates my views on personal identity.
In the Sleeping Beauty case, the amnesia means that the experience of waking up on Monday and the experience of waking up on Tuesday, while token-distinct, are type-identical. If we decide the right thing to update on isn't the token experience but the type experience, then the calculations are really easy: the type experience "waking up" has P=1 for both heads and tails, so the prior never changes. I think there are some really good reasons for worrying about types rather than tokens in this context, but I won't go into them until I make sure the above makes sense to someone.
How are you accounting for the fact that - on awakening - beauty has lost information that she previously had - namely that she no longer knows which day of the week it is?
Maybe it's just because I haven't thought about this in a couple of weeks but you're going to have to clarify this. When does beauty know which day of the week it is?
Before consuming the memory-loss drugs she knows her own temporal history. After consuming the drugs, she doesn't. She is more uncertain - because her memory has been meddled with, and important information has been deleted from it.
Information wasn't deleted. Conditions changed and she didn't receive enough information about the change. There is a type (with a single token) that is Beauty before the experiment and that type includes a property 'knows what day of the week it is', then the experiment begins and the day changes. During the experiment there is another type which is also Beauty, this type has two tokens. This type only has enough information to narrow down the date to one of two days. But she still knows what day of the week it was when the experiment began, it's just your usual indexical shift (instead of knowing the date now she knows the date then but it is the same thing).
Her memories were DELETED. That's the whole point of the amnesia-inducing drug.
Amnesia = memory LOSS: http://dictionary.reference.com/browse/Amnesia
Oh sure, the information contained in the memory of waking up is lost (though that information didn't contain what day of the week it was and you said "namely that she no longer knows which day of the week it is"). I still have zero idea of what you're trying to ask me.
If she had not ever been given the drug she would be likely to know which day of the week it was. She would know how many times she had been woken up, interviewed, etc. It is because all such information has been chemically deleted from her mind that she has the increased uncertainty that she does.
I might have some issues with that characterization but they aren't worth going into since I still don't know what this has to do with my discussion of the type-token ambiguity.
Makes sense to me.
Entertainingly, I feel justified in ignoring your argument and most of the others for the same reason you feel justified in ignoring other arguments.
I got into a discussion about the SB problem a month ago after Mallah mentioned it as related to the red door/blue doors problem. After a while I realized I could get either of 1/2 or 1/3 as an answer, despite my original intuition saying 1/2.
I confirmed both 1/2 and 1/3 were defensible by writing a computer program to count relative frequencies two different ways. Once I did that, I decided not to take seriously any claims that the answer had to be one or the other, since how could a simple argument overrule the result of both my simple arithmetic and a computer simulation?
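A minimal sketch of the kind of program described above. The original program isn't shown, so this is a reconstruction under assumptions: it uses the standard protocol of one awakening on heads (Monday) and two on tails (Monday and Tuesday), and the function name simulate is hypothetical. It counts the same trials two ways and reports both relative frequencies.

```python
import random

def simulate(trials: int, seed: int = 0) -> tuple[float, float]:
    """Run the Sleeping Beauty protocol many times and return two relative
    frequencies: heads per coin flip, and heads per awakening."""
    rng = random.Random(seed)
    heads_flips = 0       # trials where the coin landed heads
    awakenings = 0        # total awakenings across all trials
    heads_awakenings = 0  # awakenings that follow a heads flip
    for _ in range(trials):
        heads = rng.random() < 0.5
        wakings = 1 if heads else 2  # heads: Monday only; tails: Monday and Tuesday
        awakenings += wakings
        if heads:
            heads_flips += 1
            heads_awakenings += wakings
    return heads_flips / trials, heads_awakenings / awakenings

per_flip, per_awakening = simulate(100_000)
print(f"heads per flip:      {per_flip:.3f}")       # close to 1/2
print(f"heads per awakening: {per_awakening:.3f}")  # close to 1/3
```

Both numbers come from the same runs; they differ only in the denominator (coin flips versus awakenings), which is the ambiguity the rest of the thread argues about.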
I was thinking about that earlier.
A higher level of understanding of an initially mysterious question should translate into knowing why people may disagree, and still insist on answers that you yourself have discarded. You explain away their disagreement as an inferential distance.
Neither of the answers you have arrived at is correct, from my perspective, and I can explain why. So I feel justified in ignoring your argument for ignoring my argument. :)
That a simulation program should compute 1/2 for "how many times on average the coin comes up heads per time it is flipped" is simply P(x) in my formalization. It's a correct but entirely uninteresting answer to something other than the problem's question.
That your program should compute 1/3 for "how many times on average the coin comes up heads per time Beauty is awoken" is also a correct answer to a slightly more subtly mistaken question. If you look at the "Halfer variant" page of my spreadsheet, you will see a probability distribution that also corresponds to the same "facts" that yield the 1/3 answer, and yet applying the laws of probability to that distribution gives Beauty a credence of 1/2. The question your program computes an answer to is not the question "what is the marginal probability of x=Heads, conditioning on z=Woken".
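The spreadsheet itself isn't reproduced here, so the following is a hypothetical reconstruction of a thirder-style joint distribution over x (coin), y (day) and z (woken that day). It assumes each (coin, day) pair gets mass 1/4 and that Beauty is woken on every day except Heads/Tuesday; the helper names joint, prob and conditional are illustrative. It shows the two quantities being distinguished: the marginal P(x=H) and the conditional P(x=H | z=W).

```python
from fractions import Fraction
from itertools import product

# Hypothetical reconstruction of the joint distribution over
# x = coin (H/T), y = day (Mon/Tue), z = woken that day (W/N).
# Assumption: each (coin, day) pair has mass 1/4, and Beauty is woken
# on every day except Heads/Tuesday.
def joint() -> dict[tuple[str, str, str], Fraction]:
    table = {}
    for x, y, z in product("HT", ("Mon", "Tue"), "WN"):
        woken = not (x == "H" and y == "Tue")
        table[(x, y, z)] = Fraction(1, 4) if (z == "W") == woken else Fraction(0)
    return table

def prob(table, pred) -> Fraction:
    """Total probability mass of the cells satisfying pred(x, y, z)."""
    return sum((p for cell, p in table.items() if pred(*cell)), Fraction(0))

def conditional(table, event, given) -> Fraction:
    """P(event | given), computed directly from the joint table."""
    return prob(table, lambda *c: event(*c) and given(*c)) / prob(table, given)

t = joint()
assert prob(t, lambda x, y, z: True) == 1  # the table is a proper distribution
print(prob(t, lambda x, y, z: x == "H"))                                   # 1/2
print(conditional(t, lambda x, y, z: x == "H", lambda x, y, z: z == "W"))  # 1/3
```

Under this assumed table, conditioning on y=Tue and z=W gives P(x=H) = 0, matching the observation that a Tuesday interview implies tails.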
Whereas, from the tables representing the joint probability distribution, I think I now ought to be able to write a program which can recover either answer: the Thirder answer by inputting the "right" model or the Halfer answer by inputting the "wrong" model. In the Halfer model, we basically have to fail to sample on Heads/Tuesday. Commenting out one code line might be enough.
ETA: maybe not as simple as that, now that I have a first cut of the program written; we'd need to count awakenings on Monday twice, which makes no sense at all. It does look as if our programs are in fact computing the same thing to get 1/3.
Which specific formulation of the Sleeping Beauty problem did you use to work things out? Maybe we're referring to descriptions of the problem that use different wording; I've yet to read a description that's convinced me that 1/2 is an answer to the wrong question. For example, here is what the wiki's description asks:
Personally, I believe that using the word 'subjective' doesn't add anything here (to me it just sounds like a cue to think Bayesian-ishly, which doesn't change the actual answer). So I read the question as asking for the probability of the coin landing tails given the experiment's setup. As it's asking for a probability, I see it as wholly legitimate to answer it along the lines of 'how many times on average the coin comes up heads per X,' where X is one of the two choices you mentioned.
If you ignore the specification that it is Beauty's subjective probability under discussion, the problem becomes ill-defined - and multiple answers become defensible - depending on whose perspective we take.
The word 'subjective' before the word 'probability' is empty verbiage to me, so (as I see it) it doesn't matter whether you or I have subjectivity in mind. The problem's ill-defined either way; 'the specification that it is Beauty's subjective probability' makes no difference to me.
The perspective makes a difference:
"In other words, only in a third of the cases would heads precede her awakening. So the right answer for her to give is 1/3. This is the correct answer from Beauty's perspective. Yet to the experimenter the correct probability is 1/2."
I think it's not the change in perspective or subjective identity making a difference, but instead it's a change in precisely which probability is being asked about. The Wikipedia page unhelpfully conflates the two changes.
It says that the experimenter must see a probability of 1/2 and Beauty must see a probability of 1/3, but that just ain't so; there is nothing stopping Beauty from caring about the proportion of coin flips that turn out to be heads (which is 1/2), and there is nothing stopping the experimenter from caring about the proportion of wakings for which the coin is heads (which is 1/3). You can change which probability you care about without changing your subjective identity and vice versa.
Let's say I'm Sleeping Beauty. I would interpret the question as being about my estimate of a probability ('credence') associated with a coin-flipping process. Having interpreted the question as being about that process, I would answer 1/2 - who I am would have nothing to do with the question's correct answer, since who I am has no effect on the simple process of flipping a fair coin and I am given no new information after the coin flip about the coin's state.
In the original problem post, Beauty is asked a specific question, though - namely:
"What is your credence now for the proposition that our coin landed heads?"
That's fairly clearly the PROBABILITY NOW of the coin having landed heads - and not the PROPORTION that turn out AT SOME POINT IN THE FUTURE to have landed heads.
Perspective can make a difference - because different observers have different levels of knowledge about the situation. In this case, Beauty doesn't know whether it is Tuesday or not - but she does know that if she is being asked on Tuesday, then the coin came down tails - and p(heads) is about 0.
It's not specific enough. It only asks for Beauty's credence of a coin landing heads - it doesn't tell her to choose between the credence of a coin landing heads given that it is flipped and the credence of a coin landing heads given a single waking. The fact that it's Beauty being asked does not, in and of itself, mean the question must be asking the latter probability. It is wholly reasonable for Beauty to interpret the question as being about a coin-flipping process for which the associated probability is 1/2.
The addition of the word 'now' doesn't magically ban you from considering a probability as a limiting relative frequency.
Agree.
It's not clear to me how this conditional can be informative from Beauty's perspective, as she doesn't know whether it's Tuesday or not. The only new knowledge she gets is that she's woken up; but she has an equal probability (i.e. 1) of getting evidence of waking up if the coin's heads or if the coin's tails. So Beauty has no more knowledge than she did on Sunday.
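Bayes' rule makes the equal-likelihood point concrete. A minimal sketch, assuming the evidence is the proposition "Beauty is woken at least once", so that both likelihoods are 1; the function name bayes_posterior is illustrative. The second call shows, for contrast, the alternative reading where the evidence is "woken on a given day" with that day equally likely to be Monday or Tuesday: the heads likelihood drops to 1/2 and the same rule returns 1/3.

```python
from fractions import Fraction

def bayes_posterior(prior_heads: Fraction, lik_heads: Fraction, lik_tails: Fraction) -> Fraction:
    """P(heads | E) by Bayes' rule, given P(E | heads) and P(E | tails)."""
    joint_heads = prior_heads * lik_heads
    joint_tails = (1 - prior_heads) * lik_tails
    return joint_heads / (joint_heads + joint_tails)

half = Fraction(1, 2)

# E = "woken at least once": certain under both hypotheses, so no update.
print(bayes_posterior(half, Fraction(1), Fraction(1)))  # 1/2

# E = "woken on a given day" (day assumed equally likely to be Mon or Tue):
# heads wakes Beauty on Monday only, tails on both days.
print(bayes_posterior(half, Fraction(1, 2), Fraction(1)))  # 1/3
```

Whenever the likelihood ratio is 1 the posterior equals the prior; the disagreement is over which proposition Beauty is actually conditioning on.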
I read it as "What is your credence", which is supposed to be synonymous with "subjective probability", which - and this is significant - I take to entail that Beauty must condition on having been woken (because she conditions on every piece of information known to her).
In other words, I take the question to be precisely "What is the probability you assign to the coin having come up heads, taking into account your uncertainty as to what day it is."
Ahhhh, I think I understand a bit better now. Am I right in thinking that your objection is not that you disapprove of relative frequency arguments in themselves, but that you believe the wrong relative frequency/frequencies is/are being used?
Right up until your reply prompted me to write a program to check your argument, I wasn't thinking in terms of relative frequencies at all, but in terms of probability distributions.
I haven't learned the rules for relative frequencies yet (by which I mean things like "(don't) include counts of variables that have a correlation of 1 in your denominator"), so I really have no idea.
Here is my program - which by the way agrees with neq1's comment here, insofar as the "magic trick" which will recover 1/2 as the answer consists of commenting out the TTW line.
However, this seems perfectly nonsensical when transposed to my spreadsheet: zeroing out the TTW cell at all means I end up with a total probability mass less than 1. So, I can't accept at the moment that neq1's suggestion accords with the laws of probability - I'd need to learn what changes to make to my table and why I should make them.
Replying again since I've now looked at the spreadsheet.
Using my intuition (which says the answer is 1/2), I would expect P(Heads, Tuesday, Not woken) + P(Tails, Tuesday, Not woken) > 0, since I know it's possible for Beauty to not be woken on Tuesday. But the 'halfer "variant"' sheet says P(H, T, N) + P(T, T, N) = 0 + 0 = 0, so that sheet's way of getting 1/2 must differ from how my intuition works.
(ETA - Unless I'm misunderstanding the spreadsheet, which is always possible.)
Yeah, that "Halfer variant" was my best attempt at making sense of the 1/2 answer, but it's not very convincing even to me anymore.
That program is simple enough that you can easily compute expectations of your 8 counts analytically.
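Taking up that suggestion, here is a sketch of the analytic expectations. The program referred to in the thread isn't reproduced here, so this assumes it visits Monday and Tuesday once per trial and increments exactly one of eight (coin, day, woken) counters per day. The counter keys follow the TTW naming used above (coin, then day M/T, then W/N), and count_ttw is a hypothetical flag standing in for commenting out the Tails/Tuesday/Woken line.

```python
from fractions import Fraction
from itertools import product

def expected_counts(count_ttw: bool = True) -> dict[str, Fraction]:
    """Expected value, per trial, of each of the 8 (coin, day, woken) counters.

    Keys are coin (H/T) + day (M/T) + woken (W/N), e.g. "TTW" for
    Tails/Tuesday/Woken. count_ttw=False mimics commenting out that line.
    """
    counts = {}
    for coin, day, woken in product("HT", "MT", "WN"):
        key = coin + day + woken
        is_woken = not (coin == "H" and day == "T")  # only Heads/Tuesday is skipped
        expected = Fraction(1, 2) if (woken == "W") == is_woken else Fraction(0)
        if key == "TTW" and not count_ttw:
            expected = Fraction(0)
        counts[key] = expected
    return counts

def freq_heads_given_woken(c: dict[str, Fraction]) -> Fraction:
    """Limiting value of (heads awakenings) / (all awakenings)."""
    return c["HMW"] / (c["HMW"] + c["TMW"] + c["TTW"])

print(freq_heads_given_woken(expected_counts()))                 # 1/3
print(freq_heads_given_woken(expected_counts(count_ttw=False)))  # 1/2
```

Because each trial contributes two day-counts, halving the expected counts gives a joint distribution; with count_ttw=False those halved values sum to 3/4 rather than 1, which is the total-mass objection raised about zeroing the TTW cell.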
Your program looks good here; your code looks a lot like mine, and when I ran it I got ~1/2 for P(H) and ~1/3 for F(H|W). I'll try to compare it to your spreadsheet.
Well, perhaps because relative frequencies aren't always probabilities?
Of course. But if I simulate the experiment more and more times, the relative frequencies converge on the probabilities.
Even in the limit not all relative frequencies are probabilities. In fact, I'm quite sure that in the limit ntails/wakings is not a probability. That's because you don't have independent samples of wakings.
But if there is a probability to be found (and I think there is) the corresponding relative frequency converges on it almost surely in the limit.
I don't understand.
I tried to explain it here: http://lesswrong.com/lw/28u/conditioning_on_observers/1zy8
Basically, the 2 wakings on tails should be thought of as one waking. You're just counting the same thing twice. When you include counts of variables that have a correlation of 1 in your denominator, it's not clear what you are getting back. The thirders are using a relative frequency that doesn't converge to a probability.
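One way to make "count the two tails wakings as one" concrete (an illustration under our own assumptions, not necessarily the computation neq1 had in mind) is to compare a pooled ratio over all wakings with an average of per-trial ratios, in which each trial counts once however many wakings it contains. The function name tails_frequencies is hypothetical.

```python
import random

def tails_frequencies(trials: int, seed: int = 1) -> tuple[float, float]:
    """Return (pooled, per_trial) estimates of 'tails' over Sleeping Beauty runs.

    pooled:    tails wakings / all wakings, each waking counted separately
    per_trial: mean over trials of the tails share of that trial's wakings,
               so a trial's two tails wakings contribute a single value
    """
    rng = random.Random(seed)
    tails_wakings = 0
    wakings = 0
    per_trial_total = 0.0
    for _ in range(trials):
        tails = rng.random() < 0.5
        n = 2 if tails else 1   # tails: Monday and Tuesday; heads: Monday only
        t = n if tails else 0   # wakings in this trial that follow tails
        wakings += n
        tails_wakings += t
        per_trial_total += t / n  # 1.0 on a tails trial, 0.0 on a heads trial
    return tails_wakings / wakings, per_trial_total / trials

pooled, per_trial = tails_frequencies(100_000)
print(f"pooled: {pooled:.3f}, per-trial: {per_trial:.3f}")  # roughly 0.667 and 0.500
```

The pooled ratio tends to 2/3 tails (1/3 heads); the per-trial average tends to 1/2, because a tails trial's two wakings contribute a single value of 1.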
This is true if we want the ratio of tails to wakings. However...
Despite the perfect correlation between some of the variables, one can still get a probability back out - but it won't be the probability one expects.
Maybe one day I decide I want to know the probability that a randomly selected household on my street has a TV. I print up a bunch of surveys and put them in people's mailboxes. However, it turns out that because I am very absent-minded (and unlucky), I accidentally put two surveys in the mailboxes of people with a TV, and only one in the mailboxes of people without TVs. My neighbors, because they enjoy filling out surveys so much, dutifully fill out every survey and send them all back to me. Now the proportion of surveys that say 'yes, I have a TV' is not the probability I expected (the probability of a household having a TV) - but it is nonetheless a probability, just a different one (the probability of any given survey saying, 'I have a TV').
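The survey example reduces to a weighted ratio. A minimal sketch, assuming a hypothetical street where half the households own a TV; the function name and parameters are illustrative.

```python
from fractions import Fraction

def survey_yes_fraction(p_tv: Fraction, surveys_if_tv: int = 2, surveys_if_no_tv: int = 1) -> Fraction:
    """Limiting fraction of returned surveys saying 'I have a TV' when TV
    households receive surveys_if_tv copies and others surveys_if_no_tv."""
    yes = p_tv * surveys_if_tv
    no = (1 - p_tv) * surveys_if_no_tv
    return yes / (yes + no)

# Hypothetical street where half the households own a TV.
print(survey_yes_fraction(Fraction(1, 2)))        # 2/3: probability that a given survey says yes
print(survey_yes_fraction(Fraction(1, 2), 1, 1))  # 1/2: one survey each recovers the household rate
```

Reading "has a TV" as tails and one survey per awakening, this is the same arithmetic that turns a fair coin into 2/3 tails per awakening.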
That's a good example. There is a big difference though (it's subtle). With Sleeping Beauty, the question is about her probability at a waking. At a waking, there are no duplicate surveys. The duplicates occur at the end.
That is a difference, but it seems independent from the point I intended the example to make. Namely, that a relative frequency can still represent a probability even if its denominator includes duplicates - it will just be a different probability (hence why one can get 1/3 instead of 1/2 for SB).
Morendil,
This is strange. It sounds like you have been making progress towards settling on an answer, after discussion with others. That would suggest to me that discussion can move us towards consensus.
I like your approach a lot. It's the first time I've seen the thirder argument defended with actual probability statements. Personally, I think there shouldn't be any probability mass on 'not woken', but that is something worth thinking about and discussing.
One thing that I think is odd. Thirders know she has nothing to update on when she is woken, because they admit she will give the same answer regardless of whether it's heads or tails. If she really had new information that is correlated with the outcome, her credence would move towards heads when heads, and towards tails when tails.
Consider my cancer intuition pump example. Everyone starts out thinking there is a 50% chance they have cancer. Once woken, regardless of whether they have cancer or not, they all shift to 90%. Did they really learn anything about their disease state by being woken? If they did, those with cancer would have shifted their credence up a bit, and those without would have shifted down. That's what updating is.
In your example the experimenter has learned whether you have cancer. And she reflects that knowledge in the structure of the experiment: you are woken up 9 times if you have the disease.
Set aside the amnesia effects of the drug for a moment, and consider the experimental setup as a contorted way of imparting the information to the patient. Then you'd agree that with full memory, the patient would have something to update on? As soon as the second day. So there is, normally, an information flow in this setup.
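The cancer example can be worked both ways. A sketch, assuming the 50% prior and 9 awakenings for cancer patients from the exchange above, plus one awakening for patients without cancer (that last figure is our assumption, chosen because it reproduces the 90% figure); the function and constant names are hypothetical. The amnesic credence is the same for every patient whatever their actual disease state, while a patient with full memory learns something by the second awakening.

```python
from fractions import Fraction

PRIOR = Fraction(1, 2)                 # everyone starts at 50%
WAKINGS = {"cancer": 9, "healthy": 1}  # assumption: healthy patients are woken once

def amnesic_credence() -> Fraction:
    """Share of all awakenings that belong to cancer patients. An amnesic
    patient cannot tell awakenings apart, so this is the same number at
    every awakening, whatever the patient's actual disease state."""
    c = PRIOR * WAKINGS["cancer"]
    h = (1 - PRIOR) * WAKINGS["healthy"]
    return c / (c + h)

def full_memory_credence(waking_number: int) -> Fraction:
    """Credence for a patient who remembers this is their waking_number-th
    awakening (valid for 1 <= waking_number <= 9): a healthy patient never
    reaches a second awakening."""
    c = PRIOR if waking_number <= WAKINGS["cancer"] else Fraction(0)
    h = (1 - PRIOR) if waking_number <= WAKINGS["healthy"] else Fraction(0)
    return c / (c + h)

print(amnesic_credence())       # 9/10 for every patient
print(full_memory_credence(1))  # 1/2: both groups have a first awakening
print(full_memory_credence(2))  # 1: only cancer patients get a second
```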
What the amnesia does is selectively impair the patient's ability to condition on available information. It does that in a way which is clearly pathological, and results in the counterintuitive reply to the question "Conditioning on a) your having woken up and b) your inability to tell what day it is, what is your credence?" We have no everyday intuitions about the inferential consequences of amnesia.
Knowing about the amnesia, we can argue that Beauty "shouldn't" condition on being woken up. But if she does, she'll get that strange result. If she does have cancer, she is more likely to be woken up multiple times than once, and being woken up at all does have some evidential weight.
All this, though, being merely verbal aids as I try to wrap my head around the consequences of the math. And therefore to be taken more circumspectly than the math itself.
If she does condition on being woken up, I think she still gets 1/2. I hate to keep repeating arguments, but what she knows when she is woken up is that she has been woken up at least once. If you just apply Bayes rule, you get 1/2.
If conditioning causes her to change her probability, it should do so in such a way that makes her more accurate. But as we see in the cancer problem, people with cancer give the same answer as people without.
Yes, but then we wouldn't be talking about her credence on an awakening. We'd be talking about her credence on first waking and second waking. We'd treat them separately. With amnesia, 2 wakings are the same as 1. It's really just one experience.
Apply it to what terms?
I'm not sure what more I can say without starting to repeat myself, too. All I can say at this point, having formalized my reasoning as both a Python program and an analytical table giving out the full joint distribution, is "Where did I make a mistake?"
Where's the bug in the Python code? How do I change my joint distribution?
I like the halfer variant version of your table. I still need to think about your distributions more, though. I'm not sure it makes sense to have a variable 'woken that day' for this problem.
Congratulations on getting to that point, I figure.