Comment author: CarlShulman 17 May 2010 06:37:27PM 0 points [-]

I suggest reading Radford Neal.

Comment author: neq1 17 May 2010 06:49:15PM 0 points [-]

Yes, I've read that paper, and disagree with much of it. Perhaps I'll take the time to explain my reasoning sometime soon.

Comment author: neq1 17 May 2010 05:48:31PM 1 point [-]

Anthropic reasoning is what leads people to believe in miracles. Rare events have a high probability of occurring if the number of observations is large enough. But whoever that rare event happens to will feel like it couldn't have just happened by chance, because the odds against it happening to them were so large.

If you wait until the event occurs, and then start treating it as a random event from a single trial, forming your hypothesis after seeing the data, you'll make inferential errors.

Imagine that there are balls in an urn, labeled with numbers 1, 2,...,n. Suppose we don't know n. A ball is selected. We look at it. We see that it's number x.

non-anthropic reasoning: all numbers between 1 and n were equally likely. I was guaranteed to observe some number, and the probability that it was close to n was the same as the probability that it was far from n. So all I know is that n is greater than or equal to x.

anthropic reasoning: A number as small as x is much less likely if n is large. Therefore, hypotheses with n close to x are more likely than hypotheses where n is much larger than x.
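A minimal sketch of the second line of reasoning, where the chance of drawing ball x from an urn of size n is 1/n. The uniform prior over n and the cap `n_max` are assumptions added for illustration; the comment does not specify a prior.

```python
from fractions import Fraction

def posterior_over_n(x, n_max):
    """Posterior over urn size n after drawing ball number x.

    Assumption (not from the comment): a uniform prior on n = 1..n_max.
    The likelihood of seeing x is 1/n when n >= x, and 0 otherwise.
    """
    weights = {n: Fraction(1, n) for n in range(x, n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

post = posterior_over_n(x=10, n_max=1000)
# Posterior mass shrinks as n moves away from x.
print(float(post[10]), float(post[100]), float(post[1000]))
```

Under this assumed prior, n = 10 ends up ten times as probable as n = 100, and values of n below x get no mass at all.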

Comment author: Morendil 16 May 2010 12:57:32PM 0 points [-]

Consider the case of Sleeping Beauty with an absent-minded experimenter.

If the coin comes up Heads, there is a tiny but non-zero chance that the experimenter mixes up Monday and Tuesday.

If the coin comes up Tails, there is a tiny but non-zero chance that the experimenter mixes up Tails and Heads.

The resulting scenario is represented in a new sheet, Fuzzy two-day, of my spreadsheet document.

Under these assumptions, Beauty may no longer rule out Tuesday & Heads. She has no justification to assign all of the Heads probability mass to Monday & Heads. She is therefore constrained to conditioning on being woken in the way that the usual two-day variant suggests she should, and ends up with a credence arbitrarily close to 1/3 if we make the "absent-minded" probability tiny enough.

Why should we get a discontinuous jump to 1/2 as this becomes zero?
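A rough sketch of one way to read the scenario, not a copy of the spreadsheet. The mix-up probability `eps`, the equal prior weight on each day, and the exact effect of each mix-up are all assumptions of this sketch.

```python
from fractions import Fraction

def credence_heads_given_woken(eps):
    """P(Heads | woken) under a hypothetical reading of the fuzzy setup.

    Assumptions of this sketch (not taken from the spreadsheet):
      - the coin is fair and each day gets prior weight 1/2;
      - Heads: woken Monday with probability 1 - eps, Tuesday with eps;
      - Tails: woken Monday always, Tuesday with probability 1 - eps.
    """
    quarter = Fraction(1, 4)  # P(coin) * P(day) for each of the four cells
    woken = {
        ("H", "Mon"): quarter * (1 - eps),
        ("H", "Tue"): quarter * eps,
        ("T", "Mon"): quarter,
        ("T", "Tue"): quarter * (1 - eps),
    }
    p_woken = sum(woken.values())
    p_heads_and_woken = woken[("H", "Mon")] + woken[("H", "Tue")]
    return p_heads_and_woken / p_woken

for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(eps, float(credence_heads_given_woken(eps)))
```

With these assumptions the result works out to 1/(3 - eps), which approaches 1/3 smoothly as eps shrinks.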

Comment author: neq1 17 May 2010 01:41:57PM -1 points [-]

This is interesting. We shouldn't get a discontinuous jump.

Consider 2 related situations:

  1. If heads, she is woken up on Monday, and the experiment ends on Tuesday. If tails, she is woken up on Monday and Tuesday, and the experiment ends on Wed. In this case, there is no 'not awake' option.

  2. If heads she is woken up on Monday and Tuesday. On Monday she is asked her credence for heads. On Tuesday she is told "it's Tuesday and heads" (but she is not asked about her credence; that is, she is not interviewed). If tails, it's the usual woken up both days and asked about her credence. The experiment ends on Wed.

In both of these scenarios, 50% of coin flips will end up heads. In both cases, if she's interviewed she knows it's either Monday&heads, Monday&tails or Tuesday&tails. She has no way of telling these three options apart, due to the amnesia.

I don't think we should be getting different answers in these 2 situations. Yet, I think if we use your probability distributions we do.

I think there are two basic problems. One is that Monday&tails is really not different from Tuesday&tails. They are the same variable. It's the same experience. If she could time travel and repeat the Monday waking it would feel the same to her as the Tuesday waking. The other issue concerns my scenario 2 above. When she is woken but before she knows if she will be interviewed, it would look like there is a 25% chance it's heads&Monday and a 25% chance it's heads&Tuesday. And that's probably a reasonable way to look at it. But that doesn't imply that, once she finds out it's an interview day, the probability of heads&Monday shifts to 1/3. That's because on 50% of coin flips she will experience heads&Monday. That's what makes this different than a usual joint probability table representing independent events.

Comment author: kmccarty 15 May 2010 05:15:18AM *  1 point [-]

Your argument is, I take it, that these counts of observations are irrelevant, or at best biased.

No, I was just saying that this, lim N-> infinity n1/(n1+n2+n3), is not actually a probability in the sleeping beauty case.

I maintain that it is. I can guarantee you that it is. What obstacle do you see to accepting that? You've made noises that this is because the counts are correlated, but I haven't seen any argument for this beyond bare assertion. Do you want to claim it is impossible for some reason, or are you just saying you haven't seen a persuasive argument yet?

The disagreement seems to center on the denominator; it should count not awakenings, but coin-tosses.

No, I wouldn't say that. My argument is that you should use probability laws to get the answer. If you take ratios of expected counts, well, you have to show that what you get is actually a probability.

What would you require for proof? If I could show you a Markov chain whose behavior is isomorphic to iterated Sleeping Beauty, would that convince you?
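One candidate for such a chain, as a sketch only. The state names and transition table below are hypothetical; they treat each awakening as a state and start a fresh experiment after the last awakening of the previous one.

```python
from fractions import Fraction

# Hypothetical state names: each state is one awakening.
# HM = heads & Monday, TM = tails & Monday, TT = tails & Tuesday.
half = Fraction(1, 2)
TRANSITIONS = {
    "HM": {"HM": half, "TM": half},  # experiment ends; a new fair coin is tossed
    "TM": {"TT": Fraction(1)},       # tails: Tuesday's awakening always follows
    "TT": {"HM": half, "TM": half},  # experiment ends; a new fair coin is tossed
}

def step(dist):
    """Push a distribution over awakenings one step through the chain."""
    new = {s: Fraction(0) for s in TRANSITIONS}
    for state, p in dist.items():
        for nxt, q in TRANSITIONS[state].items():
            new[nxt] += p * q
    return new

uniform = {s: Fraction(1, 3) for s in TRANSITIONS}
print(step(uniform) == uniform)  # checks whether 1/3 each is stationary
```

In this sketch the uniform distribution over the three awakening states is stationary, so the long-run fraction of awakenings spent in HM is 1/3. Whether that fraction is the credence in question is the point under dispute.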

I also am not sure what you mean when you say "use probability laws". Is there a failure to comport with the Kolmogorov axioms? Is there a problem with the definition of the events? Do you mean Bayes' Theorem, or some other law(s)? I also am deeply suspicious of the phrase "get the answer". I will have no idea what this could mean until we can eliminate ambiguity about what the question is (there seems to be a lot of that going around), or what class of questions you'll admit as legitimate.

defining feature of Sleeping Beauty--all Sleeping Beauty's awakenings are epistemically indistinguishable. She has no choice but to treat them all identically.

Hm, I think that is what I'm saying. She does have to treat them all identically. They are the same variable. That's why she has to say the same thing on Monday and Tuesday.

Up to this point, I see we are actually in strenuous agreement on this aspect, so I can stop belaboring it.

That's why an awakening contains no new info. If she had new evidence at an awakening, she'd give different answers under heads and tails.

I don't mean to claim that as soon as Beauty awakes, new evidence comes to light that she can add to her store of bits in additive fashion, and thereby update her credence from 1/2 to 1/3 along the way. If this is the only kind of evidence that your theory of Bayesian updating will acknowledge, then it is too restrictive. Since Beauty is apprised of all the relevant details of the experimental process on Sunday evening, she can (and should) use the fact that the predicted frequency of awakenings into a reset epistemic state is dependent on the state of the coin toss to change the credence she reports on such awakenings from 1/2 to 1/3. She can tell you this on Sunday night, just as I can tell you now, before any of us enter into any such experimental procedure. So her prediction about what she should answer on an awakening does not change from Sunday evening to Monday morning.

The key pieces of information she uses to arrive at this revised estimate are:

  • That the questions will be asked in a reset epistemic state. This requires her to give the same answer on all awakenings.
  • That the frequency of awakenings is dependent in a specific way on the result of the coin toss. This requires her to update the credence she'll report on awakenings from 1/2 to 1/3.

Comment author: neq1 16 May 2010 12:23:35PM 0 points [-]

At this point, it is just assertion that it's not a probability. I have reasons for believing it's not one, at least, not the probability that people think it is. I've explained some of that reasoning.

I think it's reasonable to look at a large sample ratio of counts (or ratio of expected counts). The best way to do that, in my opinion, is with independent replications of awakenings (that reflect all possibilities at an awakening). I probably haven't worded this well, but consider the following two approaches. For simplicity, let's say we wanted to do this (I'm being vague here) 1000 times.

  1. Replicate the entire experiment 1000 times. That is, there will be 1000 independent tosses of the coin. This will lead to between 1000 and 2000 awakenings, with an expected value of 1500 awakenings. But... whatever the total number of awakenings is, they are not independent. For example, on the first awakening it could be either heads or tails. On the second awakening, it only could be heads if it was heads on the first awakening. So, Beauty's options on awakening #2 are (possibly) different than her options on awakening #1. We do not have 2 replicates of the same situation. This approach will give you the correct ratio of counts in the long run (for example, we do expect the # of heads & Monday to equal the # of tails and Monday and the # of tails and Tuesday).

  2. Replicate her awakening-state 1000 times. Because her epistemic state is always the same on an awakening, from her perspective, it could be Monday or Tuesday, it could be heads or tails. She knows that it was a fair coin. She knows that if she's awake it's definitely Monday if heads, and could be either Monday or Tuesday if tails. She knows that 50% of coin tosses would end up heads, so we assign 0.5 to Monday&heads. She knows that 50% of coin tosses would end up tails, so we assign 0.5 to tails, which implies 0.25 to tails&Monday and 0.25 to tails&Tuesday. If we generate observations from this 1000 times, we'll get 1000 awakenings. We'll end up with heads 50% of the time.

The distinction between 1 and 2 is that, in 2, we are trying to repeatedly sample from the joint probability distributions that she should have on an awakening. In 1, we are replicating the entire experiment, with the double counting on tails.

In 1, people are using these ratios of expected counts to get the 1/3 answer. 1/3 is the correct answer to the question about the long-run frequencies of awakenings preceded by heads to awakenings preceded by tails. But I do not think it is the answer to the question about her credence of heads on an awakening.

In 2, the joint probabilities are determined ahead of time based on what we know about the experiment.
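The two approaches above can be sketched as a small simulation. The function names and the fixed seed are illustrative choices; the weights in the second function are the 0.5 / 0.25 / 0.25 joint table described in approach 2.

```python
import random

def replicate_experiments(n_experiments, rng):
    """Approach 1: rerun the whole experiment and pool every awakening."""
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(n_experiments):
        if rng.random() < 0.5:      # heads: one awakening (Monday)
            heads_awakenings += 1
            total_awakenings += 1
        else:                       # tails: two awakenings (Monday, Tuesday)
            total_awakenings += 2
    return heads_awakenings / total_awakenings

def replicate_awakening_state(n_awakenings, rng):
    """Approach 2: draw each awakening independently from the joint table."""
    states = ["heads&Monday", "tails&Monday", "tails&Tuesday"]
    draws = rng.choices(states, weights=[0.5, 0.25, 0.25], k=n_awakenings)
    return draws.count("heads&Monday") / n_awakenings

rng = random.Random(0)  # fixed seed, chosen arbitrarily
print(replicate_experiments(100_000, rng))      # close to 1/3
print(replicate_awakening_state(100_000, rng))  # close to 1/2
```

The simulation does not settle which number is the credence; it only shows that each approach reproduces the frequency its sampling scheme builds in.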

Let n2 and n3 be counts, in repeated trials, of tails&Monday and tails&Tuesday, respectively. You will of course see that n2=n3. They are the same random variable. tails&Monday and tails&Tuesday are the same. It's like what Jack said about types and tokens. It's like Vladimir_Nesov said:

Two subsequent states of a given dynamical system make for poor distinct elements of a sample space: when we've observed that the first moment of a given dynamical trajectory is not the second, what are we going to do when we encounter the second one? It's already ruled "impossible"! Thus, Monday and Tuesday under the same circumstances shouldn't be modeled as two different elements of a sample space.

You said:

I don't mean to claim that as soon as Beauty awakes, new evidence comes to light that she can add to her store of bits in additive fashion, and thereby update her credence from 1/2 to 1/3 along the way. If this is the only kind of evidence that your theory of Bayesian updating will acknowledge, then it is too restrictive.

I don't think it matters if she has the knowledge before the experiment or not. What matters is if she has new information about the likelihood of heads to update on. If she did, we would expect her accuracy to improve. So, for example, if she starts out believing that heads has probability 1/2, but learns something about the coin toss, her probability might go up a little if heads and down a little if tails. Suppose, for example, she is informed of a variable X. If P(heads|X)=P(tails|X), then why is she updating at all? Meaning, why is P(heads)=/=P(heads|X)? This would be unusual. It seems to me that the only reason she changes is because she knows she'd be essentially 'betting' twice on tails, but that really is distinct from credence for tails.

Comment author: Jonathan_Graehl 15 May 2010 12:06:04AM 0 points [-]

this is a probability tree corresponding to an arbitrary wake up day

Huh? If tails, then Beauty is (always) woken on Monday. Why do you have probability=1/2 there?

(likewise for Tuesday)

Comment author: neq1 15 May 2010 12:28:55AM 0 points [-]

The probability represents how she should see things when she wakes up.

She knows she's awake. She knows heads had probability 0.5. She knows that, if it landed heads, it's Monday with probability 1. She knows that, if it landed tails, it's either Monday or Tuesday. Since there is no way for her to distinguish between the two, she views them as equally likely. Thus, if tails, it's Monday with probability 0.5 and Tuesday with probability 0.5.

Comment author: Morendil 14 May 2010 08:01:28PM 0 points [-]

If you just apply Bayes rule, you get 1/2.

Apply it to what terms?

I'm not sure what more I can say without starting to repeat myself, too. All I can say at this point, having formalized my reasoning as both a Python program and an analytical table giving out the full joint distribution, is "Where did I make a mistake?"

Where's the bug in the Python code? How do I change my joint distribution?

Comment author: neq1 14 May 2010 08:21:49PM 0 points [-]

I like the halfer variant of your table. I still need to think about your distributions more though. I'm not sure it makes sense to have a variable 'woken that day' for this problem.

Comment author: Jack 14 May 2010 06:10:08PM *  6 points [-]

So I think I figured this whole thing out. Are people familiar with the type-token distinction and resulting ambiguities? If I have five copies of the book Catcher in the Rye and you ask me how many books I have there is an ambiguity. I could say one or five. One refers to the type, "Catcher in the Rye is a coming of age novel" is a sentence about the type. Five refers to the number of tokens, "I tossed Catcher in the Rye onto the bookshelf" is a sentence about the token. The distinction is ubiquitous and leads to occasional confusion, enough that the subject is at the top of my Less Wrong to-do list. The type token distinction becomes an issue whenever we introduce identical copies and the distinction dominates my views on personal identity.

In the Sleeping Beauty case, the amnesia means the experience of waking up on Monday and the experience of waking up on Tuesday, while token-distinct, are type-identical. If we decide the right thing to update on isn't the token experience but the type experience, well, the calculations are really easy. The type experience "waking up" has P=1 for heads and tails. So the prior never changes. I think there are some really good reasons for worrying about types rather than tokens in this context but won't go into them until I make sure the above makes sense to someone.

Comment author: neq1 14 May 2010 06:23:07PM 0 points [-]

Makes sense to me.

Comment author: cupholder 14 May 2010 06:01:46PM 0 points [-]

That is a difference, but it seems independent from the point I intended the example to make. Namely, that a relative frequency can still represent a probability even if its denominator includes duplicates - it will just be a different probability (hence why one can get 1/3 instead of 1/2 for SB).

Comment author: neq1 14 May 2010 06:05:13PM 1 point [-]

Ok, yes, sometimes relative frequencies with duplicates can be probabilities, I agree.

Comment author: Morendil 14 May 2010 05:39:18PM 2 points [-]

In your example the experimenter has learned whether you have cancer. And she reflects that knowledge in the structure of the experiment: you are woken up 9 times if you have the disease.

Set aside the amnesia effects of the drug for a moment, and consider the experimental setup as a contorted way of imparting the information to the patient. Then you'd agree that with full memory, the patient would have something to update on? As soon as the second day. So there is, normally, an information flow in this setup.

What the amnesia does is selectively impair the patient's ability to condition on available information. It does that in a way which is clearly pathological, and results in the counter-intuitive reply to the question "conditioning on a) your having woken up and b) your inability to tell what day it is, what is your credence?" We have no everyday intuitions about the inferential consequences of amnesia.

Knowing about the amnesia, we can argue that Beauty "shouldn't" condition on being woken up. But if she does, she'll get that strange result. If she does have cancer, she is more likely to be woken up multiple times than once, and being woken up at all does have some evidential weight.

All this, though, being merely verbal aids as I try to wrap my head around the consequences of the math. And therefore to be taken more circumspectly than the math itself.

Comment author: neq1 14 May 2010 05:49:58PM 1 point [-]

If she does condition on being woken up, I think she still gets 1/2. I hate to keep repeating arguments, but what she knows when she is woken up is that she has been woken up at least once. If you just apply Bayes rule, you get 1/2.
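Spelled out, the Bayes rule calculation referred to here looks like the sketch below. It takes the evidence to be "woken at least once", which is certain under both heads and tails; whether that is the right event to condition on is what the thread disputes. The variable names are illustrative.

```python
from fractions import Fraction

prior_heads = Fraction(1, 2)
# Evidence E = "woken at least once"; it is certain under either outcome.
p_e_given_heads = Fraction(1)
p_e_given_tails = Fraction(1)

# Law of total probability, then Bayes rule.
p_e = p_e_given_heads * prior_heads + p_e_given_tails * (1 - prior_heads)
posterior_heads = p_e_given_heads * prior_heads / p_e
print(posterior_heads)  # 1/2
```

Because the likelihood of E is the same under both outcomes, the posterior equals the prior.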

If conditioning causes her to change her probability, it should do so in such a way that makes her more accurate. But as we see in the cancer problem, people with cancer give the same answer as people without.

Then you'd agree that with full memory, the patient would have something to update on?

Yes, but then we wouldn't be talking about her credence on an awakening. We'd be talking about her credence on first waking and second waking. We'd treat them separately. With amnesia, 2 wakings are the same as 1. It's really just one experience.

Comment author: cupholder 14 May 2010 05:29:40PM 1 point [-]

Basically, the 2 wakings on tails should be thought of as one waking. You're just counting the same thing twice.

This is true if we want the ratio of tails to wakings. However...

When you include counts of variables that have a correlation of 1 in your denominator, it's not clear what you are getting back. The thirders are using a relative frequency that doesn't converge to a probability.

Despite the perfect correlation between some of the variables, one can still get a probability back out - but it won't be the probability one expects.

Maybe one day I decide I want to know the probability that a randomly selected household on my street has a TV. I print up a bunch of surveys and put them in people's mailboxes. However, it turns out that because I am very absent-minded (and unlucky), I accidentally put two surveys in the mailboxes of people with a TV, and only one in the mailboxes of people without TVs. My neighbors, because they enjoy filling out surveys so much, dutifully fill out every survey and send them all back to me. Now the proportion of surveys that say 'yes, I have a TV' is not the probability I expected (the probability of a household having a TV) - but it is nonetheless a probability, just a different one (the probability of any given survey saying, 'I have a TV').
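A small sketch of the survey example. The household counts are hypothetical numbers picked for illustration; only the two-surveys-per-TV-household rule comes from the example above.

```python
def survey_yes_fraction(n_tv, n_no_tv):
    """Fraction of returned surveys that say 'yes, I have a TV'.

    Households with a TV return two surveys each; the rest return one.
    """
    yes_surveys = 2 * n_tv
    total_surveys = 2 * n_tv + n_no_tv
    return yes_surveys / total_surveys

# Hypothetical street: 50 households with a TV, 50 without.
n_tv, n_no_tv = 50, 50
print(n_tv / (n_tv + n_no_tv))             # probability a household has a TV
print(survey_yes_fraction(n_tv, n_no_tv))  # probability a survey says yes
```

With these numbers the household-level probability is 1/2 while the survey-level probability is 2/3: both are probabilities, but of different events.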

Comment author: neq1 14 May 2010 05:34:56PM 1 point [-]

That's a good example. There is a big difference though (it's subtle). With sleeping beauty, the question is about her probability at a waking. At a waking, there are no duplicate surveys. The duplicates occur at the end.
