Vladimir_M comments on Open Thread: July 2010 - Less Wrong
Here's a puzzle I've been trying to figure out. It involves observation selection effects and agreeing to disagree. It is related to a paper I am writing, so help would be appreciated. The puzzle is also interesting in itself.
Charlie tosses a fair coin to determine how to stock a pond. If heads, it gets 3/4 big fish and 1/4 small fish. If tails, the other way around. After Charlie does this, he calls Al into his office. He tells him, "Infinitely many scientists are curious about the proportion of fish in this pond. They are all good Bayesians with the same prior. They are going to randomly sample 100 fish (with replacement) each and record how many of them are big and how many are small. Since so many will sample the pond, we can be sure that for any n between 0 and 100, some scientist will observe that n of his 100 fish were big. I'm going to take the first one that sees 25 big and team him up with you, so you can compare notes." (I don't think it matters much whether infinitely many scientists do this or just 3^^^3.)
Okay. So Al goes and does his sample. He pulls out 75 big fish and becomes nearly certain that 3/4 of the fish are big. Afterwards, a guy named Bob comes to him and tells him he was sent by Charlie. Bob says he randomly sampled 100 fish, 25 of which were big. They exchange ALL of their information.
Question: How confident should each of them be that 3/4 of the fish are big?
Natural answer: Al should remain nearly certain that ¾ of the fish are big. He knew in advance that someone like Bob was certain to talk to him regardless of what proportion of the fish were big. So he shouldn't be the least bit impressed after talking to Bob.
But what about Bob? What should he think? At first glance, you might think he should be 50/50, since 50% of the fish he knows about have been big and his access to Al's observations wasn't subject to a selection effect. But that can't be right, because then he would just be agreeing to disagree with Al! (This would be especially puzzling, since they have ALL the same information, having shared everything.) So maybe Bob should just agree with Al: he should be nearly certain that ¾ of the fish are big.
But that's a bit odd. It isn't terribly clear why Bob should discount all of his observations, since they don't seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.
Things get weirder if we consider a variant of the case.
VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he's going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.
New Question: Now what should Bob and Al think?
Here, things get really weird. By the reasoning that led to the Natural Answer above, Al should be nearly certain that ¾ are big and Bob should be nearly certain that ¼ are big. But that can't be right. They would just be agreeing to disagree! (Which would be especially puzzling, since they have ALL the same information.) The idea that they should favor one hypothesis in particular is also disconcerting, given the symmetry of the case. Should they both be 50/50?
Here's where I'd especially appreciate enlightenment:
1. If Bob should defer to Al in the original case, why? Can someone walk me through the calculations that lead to this?
2. If Bob should not defer to Al in the original case, is that because Al should change his mind? If so, what is wrong with the reasoning in the Natural Answer? If not, how can they agree to disagree?
3. If Bob should defer to Al in the original case, why not in the symmetrical variant?
4. What credence should they have in the symmetrical variant?
5. Can anyone refer me to some info on observation selection effects that could be applied here?
First, let's calculate the concrete probability numbers. If we are to trust this calculator, the probability of finding exactly 75 big fish in a sample of a hundred from a pond where 75% of the fish are big is approximately 0.09, while getting the same number in a sample from a 25% big pond has a probability on the order of 10^-25. The same numbers hold in the reverse situation, of course.
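For anyone who would rather check these numbers without an online calculator, here is a minimal Python sketch of the binomial calculation. The helper name binom_pmf is just an illustrative choice, not something from the original discussion.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k big fish in n draws (with replacement)
    when a fraction p of the pond is big."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sample of 75/100 big fish from a pond that really is 75% big.
p_match = binom_pmf(75, 100, 0.75)
# The same 75/100 sample from a pond that is only 25% big.
p_mismatch = binom_pmf(75, 100, 0.25)

print(f"75/100 given 75% big: {p_match:.4f}")
print(f"75/100 given 25% big: {p_mismatch:.3e}")
```

By symmetry, swapping "big" and "small" gives the same two values for a 25/100 sample.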
Now, Al and Bob have to consider two possible scenarios:
1. The fish are 75% big. Al got the decently probable 75/100 sample, but Bob was the first scientist to get the extremely improbable 25/100 sample, so there were likely on the order of 10^25 scientists sampling before him.
2. The fish are 25% big. Al got the extremely improbable 75/100 sample, while Bob got the decently probable 25/100 sample. This means that Bob is probably among the first few scientists who sampled the pond.
So, let's look at it from a frequentist perspective: if we repeat this game many times, what will be the proportion of occurrences in which each scenario takes place?
Here we need an additional critical piece of information: how exactly was Bob's place in the sequence of scientists determined? An infinite number of scientists would give us a lot of headaches at this point, so let's assume it's some large finite number N_sci, and that Bob's place in the sequence is determined by a random draw, uniformly distributed over all places in the sequence.
And here we get an important intermediate result: assuming that at least one scientist gets to sample 25/100, the probability that Bob is the first to sample 25/100 is independent of the actual composition of the pond! Think of it by means of a card-drawing analogy. If you're in a group of 52 people whose names are repeatedly called out in random order to draw from a deck of cards, the proportion of drawings in which you get to be the first one to draw the ace of spades will always be 1/52, regardless of whether it's a normal deck or a non-standard one with multiple aces of spades, or even a deck of 52 such aces!
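The independence claim can be checked with a rough simulation. The sketch below is an illustration under stated assumptions, not part of the original argument: it uses a group of 20 people rather than 52, calls them in a fresh random order each round, treats every draw as an independent "hit" with probability p_hit (that is, drawing with replacement), and discards rounds in which nobody hits. The function name first_hit_share and all parameter values are made up for the example.

```python
import random

def first_hit_share(p_hit: float, n_people: int = 20,
                    trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of valid rounds in which person 0 is the first to hit.

    A round is valid only if at least one person hits, mirroring the
    assumption that some scientist does see the target sample.
    """
    rng = random.Random(seed)
    wins = valid = 0
    for _ in range(trials):
        order = list(range(n_people))
        rng.shuffle(order)              # random calling order
        for person in order:
            if rng.random() < p_hit:    # this person draws the target
                valid += 1
                wins += (person == 0)
                break                   # only the first hit counts
    return wins / valid

share_rare = first_hit_share(0.05)    # target result is rare
share_common = first_hit_share(0.9)   # target result is common
print(share_rare, share_common, 1 / 20)
```

Both shares should come out close to 1/20, however rare or common the target is, which is the point of the analogy.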
Now compute the following probabilities:
P1 = p(75% big fish) * p(Al samples 75/100 | 75% big fish) * p(Bob gets to be the first to sample 25/100)
~ 0.5 * 0.09 * 1/N_sci
P2 = p(25% big fish) * p(Al samples 75/100 | 25% big fish) * p(Bob gets to be the first to sample 25/100)
~ 0.5 * 10^-25 * 1/N_sci
(We ignore the finite, but presumably negligible probabilities that no scientist samples 25/100 in either case; these can be made arbitrarily low by increasing N_sci.)
Therefore, we have P1 >> P2, i.e. the overwhelming majority of meetings between Al and Bob -- which are by themselves extremely rare, since Al usually meets someone from the other (N_sci-1) scientists -- happen under the first scenario, where Al gets a sample closely matching the actual ratio.
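To put a number on "P1 >> P2", here is a small Python sketch of the comparison. The value of N_sci below is an arbitrary placeholder chosen only for illustration; since the 1/N_sci factor appears in both P1 and P2, it cancels out of the ratio.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k big fish in n draws with big-fish fraction p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_sci = 10**30   # hypothetical number of scientists (any large value works)
prior = 0.5      # fair coin decides how the pond is stocked

# Both scenarios share the same 1/N_sci chance that Bob is the first
# to see 25/100; only Al's likelihood differs between them.
p1 = prior * binom_pmf(75, 100, 0.75) * (1 / n_sci)
p2 = prior * binom_pmf(75, 100, 0.25) * (1 / n_sci)

odds = p1 / p2                      # odds in favour of the 75%-big pond
posterior_big = p1 / (p1 + p2)      # probability the pond is 75% big

print(f"odds for 75% big: {odds:.3e}")
print(f"posterior for 75% big: {posterior_big}")
```

The odds work out to 3^50, since the binomial coefficients cancel and what remains is (0.75/0.25)^75 * (0.25/0.75)^25.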
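Now, you say that it isn't clear why Bob should discount his observations, since from his perspective they were a genuine random sample: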
Not really, when you consider repeating the experiment. For the overwhelming majority of repetitions, Bob will get results close to the actual ratio, and on rare occasions he'll get extreme outlier samples. Those repetitions in which he gets summoned to meet with Al, however, are not a representative sample of his measurements! The criteria for when he gets to meet with Al are biased towards including a greater proportion of his improbable 25/100 outlier results.
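As for the variant, in which Charlie also tells Bob that he will be introduced to someone who observed exactly 75 of 100 fish to be big: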
I don't think this is a well defined scenario. Answers will depend on the exact process by which this second observer gets selected. (Just like in the preceding discussion, the answer would be different if e.g. Bob had been always assigned the same place in the sequence of scientists.)
I was assuming Charlie would show Bob the first person to see 75/100.
Anyway, your analysis solves this as well. Being the first to see a particular result tells you essentially nothing about the composition of the pond (provided N_sci is large enough that someone or other was nearly certain to see the result). So once Al and Bob each learn that they were the first to get their results, they should regard their previous observations as irrelevant, stick with their priors, and be 50/50 about the composition of the pond.
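The 50/50 conclusion can be checked exactly on a scaled-down version of the symmetric variant. The sketch below rests on assumptions that are not in the original discussion: only 4 scientists (with Al as scientist 0 and Bob as scientist 1), samples of 4 fish, "3 of 4 big" standing in for 75/100, "1 of 4 big" standing in for 25/100, and a uniformly random calling order. The function names are illustrative.

```python
from itertools import permutations, product
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k big fish in n draws with big-fish fraction p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_al_and_bob_first(p_big: float, n_sci: int = 4, n_fish: int = 4,
                          hi: int = 3, lo: int = 1) -> float:
    """P(scientist 0 is the first to see `hi` big fish AND
    scientist 1 is the first to see `lo` big fish)."""
    p_hi = binom_pmf(hi, n_fish, p_big)
    p_lo = binom_pmf(lo, n_fish, p_big)
    p_of = {"hi": p_hi, "lo": p_lo, "other": 1 - p_hi - p_lo}
    orders = list(permutations(range(n_sci)))   # all calling orders
    total = 0.0
    # outcomes[i] is what scientist i observed
    for outcomes in product(p_of, repeat=n_sci):
        weight = 1.0
        for o in outcomes:
            weight *= p_of[o]
        for order in orders:
            first_hi = next((s for s in order if outcomes[s] == "hi"), None)
            first_lo = next((s for s in order if outcomes[s] == "lo"), None)
            if first_hi == 0 and first_lo == 1:
                total += weight / len(orders)
    return total

like_big = prob_al_and_bob_first(0.75)     # pond is 75% big
like_small = prob_al_and_bob_first(0.25)   # pond is 25% big
posterior_big = 0.5 * like_big / (0.5 * like_big + 0.5 * like_small)
print(like_big, like_small, posterior_big)
```

In this toy model the two likelihoods coincide, so the posterior stays at the prior, in line with the reasoning above.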