JGWeissman comments on Open Thread: July 2010 - Less Wrong

6 Post author: komponisto 01 July 2010 09:20PM

Comments (653)

Comment author: utilitymonster 03 July 2010 05:28:47PM *  8 points [-]

Here's a puzzle I've been trying to figure out. It involves observation selection effects and agreeing to disagree. It is related to a paper I am writing, so help would be appreciated. The puzzle is also interesting in itself.

Charlie tosses a fair coin to determine how to stock a pond. If heads, it gets 3/4 big fish and 1/4 small fish. If tails, the other way around. After Charlie does this, he calls Al into his office. He tells him, "Infinitely many scientists are curious about the proportion of fish in this pond. They are all good Bayesians with the same prior. They are going to randomly sample 100 fish (with replacement) each and record how many of them are big and how many are small. Since so many will sample the pond, we can be sure that for any n between 0 and 100, some scientist will observe that n of his 100 fish were big. I'm going to take the first one that sees 25 big and team him up with you, so you can compare notes." (I don't think it matters much whether infinitely many scientists do this or just 3^^^3.)

Okay. So Al goes and does his sample. He pulls out 75 big fish and becomes nearly certain that 3/4 of the fish are big. Afterwards, a guy named Bob comes to him and tells him he was sent by Charlie. Bob says he randomly sampled 100 fish, 25 of which were big. They exchange ALL of their information.

Question: How confident should each of them be that 3/4 of the fish are big?

Natural answer: Al should remain nearly certain that ¾ of the fish are big. He knew in advance that someone like Bob was certain to talk to him regardless of what proportion of fish were big. So he shouldn't be the least bit impressed after talking to Bob.

But what about Bob? What should he think? At first glance, you might think he should be 50/50, since 50% of the fish he knows about have been big and his access to Al's observations wasn't subject to a selection effect. But that can't be right, because then he would just be agreeing to disagree with Al! (This would be especially puzzling, since they have ALL the same information, having shared everything.) So maybe Bob should just agree with Al: he should be nearly certain that ¾ of the fish are big.
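To make the two candidate answers concrete, here is a minimal sketch of the plain Bayesian update with no selection effect taken into account. It assumes the 50/50 coin prior from the setup; the function name is made up for illustration.

```python
from fractions import Fraction

def posterior_three_quarters_big(big, small):
    """P(pond is 3/4 big | counts), from a 50/50 prior, ignoring selection effects."""
    # Likelihood ratio (3/4 big : 1/4 big) is 3**(big - small);
    # the binomial coefficients are the same on both sides and cancel.
    odds = Fraction(3) ** (big - small)
    return odds / (odds + 1)

al_alone = posterior_three_quarters_big(75, 25)            # Al's own sample
pooled = posterior_three_quarters_big(75 + 25, 25 + 75)    # both samples, taken at face value

print(float(al_alone))
print(pooled)
```

Al's own sample gives odds of 3^50 to 1 in favor of ¾ big, while naively pooling the two samples gives exactly even odds. The puzzle is about which of these (if either) survives once the way Bob was selected is taken into account.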

But that's a bit odd. It isn't terribly clear why Bob should discount all of his observations, since they don't seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.

Things get weirder if we consider a variant of the case.

VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he's going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.

New Question: Now what should Bob and Al think?

Here, things get really weird. By the reasoning that led to the Natural Answer above, Al should be nearly certain that ¾ are big and Bob should be nearly certain that ¼ are big. But that can't be right. They would just be agreeing to disagree! (Which would be especially puzzling, since they have ALL the same information.) The idea that they should favor one hypothesis in particular is also disconcerting, given the symmetry of the case. Should they both be 50/50?

Here's where I'd especially appreciate enlightenment:

1. If Bob should defer to Al in the original case, why? Can someone walk me through the calculations that lead to this?

2. If Bob should not defer to Al in the original case, is that because Al should change his mind? If so, what is wrong with the reasoning in the Natural Answer? If not, how can they agree to disagree?

3. If Bob should defer to Al in the original case, why not in the symmetrical variant?

4. What credence should they have in the symmetrical variant?

5. Can anyone refer me to some info on observation selection effects that could be applied here?

Comment author: JGWeissman 03 July 2010 06:22:49PM 1 point [-]

From Bob's perspective, he was more likely to be chosen as the one to talk to Al if there are fewer scientists who observed exactly 25 big fish, which would happen if there are more big fish. So Bob should update on the evidence of being chosen.

Comment author: utilitymonster 03 July 2010 07:45:24PM *  0 points [-]

This should be important to the finite case. The probability of being the first to see 25/100 is WAY higher (x 10^25 or so) if the lake is 3/4 full of big fish than if it is 1/4 full of big fish.

But in the infinite case the probability of being first is 0 either way...

Comment author: JGWeissman 03 July 2010 08:51:42PM 2 points [-]

There is a reason we consider infinities only as limits of sequences of finite quantities.

Suppose you tried to sum the log-odds evidence of the infinitely many scientists that the pond has more big fish. Well, some of them have positive evidence (summing to positive infinity), some have negative evidence (summing to negative infinity), and you can, by choosing the order of summation, get any result you want (up to some granularity) between negative and positive infinity.
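A rough sketch of that order-dependence, under simplifying assumptions that are mine rather than part of the puzzle: evidence is measured in units of ln 3, and only two kinds of scientists are drawn on, those who saw 51 big fish (+2 units) and those who saw 49 (-2 units), each in unlimited supply. The function name is hypothetical.

```python
def steer_partial_sum(target, steps):
    """Greedily order +2 and -2 evidence terms (units of ln 3) to hover near target."""
    total = 0
    order = []
    for _ in range(steps):
        # Pick a scientist with positive evidence while below the target,
        # otherwise one with negative evidence.
        term = 2 if total < target else -2
        total += term
        order.append(term)
    return total, order

for target in (37, -12):
    total, _ = steer_partial_sum(target, steps=1000)
    print(target, total)
```

Whatever target is chosen, the running total ends up within one step (2 units) of it, which is the "up to some granularity" caveat: the terms never shrink, so the sum has no limit of its own and the ordering decides everything.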

You don't need anthropic tricks to make things weird if you have actual infinities in the problem.

Comment author: Vladimir_M 04 July 2010 04:53:46AM *  1 point [-]

utilitymonster:

The probability of being the first to see 25/100 is WAY higher (x 10^25 or so) if the lake is 3/4 full of big fish than if it is 1/4 full of big fish.

Maybe I'm misunderstanding your phrasing here, but it sounds fallacious. If there's a deck of cards and you're in a group of 52 people who are called out in random order and told to pick one card each from the deck, the probability of being the first person to draw an ace is exactly the same (1/52) regardless of whether it's a normal deck or a deck of 52 aces (or even a deck with 3 out of 4 aces replaced by other cards). This result doesn't even depend on whether the card is removed or returned into the deck after each person's drawing; the conclusion follows purely from symmetry. The only special case is when there are zero aces, in which the event becomes impossible, with p=0.

Similarly, if the order in which the scientists get their samples is shuffled randomly, and we ignore the improbable possibility that nobody sees 25/100, then purely by symmetry, the probability that Bob happens to be the first one to see 25/100 is the same regardless of the actual frequency of the 25/100 results: p = 1/N(scientists).
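A minimal sketch of that symmetry argument, assuming each scientist independently sees the target result with the same probability p and that Bob's position in the shuffled order is uniform. The function names are made up for illustration, and exact fractions are used so the equality is not blurred by rounding.

```python
from fractions import Fraction

def prob_bob_first(n, p):
    """P(Bob is the first of n scientists to see the target result)."""
    # Bob sits at position k with probability 1/n; he is first if the k
    # scientists ahead of him all miss and he hits.
    return sum(Fraction(1, n) * (1 - p) ** k * p for k in range(n))

def prob_someone_sees(n, p):
    """P(at least one of n scientists sees the target result)."""
    return 1 - (1 - p) ** n

def prob_bob_first_given_someone(n, p):
    return prob_bob_first(n, p) / prob_someone_sees(n, p)

for p in (Fraction(1, 1000), Fraction(1, 10), Fraction(9, 10)):
    print(p, prob_bob_first_given_someone(52, p))
```

Conditional on someone seeing the result, the answer comes out as 1/n for every p. The unconditional probability is (1 - (1 - p)^n)/n, which only differs noticeably from 1/n when it is plausible that nobody sees the result at all.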

Comment author: utilitymonster 04 July 2010 11:47:04AM *  1 point [-]

You're right, thanks.

I was considering an example with 10^100 scientists. I thought that since there would be a lot more scientists who got 25 big in the 1/4 scenario than in the 3/4 scenario (about 9.18 * 10^98 vs. 1.279 * 10^75), you'd be more likely to be first in the 3/4 scenario. But this forgets about the probability of getting an improbable result.
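Those expected counts can be checked with a short script. This is only a sketch: the helper name is made up, and it assumes each scientist's 100 draws are independent (sampling with replacement, as in the setup).

```python
from fractions import Fraction
from math import comb

def prob_exactly(k, n, p_big):
    """Binomial probability of seeing exactly k big fish in n draws."""
    return comb(n, k) * p_big**k * (1 - p_big) ** (n - k)

N_SCIENTISTS = 10**100

p_if_quarter = prob_exactly(25, 100, Fraction(1, 4))          # pond is 1/4 big
p_if_three_quarters = prob_exactly(25, 100, Fraction(3, 4))   # pond is 3/4 big

print(float(p_if_quarter * N_SCIENTISTS))          # expected count, 1/4 scenario
print(float(p_if_three_quarters * N_SCIENTISTS))   # expected count, 3/4 scenario
print(float(p_if_quarter / p_if_three_quarters))   # ratio of the two
```

The ratio of the two probabilities is exactly 3^50, roughly 7 * 10^23.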

In general, if there are N scientists, and the probability of getting some result is p, then we can expect Np scientists to get that result on average. If the order is shuffled as you suggest, then the probability of being the first to get that result is p * 1/(Np) = 1/N. So the probability of being the first to get the result is the same, regardless of the likelihood of the result (assuming someone will get the result).

EDIT: It occurs to me that I might have been thinking about the probability of being selected to meet Al conditional on getting 25/100. In that case, you're a lot more likely to be selected if the pond is 3/4 big than if it is 1/4 big, since WAY more people got similar results in the latter case. JGWeissman was probably thinking the same.
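A toy sketch of how the conditional and unconditional probabilities fit together. The assumptions are mine: the pick is uniform among everyone who saw the result (which is what a shuffled order amounts to), N is tiny so the sums can be done exactly, and q stands in for the chance of seeing 25/100. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_selected_given_hit(n, q):
    """P(you are the one picked | you saw the result), with n scientists in total."""
    others = n - 1
    # k of the other scientists also saw the result with binomial probability;
    # you are then picked with probability 1/(k + 1).
    return sum(
        comb(others, k) * q**k * (1 - q) ** (others - k) * Fraction(1, k + 1)
        for k in range(others + 1)
    )

N = 20  # toy number of scientists
for q in (Fraction(1, 1000), Fraction(1, 10)):
    conditional = prob_selected_given_hit(N, q)
    joint = q * conditional  # P(you saw the result AND were picked)
    print(float(q), float(conditional), float(joint))
```

The conditional probability is much higher when the result is rare, but multiplying by the chance of getting the result in the first place gives (1 - (1 - q)^N)/N, which is close to 1/N whenever someone is nearly certain to see the result. That matches the p * 1/(Np) = 1/N calculation.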