show me the specific mistake in my math, rather than appeal to a verbal presentation of a non-formal, intuitive explanation.
My point was that I didn't think anything was wrong with your math. If you count tokens the answer you get is 1/3. If you count types the answer you get is 1/2 (did you need more math for that?). Similarly, you can design payouts where the right choice is 1/3 and payouts where the right choice is 1/2.
You can compute things like the expected number of green marbles left in the bag. In the bag problem, IOW, we are quantifying our uncertainty over tokens, while taking types to be a fixed feature of the situation.
b) what justifies "worrying about types rather than tokens" in this situation, where every other discussion of probability "worries about tokens" in the sense I've outlined above in reference to the bag of marbles?
This was a helpful comment for me. What we're dealing with is actually a special case of the type-token ambiguity: the tokens are actually indistinguishable. Say I flip a coin. If tails, I put six red marbles into a bag that already contains three red marbles; if heads, I do nothing to the bag with three red marbles. I draw a marble and tell Beauty "red". Then I ask Beauty her credence for the coin landing heads. I think that is basically isomorphic to the Sleeping Beauty problem. In the original she is woken up twice if tails, but that's just like having more red marbles to choose from: the experiences are indistinguishable, just like the marbles.
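To make the marble version concrete, here is a minimal sketch of the two ways of counting. The function name `posterior_heads` and the variable names are illustrative, not anything from the thread. It assumes a fair coin, and it treats "counting tokens" as weighting each coin outcome by the number of marbles in the bag, which is my reading of the analogy. Note that with 3 versus 9 marbles the token ratio is 1:3 rather than the 1:2 of the original problem.

```python
from fractions import Fraction


def posterior_heads(prior_heads, red_if_heads, red_if_tails):
    """Bayes' rule for P(heads | Beauty is told "red")."""
    joint_heads = prior_heads * red_if_heads
    joint_tails = (1 - prior_heads) * red_if_tails
    return joint_heads / (joint_heads + joint_tails)


# Heads: the bag holds 3 red marbles. Tails: it holds 3 + 6 = 9 red marbles.
# Every marble is red either way, so "red" is certain under both outcomes.
credence = posterior_heads(Fraction(1, 2), Fraction(3, 3), Fraction(9, 9))

# Counting tokens instead: each individual marble gets equal weight,
# so heads gets 3 of the 12 marbles (assumed reading of "counting tokens").
token_share = Fraction(3, 3 + 9)

print(credence, token_share)  # prints "1/2 1/4"
```

Since the observation "red" has the same likelihood under both outcomes, the type-based update leaves the prior where it was; only the token weighting moves it.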
Statements like "information is gained" or "information is lost" are vague and imprecise,
I don't really think they are. That's my major problem with the 1/3 answer. No one has ever shown me the unexpected experience Beauty must have to update from 0.5. But if you feel that way I'll try other methods.
c) how do you apply the type-token distinction in other problems, say, in the case of the Tuesday Boy?
Offhand there is no reason to worry about types, as the possible answers to the questions "Do you have exactly two children?" and "Is one of them a boy born on a Tuesday?" are all distinguishable. But I haven't thought really hard about that problem; maybe there is something I'm missing. My approach does suggest a reason for why the Self-Indication Assumption is wrong: the necessary features of an observer are indistinguishable. So it returns 0.5 for the Presumptuous Philosopher problem.
I'll come back with an answer to (a). Bug me about it if I don't. There is admittedly a problem which I haven't worked out: I'm not sure how to relate the experience-type to the day of the week (time is a property of tokens). Basically, the type by itself doesn't seem to tell us anything about the day (just like picking the red marble doesn't tell us whether or not it was added after the coin flip). And maybe that's a reason to reject my approach. I don't know.
I was recently disturbed by my perception that, despite years of studying and debating probability problems, the LessWrong community as a whole has not markedly improved its ability to get the right answer on them.
I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy.
But can that possibly work? How can someone who isn't already highly accurate identify other people who are highly accurate?
Aumann's agreement theorem (allegedly) says that Bayesians with the same priors agree. But it doesn't say that doing so helps. Under what circumstances does revising your opinions, by updating in response to people you consider reliable, actually improve your accuracy?
To find out, I built a model of updating in response to the opinions of others. It did, eventually, show that Bayesians improve their collective opinions by updating in response to the opinions of other Bayesians. But this turns out not to depend on them satisfying the conditions of Aumann's theorem, or on doing Bayesian updating. It depends only on a very simple condition, established at the start of the simulation. Can you guess what it is?
I'll write another post describing and explaining the results if this post receives a karma score over 10.
That's getting a bit ahead of ourselves, though. This post models only non-Bayesians, and the results are very different.
Here's the model:
Algorithm:
# Loop over T timesteps
For t = 0 to T-1 {
    # Loop over G people
    For i = 0 to G-1 {
        # Loop over N problems
        For v = 0 to N-1 {
            If (t == 0)
                # Special initialization for the first timestep
                If (random in [0..1] < p_i) g_ivt := 1; Else g_ivt := 0
            Else {
                # Product over all j of the probability that the answer to v is 1 given j's answer and estimated accuracy
                m1 := ∏_j [ p_ij g_jv(t-1) + (1-p_ij)(1-g_jv(t-1)) ]
                # Product over all j of the probability that the answer to v is 0 given j's answer and estimated accuracy
                m0 := ∏_j [ p_ij (1-g_jv(t-1)) + (1-p_ij) g_jv(t-1) ]
                p1 := m1 / (m0 + m1) # Normalize
                If (p1 > .5) g_ivt := 1; Else g_ivt := 0
            }
        }
        # Loop over G other people
        For j = 0 to G-1
            # Compute person i's estimate of person j's accuracy
            p_ij := { Σ_s in [0 .. t] Σ_v in [s..N] [ g_ivt g_jvs + (1-g_ivt)(1-g_jvs) ] } / N
    }
}
p1 is the probability that agent i assigns to problem v having the answer 1. Each term p_ij g_jv(t-1) + (1-p_ij)(1-g_jv(t-1)) is the probability of problem v having answer 1 computed using agent j's beliefs, by adding either the probability that j is correct (if j believes it has answer 1), or the probability that j is wrong (if j believes it has answer 0). Agent i assumes that everyone's opinions are independent, and multiplies all these probabilities together. The result, m1, is very small when there are very many agents (m1 is on the order of .5^G), so it is normalized by computing a similar product m0 for the probability that v has answer 0, and setting p1 = m1 / (m0 + m1).
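The combination step can be sketched in a few lines of Python. This is only an illustration of the two products and the normalization, not the Perl script itself. The function name `probability_of_one` and the argument names `trust` (person i's accuracy estimates for each j) and `answers` (each j's previous 0/1 answer) are made up for this sketch. The fallback to 0.5 when both products are zero is my own assumption, since the pseudocode doesn't say what happens in that case.

```python
import math


def probability_of_one(trust, answers):
    """Return p1 = m1 / (m0 + m1) for one problem.

    trust[j]   -- person i's estimate of person j's accuracy
    answers[j] -- person j's 0/1 answer at the previous timestep
    """
    # Probability the answer is 1, treating every j as an independent witness.
    m1 = math.prod(p * g + (1 - p) * (1 - g) for p, g in zip(trust, answers))
    # Same product for the hypothesis that the answer is 0.
    m0 = math.prod(p * (1 - g) + (1 - p) * g for p, g in zip(trust, answers))
    total = m0 + m1
    # Assumption: if both products vanish, fall back to indifference.
    return 0.5 if total == 0 else m1 / total


# One trusted person says 1; two less-trusted people say 0.
p1 = probability_of_one([0.9, 0.6, 0.3], [1, 0, 0])
print(round(p1, 4))  # prints 0.9333
```

Note that the person with estimated accuracy 0.3 saying 0 counts as evidence for 1: someone believed to be worse than chance is informative in reverse.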
The sum of sums to compute p_ij (i's opinion of j's accuracy) computes the fraction of problems, summed over all previous time periods, on which person j has agreed with person i's current opinions. It sums over previous time periods because otherwise, p_ii = 1. By summing over previous times, if person i ever changes its mind, that will decrease p_ii. (The inner sum starts from s instead of 0 to accommodate an addition to the model that I'll make later, in which the true answer to problem t is revealed at the end of time t. Problems whose answer is public knowledge should not be considered in the sum after the time they became public knowledge.)
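Here is a small sketch of that accuracy estimate. The name `estimate_accuracy` and the `history[s][k][v]` layout (time, person, problem) are assumptions made for the example. One deliberate difference from the pseudocode: it divides by the number of comparisons actually made rather than by N, so the result stays a fraction in [0, 1] when more than one timestep is summed. Treat that normalization as my guess at the intent.

```python
def estimate_accuracy(history, i, j, t):
    """Fraction of past answers by j that match i's current answers.

    history[s][k][v] is person k's 0/1 answer to problem v at time s.
    """
    n_problems = len(history[t][i])
    agreements = 0
    comparisons = 0
    for s in range(t + 1):
        # The inner range starts at s, mirroring the sum over v in [s..N].
        for v in range(s, n_problems):
            # g_i*g_j + (1-g_i)*(1-g_j) is 1 on agreement and 0 otherwise.
            agreements += int(history[t][i][v] == history[s][j][v])
            comparisons += 1
    # Assumption: normalize by the number of comparisons, not by N.
    return agreements / comparisons if comparisons else 0.5


# Two timesteps, two people, three problems.
# Person 0 changes its answer to problem 1 between t=0 and t=1.
history = [
    [[1, 0, 1], [1, 1, 1]],  # t = 0
    [[1, 1, 1], [1, 1, 1]],  # t = 1
]
print(estimate_accuracy(history, 0, 0, 0))  # prints 1.0
print(estimate_accuracy(history, 0, 0, 1))  # prints 0.8
```

The second call shows the effect described above: once person 0 changes its mind, its estimate of its own accuracy drops below 1.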
Now, what distribution should we use for the p_i?
There is an infinite supply of problems. Many are so simple that everyone gets them right; many are so hard or incomprehensible that everyone performs randomly on them; and there are many, such as the Monty Hall problem, that most people get wrong because of systematic bias in our thinking. The range of population average performance p_ave on all possible problems thus falls within [0 .. 1].
I chose to model person accuracy instead of problem difficulty. I say "instead of", because you can use either person accuracy or problem difficulty to set p_ave. Since a critical part of what we're modeling is person i's estimate of person j's accuracy, person j should actually have an accuracy. I didn't model problem difficulty partly because I assume we only talk about problems of a particular level of difficulty; partly because a person in this model can't distinguish between "Most people disagree with me on this problem; therefore it is difficult" and "Most people disagree with me on this problem; therefore I was wrong about this problem".
Because I assume we talk mainly about high-entropy problems, I set p_ave = .5. I do this by drawing p_i from [0 .. 1], with a normal distribution with a mean of .5, truncated at .05 and .95. (I used a standard deviation of .15; this isn't important.)
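One way to draw accuracies like this is sketched below. "Truncated" could mean either resampling out-of-range draws or clipping them to the bounds; this sketch assumes resampling, which keeps the distribution symmetric without piling mass onto .05 and .95. The function name `draw_accuracy` and the fixed seed are illustrative only.

```python
import random


def draw_accuracy(rng, mean=0.5, sd=0.15, low=0.05, high=0.95):
    """Draw one accuracy from a normal distribution truncated to [low, high]."""
    while True:
        p = rng.gauss(mean, sd)
        # Assumption: truncate by rejecting and redrawing, not by clipping.
        if low <= p <= high:
            return p


rng = random.Random(0)  # fixed seed so the sketch is repeatable
accuracies = [draw_accuracy(rng) for _ in range(10_000)]
average = sum(accuracies) / len(accuracies)
print(min(accuracies) >= 0.05, max(accuracies) <= 0.95)  # prints "True True"
```

With a standard deviation of .15 the bounds sit three standard deviations from the mean, so very few draws are rejected and the population average stays close to .5.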
Because this distribution of p_i is symmetric around .5, there is no way to know whether you're living in the world where the right answer is always 1, or where the right answer is always 0. This means there's no way, under this model, for a person to know whether they're a crackpot (usually wrong) or a genius (usually right).
Note that these agents don't satisfy the preconditions for Aumann agreement, because they produce 0/1 decisions instead of probabilities, and because some agents are biased to perform worse than random. It's worth studying non-Bayesian agents before moving on to a model satisfying the preconditions for the theorem, if only because there are so many of them in the real world.
An important property of this model is that, if person i is highly accurate, and knows it, p_ii will approach 1, greatly reducing the chance that person i will change their mind about any problem. Thus, the more accurate a person becomes, the less able they are to change their minds when they are wrong - and this is not an error. It's a natural limit on the speed at which one can converge on truth.
An obvious problem is that at t=0, person i will see that it always agrees with itself, and set p_ii = 1. By induction, no one will ever change their mind. (I consider this evidence for the model, rather than against it.)
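The lock-in is easy to check numerically. The sketch below is an illustration under assumed values, not output from the actual simulation: person i trusts itself with p_ii = 1, nine other people give random answers, and i's trust in each of them is drawn uniformly from [.05, .95]. The helper `probability_of_one` is a hypothetical name for the m1 / (m0 + m1) computation.

```python
import math
import random


def probability_of_one(trust, answers):
    """Normalized product of per-person probabilities that the answer is 1."""
    m1 = math.prod(p * g + (1 - p) * (1 - g) for p, g in zip(trust, answers))
    m0 = math.prod(p * (1 - g) + (1 - p) * g for p, g in zip(trust, answers))
    total = m0 + m1
    return 0.5 if total == 0 else m1 / total


rng = random.Random(1)
changed = 0
for _ in range(1000):
    own_answer = rng.randint(0, 1)
    others = [rng.randint(0, 1) for _ in range(9)]
    trust_in_others = [rng.uniform(0.05, 0.95) for _ in range(9)]
    # Self-trust of exactly 1 zeroes out the product for the opposite answer.
    p1 = probability_of_one([1.0] + trust_in_others, [own_answer] + others)
    new_answer = 1 if p1 > 0.5 else 0
    changed += int(new_answer != own_answer)

print(changed)  # prints 0
```

No matter how the other nine vote, the factor contributed by person i's own answer makes the opposing product exactly zero, so p1 is exactly 0 or 1 and the answer never flips.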
The question of how people ever change their mind is key to this whole study. I use one of these two additions to the model to let people change their mind:
This model is difficult to solve analytically, so I wrote a Perl script to simulate it.