Updating, part 1: When can you change your mind? The binary model

PhilGoetz


13th May 2010


I was recently disturbed by my perception that, despite years of studying and debating probability problems, the LessWrong community as a whole has not markedly improved its ability to get the right answer on them.

I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy.

But can that possibly work? How can someone who isn't already highly accurate identify other people who are highly accurate?

Aumann's agreement theorem (allegedly) says that Bayesians with the same priors agree. But it doesn't say that doing so helps. Under what circumstances does revising your opinions, by updating in response to people you consider reliable, actually improve your accuracy?

To find out, I built a model of updating in response to the opinions of others. It did, eventually, show that Bayesians improve their collective opinions by updating in response to the opinions of other Bayesians. But this turns out not to depend on them satisfying the conditions of Aumann's theorem, or on doing Bayesian updating. It depends only on a very simple condition, established at the start of the simulation. Can you guess what it is?

I'll write another post describing and explaining the results if this post receives a karma score over 10.

That's getting a bit ahead of ourselves, though. This post models only non-Bayesians, and the results are very different.

Here's the model:

There are G people in a group such as LessWrong.
There are N problems being discussed simultaneously.
Problems are binary problems, with an answer of either 1 or 0.
Each person's opinion on each problem is always known to all people.
Each person i has an accuracy: Their probability p_i of getting any arbitrary problem correct on the first guess.
g_ivt is what person i believes at time t is the answer to problem v (1 or 0).
p_ij expresses person i's estimate of the probability that an arbitrary belief of person j is correct.
Without loss of generality, assume the correct answer to every problem is 1.

Algorithm:

# Loop over T timesteps
For t = 0 to T-1 {

    # Loop over G people
    For i = 0 to G-1 {

        # Loop over N problems
        For v = 0 to N-1 {

            If (t == 0) {

                # Special initialization for the first timestep
                If (random in [0..1] < p_i) g_ivt := 1; Else g_ivt := 0

            } Else {

                # Product over all j of the probability that the answer to v is 1 given j's answer and estimated accuracy
                m1 := ∏_j [ p_ij * g_jv(t-1) + (1 - p_ij) * (1 - g_jv(t-1)) ]

                # Product over all j of the probability that the answer to v is 0 given j's answer and estimated accuracy
                m0 := ∏_j [ p_ij * (1 - g_jv(t-1)) + (1 - p_ij) * g_jv(t-1) ]

                p1 := m1 / (m0 + m1)    # Normalize

                If (p1 > .5) g_ivt := 1; Else g_ivt := 0

            }
        }

        # Loop over G other people
        For j = 0 to G-1 {

            # Compute person i's estimate of person j's accuracy
            p_ij := { Σ_{s in [0 .. t]} Σ_{v in [s..N]} [ g_ivt * g_jvs + (1 - g_ivt) * (1 - g_jvs) ] } / N

        }
    }
}

p1 is the probability that agent i assigns to problem v having the answer 1. Each term p_ij * g_jv(t-1) + (1 - p_ij) * (1 - g_jv(t-1)) is the probability that problem v has answer 1, computed from agent j's belief: it equals the probability that j is correct (if j believes the answer is 1), or the probability that j is wrong (if j believes the answer is 0). Agent i assumes that everyone's opinions are independent, and multiplies all these probabilities together. The result, m1, is very small when there are very many agents (m1 is on the order of .5^G), so it is normalized by computing a similar product m0 for the probability that v has answer 0, and setting p1 = m1 / (m0 + m1).
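The belief update just described can be sketched in Python. This is a minimal illustration rather than the Perl script mentioned later in the post; the function name `prob_answer_is_one` and its argument names are hypothetical.

```python
from math import prod


def prob_answer_is_one(trust, beliefs):
    """Return p1 for one problem, from agent i's point of view.

    trust[j]   -- p_ij, agent i's estimate that agent j's beliefs are correct
    beliefs[j] -- g_jv(t-1), agent j's previous answer (0 or 1) to the problem
    """
    # Each factor is p_ij if j answered 1, or (1 - p_ij) if j answered 0.
    m1 = prod(p * g + (1 - p) * (1 - g) for p, g in zip(trust, beliefs))
    # The mirror-image product for the hypothesis that the answer is 0.
    m0 = prod(p * (1 - g) + (1 - p) * g for p, g in zip(trust, beliefs))
    return m1 / (m0 + m1)  # normalize; undefined if m0 + m1 == 0


# Agent i trusts three agents at 0.8, 0.6 and 0.3; they answered 1, 0 and 0.
p1 = prob_answer_is_one([0.8, 0.6, 0.3], [1, 0, 0])
new_belief = 1 if p1 > 0.5 else 0
```

In this example m1 = 0.8 * 0.4 * 0.7 and m0 = 0.2 * 0.6 * 0.3, so p1 is about 0.86. The third agent's answer of 0 counts as evidence for 1, because agent i believes that agent is wrong more often than right.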

The sum of sums to compute p_ij (i's opinion of j's accuracy) computes the fraction of problems, summed over all previous time periods, on which person j has agreed with person i's current opinions. It sums over previous time periods because otherwise, p_ii = 1. By summing over previous times, if person i ever changes its mind, that will decrease p_ii. (The inner sum starts from s instead of 0 to accommodate an addition to the model that I'll make later, in which the true answer to problem t is revealed at the end of time t. Problems whose answer is public knowledge should not be considered in the sum after the time they became public knowledge.)
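A possible Python reading of that agreement count is sketched below. The name `estimated_accuracy` and the nested-list layout are hypothetical. One deliberate departure is labeled in the code: the formula as written divides by N, which appears able to exceed 1 once more than one timestep is summed, so this sketch divides by the number of comparisons instead. That normalization is an assumption, not something the post states.

```python
def estimated_accuracy(history_j, current_i):
    """Fraction of j's past answers that match i's current answers.

    history_j[s][v] -- g_jvs, agent j's answer to problem v at time s (0..t)
    current_i[v]    -- g_ivt, agent i's answer to problem v at the current time
    """
    agreements = 0
    compared = 0
    for s, answers_j in enumerate(history_j):
        # The inner sum starts at v = s, as in the formula above.
        for v in range(s, len(current_i)):
            # g_i * g_j + (1 - g_i) * (1 - g_j) is 1 exactly when they agree.
            agreements += 1 if answers_j[v] == current_i[v] else 0
            compared += 1
    # Assumption: divide by the number of comparisons, not by N.
    return agreements / compared


# Two timesteps of j's answers to four problems, against i's current answers.
p_ij = estimated_accuracy([[1, 0, 1, 1], [1, 1, 1, 0]], [1, 1, 1, 1])
```

Here j matches i on 3 of 4 problems at time 0 and on 2 of the 3 problems counted at time 1, so `p_ij` is 5/7.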

Now, what distribution should we use for the p_i?

There is an infinite supply of problems. Many are so simple that everyone gets them right; many are so hard or incomprehensible that everyone performs randomly on them; and there are many, such as the Monty Hall problem, that most people get wrong because of systematic bias in our thinking. The range of population average performance p_ave on all possible problems thus falls within [0 .. 1].

I chose to model person accuracy instead of problem difficulty. I say "instead of", because you can use either person accuracy or problem difficulty to set p_ave. Since a critical part of what we're modeling is person i's estimate of person j's accuracy, person j should actually have an accuracy. I didn't model problem difficulty partly because I assume we only talk about problems of a particular level of difficulty; partly because a person in this model can't distinguish between "Most people disagree with me on this problem; therefore it is difficult" and "Most people disagree with me on this problem; therefore I was wrong about this problem".

Because I assume we talk mainly about high-entropy problems, I set p_ave = .5. I do this by drawing p_i from [0 .. 1], with a normal distribution with a mean of .5, truncated at .05 and .95. (I used a standard deviation of .15; this isn't important.)
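One way to draw such accuracies is sketched below. The post does not say how the truncation is done; redrawing out-of-range values (rejection sampling) is an assumption here, and clipping would give a different distribution with mass piled up at .05 and .95. The function name `draw_accuracy` is hypothetical.

```python
import random


def draw_accuracy(rng, mean=0.5, sd=0.15, low=0.05, high=0.95):
    """Draw one p_i from a normal distribution truncated to [low, high]."""
    while True:
        p = rng.gauss(mean, sd)
        if low <= p <= high:
            return p  # otherwise loop and redraw (rejection sampling)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
accuracies = [draw_accuracy(rng) for _ in range(1000)]
```

Because the truncation bounds are symmetric around the mean, the sample average stays near .5.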

Because this distribution of p_i is symmetric around .5, there is no way to know whether you're living in the world where the right answer is always 1, or where the right answer is always 0. This means there's no way, under this model, for a person to know whether they're a crackpot (usually wrong) or a genius (usually right).

Note that these agents don't satisfy the preconditions for Aumann agreement, because they produce 0/1 decisions instead of probabilities, and because some agents are biased to perform worse than random. It's worth studying non-Bayesian agents before moving on to a model satisfying the preconditions for the theorem, if only because there are so many of them in the real world.

An important property of this model is that, if person i is highly accurate, and knows it, p_ii will approach 1, greatly reducing the chance that person i will change their mind about any problem. Thus, the more accurate a person becomes, the less able they are to change their minds when they are wrong - and this is not an error. It's a natural limit on the speed at which one can converge on truth.

An obvious problem is that at t=0, person i will see that it always agrees with itself, and set p_ii = 1. By induction, no one will ever change their mind. (I consider this evidence for the model, rather than against it.)

The question of how people ever change their mind is key to this whole study. I use one of these two additions to the model to let people change their mind:

At the end of each timestep t, the answer to problem number t becomes mutual knowledge to the entire group. (This solves the crackpot/genius problem.)
Each person has a maximum allowable p_ij (including p_ii).
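The effect of the second addition can be seen in a small Python sketch. The cap value 0.95 and the trust values 0.9 are made up for illustration; the post does not give them. The function `prob_answer_is_one` is a hypothetical name for the p1 computation in the algorithm above.

```python
from math import prod


def prob_answer_is_one(trust, beliefs):
    """p1 = m1 / (m0 + m1), using i's trust in each agent's last answer."""
    m1 = prod(p if g else 1 - p for p, g in zip(trust, beliefs))
    m0 = prod(1 - p if g else p for p, g in zip(trust, beliefs))
    return m1 / (m0 + m1)


# Agent 0 is person i, who currently believes 0; four others all believe 1.
beliefs = [0, 1, 1, 1, 1]

# Uncapped: p_ii = 1, so i's own answer forces m1 to 0.
locked = prob_answer_is_one([1.0, 0.9, 0.9, 0.9, 0.9], beliefs)

# Capped at a hypothetical maximum of 0.95: the others can now outvote i.
P_MAX = 0.95
capped_trust = [min(p, P_MAX) for p in [1.0, 0.9, 0.9, 0.9, 0.9]]
freed = prob_answer_is_one(capped_trust, beliefs)
```

With p_ii = 1, `locked` is exactly 0 no matter how many others disagree. With the cap, `freed` is about 0.997, so person i switches to 1.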

This model is difficult to solve analytically, so I wrote a Perl script to simulate it.
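For readers who want to experiment, here is a compact Python sketch of such a simulation. It is not the Perl script, which is not shown in the post. It implements only the capped-trust variant, and its parameter defaults (G=10, N=20, T=5, p_max=0.95) are arbitrary assumptions. It also normalizes each agreement count by the number of comparisons rather than by N, which is an assumption, and it compares m1 with m0 directly, which is equivalent to testing p1 > .5 whenever m0 + m1 is positive.

```python
import random
from math import prod


def simulate(G=10, N=20, T=5, p_max=0.95, seed=0):
    """Return beliefs[t][i][v] for the capped-trust variant of the model."""
    rng = random.Random(seed)

    def draw_accuracy():
        # Normal(mean .5, sd .15), truncated to [.05, .95] by redrawing.
        while True:
            p = rng.gauss(0.5, 0.15)
            if 0.05 <= p <= 0.95:
                return p

    accuracy = [draw_accuracy() for _ in range(G)]

    # t = 0: the correct answer is always 1, so i is right with probability p_i.
    beliefs = [[[1 if rng.random() < accuracy[i] else 0 for _ in range(N)]
                for i in range(G)]]

    def trust(i, j, t):
        # Share of j's answers at times 0..t that match i's answers at time t.
        agree = total = 0
        for s in range(t + 1):
            for v in range(s, N):  # inner sum starts at s, as in the post
                agree += beliefs[s][j][v] == beliefs[t][i][v]
                total += 1
        return min(agree / total, p_max)  # cap applies to p_ii as well

    for t in range(1, T):
        prev = beliefs[t - 1]
        p = [[trust(i, j, t - 1) for j in range(G)] for i in range(G)]
        step = []
        for i in range(G):
            row = []
            for v in range(N):
                m1 = prod(p[i][j] if prev[j][v] else 1 - p[i][j]
                          for j in range(G))
                m0 = prod(1 - p[i][j] if prev[j][v] else p[i][j]
                          for j in range(G))
                # m1 > m0 matches p1 > .5 and avoids dividing by zero.
                row.append(1 if m1 > m0 else 0)
            step.append(row)
        beliefs.append(step)
    return beliefs


beliefs = simulate()
# Share of beliefs equal to 1 (the correct answer) at each timestep.
fraction_correct = [sum(map(sum, step)) / (10 * 20) for step in beliefs]
```

Running it with different seeds and group sizes, and watching how `fraction_correct` moves over time, is one way to form a guess about the questions below.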

What do you think will happen when I run the program, or its variants?
What other variants would you like to see tested?
Is there a fundamental problem with the model?




156 Comments


Morendil16y20

Unlike Jack, I'm pessimistic about your proposal. I've already changed my mind not once but twice.

The interesting aspect is that this doesn't feel like I'm vacillating. I have gone from relying on a vague and unreliable intuition in favor of 1/3 qualified with "it depends", to being moderately certain that 1/2 was unambiguously correct, to having worked out how I was allocating all of the probability mass in the original problem and getting back 1/3 as the answer that I cannot help but think is correct. That, plus the meta-observation that no-one, including people I've asked directly (including yourself), has a rebuttal to my construction of the table, is leaving me with a higher degree of confidence than I previously had in 1/3.

It now feels as if I'm justified to ignore pretty much any argument which is "merely" a verbal appeal to one intuition or the other. Either my formalization corresponds to the problem as verbally stated or it doesn't; either my math is correct or it isn't. "Here I stand, I can no other" - at least until someone shows me my mistake.


timtyler16y00

Congratulations on getting to that point, I figure.

7Jack16y

So I think I figured this whole thing out. Are people familiar with the type-token distinction and resulting ambiguities? If I have five copies of the book Catcher in the Rye and you ask me how many books I have, there is an ambiguity. I could say one or five. One refers to the type: "Catcher in the Rye is a coming of age novel" is a sentence about the type. Five refers to the number of tokens: "I tossed Catcher in the Rye onto the bookshelf" is a sentence about the token. The distinction is ubiquitous and leads to occasional confusion, enough that the subject is at the top of my Less Wrong to-do list.

The type-token distinction becomes an issue whenever we introduce identical copies, and the distinction dominates my views on personal identity. In the Sleeping Beauty case, the amnesia means the experience of waking up on Monday and the experience of waking up on Tuesday, while token-distinct, are type-identical. If we decide the right thing to update on isn't the token experience but the type experience, well, the calculations are really easy. The type experience "waking up" has P=1 for heads and tails. So the prior never changes.

I think there are some really good reasons for worrying about types rather than tokens in this context, but I won't go into them until I make sure the above makes sense to someone.

1neq116y

Morendil, this is strange. It sounds like you have been making progress towards settling on an answer, after discussion with others. That would suggest to me that discussion can move us towards consensus.

I like your approach a lot. It's the first time I've seen the thirder argument defended with actual probability statements. Personally, I think there shouldn't be any probability mass on 'not woken', but that is something worth thinking about and discussing.

One thing that I think is odd: thirders know she has nothing to update on when she is woken, because they admit she will give the same answer regardless of whether it's heads or tails. If she really had new information that is correlated with the outcome, her credence would move towards heads when heads, and tails when tails.

Consider my cancer intuition pump example. Everyone starts out thinking there is a 50% chance they have cancer. Once woken, regardless of whether they have cancer or not, they all shift to 90%. Did they really learn anything about their disease state by being woken? If they did, those with cancer would have shifted their credence up a bit, and those without would have shifted down. That's what updating is.
