Updating, part 1: When can you change your mind? The binary model
I was recently disturbed by my perception that, despite years of studying and debating probability problems, the LessWrong community as a whole has not markedly improved its ability to get the right answer on them.
I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy.
But can that possibly work? How can someone who isn't already highly-accurate, identify other people who are highly accurate?
Aumann's agreement theorem (allegedly) says that Bayesians with the same priors agree. But it doesn't say that doing so helps. Under what circumstances does revising your opinions, by updating in response to people you consider reliable, actually improve your accuracy?
To find out, I built a model of updating in response to the opinions of others. It did, eventually, show that Bayesians improve their collective opinions by updating in response to the opinions of other Bayesians. But this turns out not to depend on them satisfying the conditions of Aumann's theorem, or on doing Bayesian updating. It depends only on a very simple condition, established at the start of the simulation. Can you guess what it is?
I'll write another post describing and explaining the results if this post receives a karma score over 10.
SIA won't doom you
Katja Grace has just presented an ingenious model, claiming that SIA combined with the great filter generates its own variant of the doomsday argument. Robin echoed this on Overcoming Bias. We met soon after Katja had come up with the model, and I endorsed it, saying that I could see no flaw in the argument.
Unfortunately, I erred. The argument does not work in the form presented.
First of all, there is the issue of time dependence. We are not just a human level civilization drifting through the void in blissful ignorance about our position in the universe. We know (approximately) the age of our galaxy, and the time elapsed since the big bang.
How is this relevant? It is relevant because all arguments about the great filter are time-dependent. Imagine we had just reached consciousness and human-level civilization, by some fluke, two thousand years after the creation of our galaxy, by an evolutionary process that took two thousand years. We see no aliens around us. In this situation, we have no reason to suspect any great filter; if we asked ourselves "are we likely to be the first civilization to reach this stage?" then the answer is probably yes. No evidence for a filter.
Imagine, instead, that we had reached consciousness a trillion years into the life of our galaxy, again via an evolutionary process that took two thousand years, and we see no aliens or traces of aliens. Then the evidence for a filter is overwhelming; something must have stopped all those previous likely civilizations from emerging into the galactic plane.
So neither of these civilizations can be included in our reference class (indeed, the second one can only exist if we ourselves are filtered!). So the correct reference class to use is not "the class of all potential civilizations in our galaxy that have reached our level of technological advancement and seen no aliens", but "the class of all potential civilizations in our galaxy that have reached our level of technological advancement at around the same time as us and seen no aliens". Indeed, SIA, once we update on the present, cannot tell us anything about the future.
But there's more.
Information theory and the symmetry of updating beliefs
Contents:
1. The beautiful symmetry of Bayesian updating
2. Odds and log odds: a short comparison
3. Further discussion of information
Rationality is all about handling this thing called "information". Fortunately, we live in an era after the rigorous formulation of Information Theory by C.E. Shannon in 1948, a basic understanding of which can actually help you think about your beliefs, in a way similar but complementary to probability theory. Indeed, it has flourished as an area of research exactly because it helps people in many areas of science to describe the world. We should take advantage of this!
The information theory of events, which I'm about to explain, is about as difficult as high school probability. It is certainly easier than the information theory of multiple random variables (which right now is explained on Wikipedia), even though the equations look very similar. If you already know it, this can be a linkable source of explanations to save you writing time :)
So! To get started, what better way to motivate information theory than to answer a question about Bayesianism?
The beautiful symmetry of Bayesian updating
The factor by which observing A increases the probability of B is the same as the factor by which observing B increases the probability of A. This factor is P(A and B)/(P(A)·P(B)), which I'll denote by pev(A,B) for reasons to come. It can vary from 0 to +infinity, and allows us to write Bayes' Theorem succinctly in both directions:
P(A|B)=P(A)·pev(A,B), and P(B|A)=P(B)·pev(A,B)
What does this symmetry mean, and how should it affect the way we think?
A great way to think of pev(A,B) is as a multiplicative measure of mutual evidence, which I'll call mutual probabilistic evidence to be specific. pev=1 if the events are independent, pev>1 if they make each other more likely, and pev<1 if they make each other less likely.
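The symmetry is easy to check numerically. Here is a minimal sketch (the joint distribution is illustrative, not from the post) showing that the same factor pev(A,B) updates P(A) given B and P(B) given A:

```python
# Sketch: checking the symmetry of pev(A, B) on a toy joint distribution.
# The probabilities below are made-up illustrative numbers.

def pev(p_a, p_b, p_ab):
    """Mutual probabilistic evidence: P(A and B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

p_a, p_b, p_ab = 0.4, 0.5, 0.3   # P(A), P(B), P(A and B)

e = pev(p_a, p_b, p_ab)          # 0.3 / 0.2 = 1.5 > 1: mutual evidence

# Bayes' theorem written in both directions with pev:
p_a_given_b = p_a * e            # P(A|B) = P(A) * pev(A,B) = 0.6
p_b_given_a = p_b * e            # P(B|A) = P(B) * pev(A,B) = 0.75

# Observing B multiplies the probability of A by the same factor
# that observing A multiplies the probability of B:
assert abs(p_a_given_b / p_a - p_b_given_a / p_b) < 1e-12
```

Note that p_a_given_b and p_b_given_a agree with computing the conditionals directly as P(A and B)/P(B) and P(A and B)/P(A).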
But two ways to think are better than one, so I will offer a second explanation, in terms of information, which I often find quite helpful in analyzing my own beliefs:
Extreme updating: The devil is in the missing details
Today Ed Yong has a post on Not Exactly Rocket Science that is about updating - in fact, about the most extreme case of updating, where a person gets to choose between relying completely on their own judgement, or completely on the judgement of others. He describes two experiments by Daniel Gilbert of Harvard in which subjects are given information about experience X, and asked to predict how they would feel (on a linear scale) on experiencing X; they then experience X and rate what they felt on that linear scale.
In both cases, the correlation between post-experience judgements of different subjects is much higher than the correlation between the prediction and the post-experience judgement of each subject. This isn't surprising - the experiments are designed so that the experience provides much more information than the given pre-experience information does.
What might be surprising is that the subjects believe the opposite: that they can predict their response from information better than from the responses of others.
Whether these experiments are interesting depends on how the subjects were asked the question. If they were asked, before being given information or being told what that information would be, whether they could predict their response to an experience better by making their own judgement based on information, or from the responses of others, then the result is not interesting. The subjects in that case did not know that they would be given only a trivial amount of information relative to those who had the experience.
The result is only interesting if the subjects were given the information first, and then asked whether they could predict their response better from that information than from someone else's experience. Yong's post doesn't say which of these things happened, and doesn't cite the original article, so I can't look it up. Does anyone know?
I've heard studies like this cited as strong evidence that we should update more, but I've never seen that critical detail given for any such study. Are there any studies which actually show what this study purports to show?
EDIT: Robin posted the citation. The original paper does not contain the crucial information. Details in my response to Robin.
EDIT: The original paper DOES contain the crucial info for the first experiment. I missed it the first time. It says:
.. a woman was escorted to the speed-dating room and left to have a 5-min private conversation with the man. Next, the experimenter escorted the woman to another room where she reported how much she had enjoyed the speed date by marking a 100-mm continuous “enjoyment scale” whose end points were marked not at all and very much. This report is hereinafter referred to as her affective report.
Next, a second woman was given one of two kinds of information: simulation information (which consisted of the man’s personal profile and photograph) or surrogation information (which consisted of the affective report provided by the first woman). The second woman was then asked to predict (on the enjoyment scale) how much she would enjoy her speed date with the man. This prediction is hereinafter referred to as her affective forecast.
After making her prediction, the second woman was shown the kind of information (simulation or surrogation) that she had not already received. We did this to ensure that each woman had the same information about the man before the actual speed date. The only difference between the two conditions, then, was whether the second woman had surrogation information or simulation information when she made her forecast.
Next, the second woman was escorted to the dating room, had a speed date, and then reported how much she enjoyed it (on the enjoyment scale). This report is hereinafter referred to as her affective report. The second woman also reported whether she believed that simulation information or surrogation information would have allowed her to make the more accurate prediction about the speed date she had and about a speed date that she might have in the future.