Self-indication assumption is wrong for interesting reasons
The self-indication assumption (SIA) states that:
Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.
The reason this is a bad assumption might not be obvious at first. In fact, I think it's very easy to miss.
Argument for SIA posted on Less Wrong
First, let's take a look at an argument for SIA that appeared at Less Wrong (link). Two situations are considered.
1. We imagine that there are 99 people in rooms that have a blue door on the outside (1 person per room). One person is in a room with a red door on the outside. It was argued that you are in a blue door room with probability 0.99.
2. Same situation as above, but first a coin is flipped. If heads, the red door person is never created. If tails, the blue door people are never created. You wake up in a room and know these facts. It was argued that you are in a blue door room with probability 0.99.
So why is 1. correct and 2. incorrect? The first thing we have to be careful about is not treating yourself as special. The fact that you woke up just tells you that at least one conscious observer exists.
In scenario 1 we basically just need to know what proportion of conscious observers are in a blue door room. The answer is 0.99.
In scenario 2 you never would have woken up in a room if you hadn't been created. Thus, the fact that you exist is something we have to take into account. We don't want to estimate P(randomly selected person, regardless of if they exist or not, is in a blue door room). That would be ignoring the fact that you exist. Instead, the fact that you exist tells us that at least one conscious observer exists. Again, we want to know what proportion of conscious observers are in blue door rooms. Well, there is a 50% chance (if heads landed) that all conscious observers are in blue door rooms, and a 50% chance that all conscious observers are in red door rooms. Thus, the marginal probability of a conscious observer being in a blue door room is 0.5.
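The scenario 2 calculation above is just a marginalization over the coin flip. Written out, with H for heads and T for tails (labels introduced here for illustration only):

```latex
\begin{align*}
P(\text{blue}) &= P(\text{blue}\mid H)\,P(H) + P(\text{blue}\mid T)\,P(T) \\
               &= 1 \cdot 0.5 + 0 \cdot 0.5 = 0.5
\end{align*}
```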
The flaw in the more detailed Less Wrong proof (see the post) lies in the move from step C to step D. The *you* being referred to in step A might not exist to be asked the question in step D. You have to take that into account.
General argument for SIA and why it's wrong
Let's consider the assumption more formally.
Assume that the number of people to be created, N, is a random draw from a discrete uniform distribution[1] on {1,2,...,Nmax}. Thus, P(N=k)=1/Nmax, for k=1,...,Nmax. Assume Nmax is large enough so that we can effectively ignore finite sample issues (this is just for simplicity).
Assume M=Nmax*(Nmax+1)/2 possible people exist, and we arbitrarily label them 1,...,M. After the size of the world, say N=n, is determined, then we randomly draw n people from the M possible people.
After the data are collected we find out that person x exists.
We can apply Bayes' theorem to get the posterior probability:
P(N=k|x exists)=k/M, for k=1,...,Nmax.
The prior probability was uniform, but the posterior favors larger worlds. QED.
Well, not really.
The flaw here is that we conditioned on person x existing, but person x only became of interest after we saw that they existed (peeked at the data).
What we really know is that at least one conscious observer exists -- there is nothing special about person x.
So, the correct conditional probability is:
P(N=k|someone exists)=1/Nmax, for k=1,...,Nmax.
Thus, prior=posterior and SIA is wrong.
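Both conditional probabilities can be checked exactly for a small world. The following is a minimal sketch, assuming Nmax = 10 purely for illustration; the function and variable names are made up here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def posteriors(n_max):
    """Return (prior, posterior given 'x exists', posterior given 'someone exists')."""
    m = n_max * (n_max + 1) // 2          # number of possible people, as in the text
    sizes = range(1, n_max + 1)
    prior = {k: Fraction(1, n_max) for k in sizes}

    # Likelihood that a pre-labeled person x is among k people drawn from m.
    like_x = {k: Fraction(k, m) for k in sizes}
    # Likelihood that at least one person exists: certain, since N >= 1.
    like_any = {k: Fraction(1) for k in sizes}

    def bayes(like):
        evidence = sum(prior[k] * like[k] for k in sizes)
        return {k: prior[k] * like[k] / evidence for k in sizes}

    return prior, bayes(like_x), bayes(like_any)

prior, post_x, post_any = posteriors(10)
print(post_x[1], post_x[10])      # 1/55 2/11
print(post_any[1], post_any[10])  # 1/10 1/10
```

Conditioning on a pre-labeled x gives k/M, which grows with k; conditioning on "someone exists" leaves the uniform prior untouched.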
Egotism
The flaw with SIA that I highlighted here is that it treats you as special, as if you were labeled ahead of time. But the reality is, no matter who was selected, they would think they are the special person. "But I exist, I'm not just some arbitrary person. That couldn't happen in a small world. It's too unlikely." In reality, the fact that I exist just means someone exists. I only became special after I already existed (peeked at the data and used it to construct the conditional probability).
Here's another way to look at it. Imagine that a random number between 1 and 1 trillion was drawn. Suppose 34,441 was selected. If someone then asked what the probability of selecting that number was, the correct answer is 1 in 1 trillion. They could then argue, "that's too unlikely of an event. It couldn't have happened by chance." However, because they didn't identify the number(s) of interest ahead of time, all we really can conclude is that a number was drawn, and drawing a number was a probability 1 event.
I give more examples of this here.
I think Nick Bostrom is getting at the same thing in his book (page 125):
..your own existence is not in general a ground for thinking that hypotheses are more likely to be true just by virtue of implying that there is a greater total number of observers. The datum of your existence tends to disconfirm hypotheses on which it would be unlikely that any observers (in your reference class) should exist; but that’s as far as it goes. The reason for this is that the sample at hand—you—should not be thought of as randomly selected from the class of all possible observers but only from a class of observers who will actually have existed. It is, so to speak, not a coincidence that the sample you are considering is one that actually exists. Rather, that’s a logical consequence of the fact that only actual observers actually view themselves as samples from anything at all
Related arguments are made in this LessWrong post.
[1] For simplicity I'm assuming a uniform prior... the prior isn't the issue here.
Swimming in Reasons
To a rationalist, certain phrases smell bad. Rotten. A bit fishy. It's not that they're actively dangerous, or that they don't occur when all is well; but they're relatively prone to emerging from certain kinds of thought processes that have gone bad.
One such phrase is for many reasons. For example, many reasons all saying you should eat some food, or vote for some candidate.
To see why, let's first recapitulate how rational updating works. Beliefs (in the sense of probabilities for propositions) ought to bob around in the stream of evidence as a random walk without trend. When, in contrast, you can see a belief try to swim somewhere, right under your nose, that's fishy. (Rotten fish don't really swim, so here the analogy breaks down. Sorry.) As a Less Wrong reader, you're smarter than a fish. If the fish is going where it's going in order to flee some past error, you can jump ahead of it. If the fish is itself in error, you can refuse to follow. The mathematical formulation of these claims is clearer than the ichthyological formulation, and can be found under conservation of expected evidence.
More generally, according to the law of iterated expectations, it's not just your probabilities that should be free of trends, but your expectation of any variable. Conservation of expected evidence is just the special case where a variable can be 1 (if some proposition is true) or 0 (if it's false); the expectation of such a variable is just the probability that the proposition is true.
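In symbols, the two statements can be written as follows. The notation is chosen here for illustration: E_t and P_t denote your expectation and probability given the information available at time t.

```latex
\begin{align*}
E_t\!\left[\,E_{t+1}[X]\,\right] &= E_t[X] \\
E_t\!\left[\,P_{t+1}(A)\,\right] &= P_t(A) \qquad \text{(the special case } X = \mathbf{1}_A\text{)}
\end{align*}
```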
So let's look at the case where the variable you're estimating is an action's utility. We'll define a reason to take the action as any info that raises your expectation, and the strength of the reason as the amount by which it does so. The strength of the next reason, conditional on all previous reasons, should be distributed with expectation zero.
Maybe the distribution of reasons is symmetrical: for example, if somehow you know all reasons are equally strong in absolute value, reasons for and against must be equally common, or they'd cause a predictable trend. Under this assumption, the number of reasons in favor will follow a binomial distribution with p=.5. Mostly, the values here will not be too extreme, especially for large numbers of reasons. When there are ten reasons in favor, there are usually at least a few against.
But what if that doesn't happen? What if ten pieces of info in a row all favor the action you're considering?
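To put a number on how unusual that is under the symmetric, equal-strength assumption above, here is a small sketch of the binomial tail probability. The function name is made up for illustration, and p = 0.5 is the assumption taken from the text.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(at least k of n equally strong reasons favor the action) under Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

all_ten = prob_at_least(10, 10)
print(all_ten)   # 0.0009765625, i.e. 1/1024
```

So ten out of ten reasons pointing the same way is roughly a one-in-a-thousand event if the reasons really are arriving as a trendless sequence.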
Bayesian Collaborative Filtering
I present an algorithm I designed to predict which position a person would report for an issue on TakeOnIt, through Bayesian updates on the evidence of other people's positions on that issue. Additionally, I will point out some potential areas of improvement, in the hopes of inspiring others here to expand on this method.
For those not familiar with TakeOnIt, the basic idea is that there are issues, represented by yes/no questions, on which people can take the positions Agree (A), Mostly Agree (MA), Neutral (N), Mostly Disagree (MD), or Disagree (D). (There are two types of people tracked by TakeOnIt: users who register their own opinions, and Experts/Influencers whose opinions are derived from public quotations.)
The goal is to predict what position a person S would take on an issue, based on the positions registered by other people on that issue. To do this, we will use Bayes' Theorem to update the probability that person S takes the position X on issue I, given that person T has taken position Y on issue I:
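Written in the usual form of Bayes' theorem, with every position taken on the same issue I (the notation here is an assumption, not necessarily the one used in the original post), this is:

```latex
\[
P(S{=}X \mid T{=}Y) \;=\; \frac{P(T{=}Y \mid S{=}X)\,P(S{=}X)}{P(T{=}Y)}
\]
```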
Really, we will be updating on several people Tj taking positions Yj on I:
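One simple way to combine several such updates is to treat the other people as conditionally independent given S's position (a naive Bayes assumption). The sketch below does that; it is not necessarily the algorithm the post goes on to describe, and the likelihood table and all names are hypothetical.

```python
POSITIONS = ["A", "MA", "N", "MD", "D"]

def update(prior, likelihoods, observed):
    """Posterior over S's position after seeing each T_j take position Y_j.

    prior:       {position of S: probability}
    likelihoods: {T_j name: {position of S: {position of T_j: probability}}}
    observed:    {T_j name: position Y_j actually taken}
    Assumes the T_j are conditionally independent given S's position.
    """
    post = dict(prior)
    for name, y in observed.items():
        for x in POSITIONS:
            post[x] *= likelihoods[name][x][y]
    total = sum(post.values())
    return {x: p / total for x, p in post.items()}

# Hypothetical numbers: T1 takes S's position 60% of the time, any other 10% each.
prior = {x: 0.2 for x in POSITIONS}
agree_with_s = {x: {y: (0.6 if y == x else 0.1) for y in POSITIONS} for x in POSITIONS}
posterior = update(prior, {"T1": agree_with_s}, {"T1": "A"})
print(max(posterior, key=posterior.get))   # A
```

With these made-up numbers, seeing T1 take position A moves S's probability of A from 0.2 to 0.6.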
SIA won't doom you
Katja Grace has just presented an ingenious model, claiming that SIA combined with the great filter generates its own variant of the doomsday argument. Robin echoed this on Overcoming Bias. We met soon after Katja had come up with the model, and I signed up to it, saying that I could see no flaw in the argument.
Unfortunately, I erred. The argument does not work in the form presented.
First of all, there is the issue of time dependence. We are not just a human-level civilization drifting through the void in blissful ignorance about our position in the universe. We know (approximately) the age of our galaxy, and the time elapsed since the big bang.
How is this relevant? It is relevant because all arguments about the great filter are time-dependent. Imagine we had just reached consciousness and human-level civilization, by some fluke, two thousand years after the creation of our galaxy, by an evolutionary process that took two thousand years. We see no aliens around us. In this situation, we have no reason to suspect any great filter; if we asked ourselves "are we likely to be the first civilization to reach this stage?" then the answer is probably yes. No evidence for a filter.
Imagine, instead, that we had reached consciousness a trillion years into the life of our galaxy, again via an evolutionary process that took two thousand years, and we see no aliens or traces of aliens. Then the evidence for a filter is overwhelming; something must have stopped all those previous likely civilizations from emerging into the galactic plane.
So neither of these civilizations can be included in our reference class (indeed, the second one can only exist if we ourselves are filtered!). So the correct reference class to use is not "the class of all potential civilizations in our galaxy that have reached our level of technological advancement and seen no aliens", but "the class of all potential civilizations in our galaxy that have reached our level of technological advancement at around the same time as us and seen no aliens". Indeed, SIA, once we update on the present, cannot tell us anything about the future.
But there's more.
Musings on probability
I read this comment, and after a bit of rambling I realized I was as confused as the poster. A bit more thinking later I ended up with the “definition” of probability under the next heading. It’s not anything groundbreaking, just a distillation (specifically, mine) of things discussed here over time. It’s just what my brain thinks when I hear the word.
What is Bayesianism?
This article is an attempt to summarize basic material, and thus probably won't have anything new for the hard core posting crowd. It'd be interesting to know whether you think there's anything essential I missed, though.
You've probably seen the word 'Bayesian' used a lot on this site, but may be a bit uncertain of what exactly we mean by that. You may have read the intuitive explanation, but that only seems to explain a certain math formula. There's a wiki entry about "Bayesian", but that doesn't help much. And the LW usage seems different from just the "Bayesian and frequentist statistics" thing, too. As far as I can tell, there's no article explicitly defining what's meant by Bayesianism. The core ideas are sprinkled across a large number of posts, 'Bayesian' has its own tag, but there's not a single post that explicitly comes out to make the connections and say "this is Bayesianism". So let me try to offer my definition, which boils Bayesianism down to three core tenets.
We'll start with a brief example, illustrating Bayes' theorem. Suppose you are a doctor, and a patient comes to you, complaining about a headache. Further suppose that there are two reasons why people get headaches: they might have a brain tumor, or they might have a cold. A brain tumor always causes a headache, but exceedingly few people have a brain tumor. In contrast, a headache is rarely a symptom of a cold, but most people manage to catch a cold every single year. Given no other information, do you think it more likely that the headache is caused by a tumor, or by a cold?
If you thought a cold was more likely, well, that was the answer I was after. Even if a brain tumor caused a headache every time, and a cold caused a headache only one per cent of the time (say), having a cold is so much more common that it's going to cause a lot more headaches than brain tumors do. Bayes' theorem, basically, says that if cause A might be the reason for symptom X, then we have to take into account both the probability that A caused X (found, roughly, by multiplying the frequency of A with the chance that A causes X) and the probability that anything else caused X. (For a thorough mathematical treatment of Bayes' theorem, see Eliezer's Intuitive Explanation.)
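The headache example can be worked through numerically. This is a rough sketch: the two base rates are hypothetical (the text only says tumors are rare and colds common), and it assumes, as the example does, that these are the only two causes and that they exclude each other.

```python
def posterior_causes(priors, likelihoods):
    """Bayes' theorem over mutually exclusive causes of one symptom."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

# Hypothetical base rates; the text gives no figures for these.
priors = {"tumor": 0.0001, "cold": 0.5}
# From the text: a tumor always causes a headache, a cold does so 1% of the time.
likelihoods = {"tumor": 1.0, "cold": 0.01}

post = posterior_causes(priors, likelihoods)
print(post)
```

With these assumed rates the cold accounts for roughly 98% of headaches, even though a tumor is a hundred times more likely to produce one.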
Two probabilities
Consider the following statements:
1. The result of this coin flip is heads.
2. There is life on Mars.
3. The millionth digit of pi is odd.
What is the probability of each statement?
A frequentist might say, "P1 = 0.5. P2 is either epsilon or 1-epsilon, we don't know which. P3 is either 0 or 1, we don't know which."
A Bayesian might reply, "P1 = P2 = P3 = 0.5. By the way, there's no such thing as a probability of exactly 0 or 1."
Which is right? As with many such long-unresolved debates, the problem is that two different concepts are being labeled with the word 'probability'. Let's separate them and replace P with:
F = the fraction of possible worlds in which a statement is true. F can be exactly 0 or 1.
B = the Bayesian probability that a statement is true. B cannot be exactly 0 or 1.
Clearly there must be a relationship between the two concepts, or the confusion wouldn't have arisen in the first place, and there is: apart from both obeying various laws of probability, in the case where we know F but don't know which world we are in, B = F. That's what's going on in case 1. In the other cases, we know F != 0.5, but our ignorance of its actual value makes it reasonable to assign B = 0.5.
When does the difference matter?
Suppose I offer to bet my $200 that the millionth digit of pi is odd, versus your $100 that it's even. With B3 = 0.5, that looks like a good bet from your viewpoint. But you also know F3 = either 0 or 1. You can also infer that I wouldn't have offered that bet unless I knew F3 = 1, from which inference you are likely to update your B3 to more than 2/3, and decline.
On a larger scale, suppose we search Mars thoroughly enough to be confident there is no life there. Now we know F2 = epsilon. Our Bayesian estimate of the probability of life on Europa will also decline toward 0.
Once we understand F and B are different functions, there is no contradiction.
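The 2/3 threshold in the pi bet comes from a simple expected-value calculation. A minimal sketch, with a made-up function name, taking the stakes from the example:

```python
def expected_gain(b_odd, my_stake=200, your_stake=100):
    """Your expected gain from betting on 'even', given your belief b_odd that the digit is odd."""
    return (1 - b_odd) * my_stake - b_odd * your_stake

print(expected_gain(0.5))        # 50.0
break_even = 200 / (200 + 100)   # belief at which the bet is fair: 2/3
```

At B3 = 0.5 you expect to gain $50, so the bet looks attractive; once your belief that the digit is odd rises above 2/3, the expected gain turns negative.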
The Prediction Hierarchy
Related: Advancing Certainty, Reversed Stupidity Is Not Intelligence
The substance of this post is derived from a conversation in the comment thread which I have decided to promote. Teal;deer: if you have to rely on a calculation you may have gotten wrong for your prediction, your expectation for the case when your calculation is wrong should use a simpler calculation, such as reference class forecasting.
Edit 2010-01-19: Toby Ord mentions in the comments Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes (PDF) by Toby Ord, Rafaela Hillerbrand, and Anders Sandberg of the Future of Humanity Institute, University of Oxford. It uses a similar mathematical argument, but is much more substantive than this.
A lottery has a jackpot of a million dollars. A ticket costs one dollar. Odds of a given ticket winning are approximately one in forty million. If your utility is linear in dollars, should you bet?
The obvious (and correct) answer is "no". The clever (and incorrect) answer is "yes", as follows:
According to your calculations, "this ticket will not win the lottery" is true with probability 99.9999975%. But can you really be sure that you can calculate anything to odds that good? Surely you couldn't expect to make forty million predictions of which you were that confident and only be wrong once. Rationally, you ought to ascribe a lower confidence to the statement: 99.99%, for example. But this means a 0.01% chance of winning the lottery, corresponding to an expected value of a hundred dollars. Therefore, you should buy the ticket.
The logic is not obviously wrong, but where is the error?
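The arithmetic of both answers, and of the alternative sketched in the summary at the top of this post, can be laid out as follows. The fallback estimate and the chance that the calculation is wrong are hypothetical numbers chosen for illustration, not figures from the post.

```python
JACKPOT = 1_000_000
TICKET = 1
P_CALC = 1 / 40_000_000          # odds from the text

def expected_profit(p_win):
    """Expected profit in dollars from buying one ticket, with utility linear in dollars."""
    return p_win * JACKPOT - TICKET

naive = expected_profit(P_CALC)    # about -0.975: don't buy
clever = expected_profit(0.0001)   # about +99: the "clever" argument

# Hypothetical: a 0.01% chance the calculation is wrong, in which case we fall
# back on a simpler reference-class estimate (assumed here: 1 in a million).
p_wrong, p_fallback = 0.0001, 1 / 1_000_000
mixed = expected_profit((1 - p_wrong) * P_CALC + p_wrong * p_fallback)
print(naive, clever, mixed)
```

Under these assumptions the mixed estimate stays negative: doubting the calculation shifts weight to the fallback estimate, not to a 0.01% chance of winning.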
What Are Probabilities, Anyway?
In Probability Space & Aumann Agreement, I wrote that probabilities can be thought of as weights that we assign to possible world-histories. But what are these weights supposed to mean? Here I’ll give a few interpretations that I've considered and held at one point or another, and their problems. (Note that in the previous post, I implicitly used the first interpretation in the following list, since that seems to be the mainstream view.)
- Only one possible world is real, and probabilities represent beliefs about which one is real.
  - Which world gets to be real seems arbitrary.
  - Most possible worlds are lifeless, so we’d have to be really lucky to be alive.
  - We have no information about the process that determines which world gets to be real, so how can we decide what the probability mass function p should be?
- All possible worlds are real, and probabilities represent beliefs about which one I’m in.
  - Before I’ve observed anything, there seems to be no reason to believe that I’m more likely to be in one world than another, but we can’t let all their weights be equal.
- Not all possible worlds are equally real, and probabilities represent “how real” each world is. (This is also sometimes called the “measure” or “reality fluid” view.)
  - Which worlds get to be “more real” seems arbitrary.
  - Before we observe anything, we don't have any information about the process that determines the amount of “reality fluid” in each world, so how can we decide what the probability mass function p should be?
- All possible worlds are real, and probabilities represent how much I care about each world. (To make sense of this, recall that these probabilities are ultimately multiplied with utilities to form expected utilities in standard decision theories.)
  - Which worlds I care more or less about seems arbitrary. But perhaps this is less of a problem because I’m “allowed” to have arbitrary values.
  - Or, from another perspective, this drops another hard problem on top of the pile of problems called “values”, where it may never be solved.
Probability Space & Aumann Agreement
The first part of this post describes a way of interpreting the basic mathematics of Bayesianism. Eliezer already presented one such view at http://lesswrong.com/lw/hk/priors_as_mathematical_objects/, but I want to present another one that has been useful to me, and also show how this view is related to the standard formalism of probability theory and Bayesian updating, namely the probability space.
The second part of this post will build upon the first, and try to explain the math behind Aumann's agreement theorem. Hal Finney had suggested this earlier, and I'm taking on the task now because I recently went through the exercise of learning it, and could use a check of my understanding. The last part will give some of my current thoughts on Aumann agreement.