The self-indication assumption (SIA) states that
Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.
The reason this is a bad assumption might not be obvious at first. In fact, I think it's very easy to miss.
Argument for SIA posted on Less Wrong
First, let's take a look at an argument for SIA that appeared at Less Wrong (link). Two situations are considered.
1. We imagine that there are 99 people in rooms that have a blue door on the outside (1 person per room). One person is in a room with a red door on the outside. It was argued that you are in a blue door room with probability 0.99.
2. Same situation as above, but first a coin is flipped. If heads, the red door person is never created. If tails, the blue door people are never created. You wake up in a room and know these facts. It was argued that you are in a blue door room with probability 0.99.
So why is the 0.99 answer correct in scenario 1 but incorrect in scenario 2? The first thing to be careful about is not treating yourself as special. The fact that you woke up just tells you that at least one conscious observer exists.
In scenario 1 we basically just need to know what proportion of conscious observers are in a blue door room. The answer is 0.99.
In scenario 2 you never would have woken up in a room if you hadn't been created. Thus, the fact that you exist is something we have to take into account. We don't want to estimate P(randomly selected person, regardless of whether they exist or not, is in a blue door room). That would be ignoring the fact that you exist. Instead, the fact that you exist tells us that at least one conscious observer exists. Again, we want to know what proportion of conscious observers are in blue door rooms. Well, there is a 50% chance (if the coin landed heads) that all conscious observers are in blue door rooms, and a 50% chance (if it landed tails) that all conscious observers are in red door rooms. Thus, the marginal probability of a conscious observer being in a blue door room is 0.5.
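Here is a minimal simulation sketch of the two scenarios (my own illustration, not from the Less Wrong post). It encodes the sampling model argued for above: "you" are a uniformly random draw from the observers who actually exist.

```python
import random

def scenario_1(trials=100_000):
    """99 blue-door observers and 1 red-door observer always exist.
    "You" are a uniformly random one of the 100 observers."""
    blue = 0
    for _ in range(trials):
        observer = random.randrange(100)  # 0..98 are blue-door, 99 is red-door
        if observer < 99:
            blue += 1
    return blue / trials

def scenario_2(trials=100_000):
    """A fair coin decides who exists: heads -> only the 99 blue-door observers,
    tails -> only the 1 red-door observer. "You" are a uniformly random draw
    from the observers who actually exist."""
    blue = 0
    for _ in range(trials):
        heads = random.random() < 0.5
        if heads:
            blue += 1   # every existing observer is behind a blue door
        # on tails, the single existing observer is behind the red door
    return blue / trials

print(scenario_1())  # ~0.99
print(scenario_2())  # ~0.50 under this sampling model
```

Under SIA's sampling model (a random draw from all possible observers, whether or not they exist), scenario 2 would come out near 0.99 instead, which is exactly the step in dispute.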
The flaw in the more detailed Less Wrong proof (see the post) is when they go from step C to step D. The *you* being referred to in step A might not exist to be asked the question in step D. You have to take that into account.
General argument for SIA and why it's wrong
Let's consider the assumption more formally.
Assume that the number of people to be created, N, is a random draw from a discrete uniform distribution [1] on {1, 2, ..., Nmax}. Thus, P(N=k) = 1/Nmax, for k = 1, ..., Nmax. Assume Nmax is large enough so that we can effectively ignore finite sample issues (this is just for simplicity).
Assume M = Nmax*(Nmax+1)/2 possible people exist, and we arbitrarily label them 1, ..., M. After the size of the world, say N = n, is determined, we randomly draw n people from the M possible people.
After the data are collected we find out that person x exists.
We can apply Bayes' theorem to get the posterior probability:
P(N=k|x exists)=k/M, for k=1,...,Nmax.
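Spelling out the Bayes step (my own filling-in of the calculation implied above, with Nmax written as N_max): when N = k, the k people who exist are a uniform draw from the M possible people, so P(x exists | N = k) = k/M, and

```latex
\begin{align*}
P(N = k \mid x \text{ exists})
  &= \frac{P(x \text{ exists} \mid N = k)\, P(N = k)}
          {\sum_{j=1}^{N_{\max}} P(x \text{ exists} \mid N = j)\, P(N = j)} \\[4pt]
  &= \frac{(k/M)(1/N_{\max})}{\sum_{j=1}^{N_{\max}} (j/M)(1/N_{\max})}
   = \frac{k}{\sum_{j=1}^{N_{\max}} j}
   = \frac{k}{M},
\end{align*}
% the last equality uses \sum_{j=1}^{N_{\max}} j = N_{\max}(N_{\max}+1)/2 = M.
```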
The prior probability was uniform, but the posterior favors larger worlds. QED.
Well, not really.
The flaw here is that we conditioned on person x existing, but person x only became of interest after we saw that they existed (peeked at the data).
What we really know is that at least one conscious observer exists -- there is nothing special about person x.
So, the correct conditional probability is:
P(N=k|someone exists)=1/Nmax, for k=1,...,Nmax.
Thus, prior=posterior and SIA is wrong.
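A small simulation sketch of the contrast between the two conditionals (my own illustration; I use a small Nmax so the posterior is easy to read off, rather than the large-Nmax idealization above):

```python
import random
from collections import Counter

N_MAX = 10
M = N_MAX * (N_MAX + 1) // 2   # number of possible people, labeled 0..M-1

def simulate(trials=200_000):
    x = 0                                 # the person we single out in advance
    given_x = Counter()                   # world sizes on trials where person x exists
    given_someone = Counter()             # world sizes on trials where anyone exists (all of them)
    for _ in range(trials):
        n = random.randint(1, N_MAX)                 # world size, uniform prior
        existing = set(random.sample(range(M), n))   # which possible people get created
        given_someone[n] += 1
        if x in existing:
            given_x[n] += 1
    return given_x, given_someone

given_x, given_someone = simulate()
tx, ts = sum(given_x.values()), sum(given_someone.values())
for k in range(1, N_MAX + 1):
    print(k, round(given_x[k] / tx, 3), round(given_someone[k] / ts, 3))
# Conditioning on "person x exists" tilts toward big worlds (roughly k/M);
# conditioning on "someone exists" leaves the prior untouched (~1/N_MAX).
```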
Egotism
The flaw with SIA that I highlighted here is that it treats you as special, as if you were labeled ahead of time. But the reality is, no matter who was selected, they would think they are the special person. "But I exist, I'm not just some arbitrary person. That couldn't happen in a small world. It's too unlikely." In reality, the fact that I exist just means someone exists. I only became special after I already existed (peeked at the data and used it to construct the conditional probability).
Here's another way to look at it. Imagine that a random number between 1 and 1 trillion was drawn. Suppose 34,441 was selected. If someone then asks what the probability of selecting that number was, the correct answer is 1 in 1 trillion. They could then argue, "that's too unlikely of an event. It couldn't have happened by chance." However, because they didn't identify the number(s) of interest ahead of time, all we can really conclude is that a number was drawn, and drawing a number was a probability 1 event.
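As a tiny numerical illustration of that distinction (my own sketch; the specific draw shown is of course arbitrary):

```python
import random

TRILLION = 10**12

# Probability that a number specified *before* the draw matches the draw:
p_prespecified = 1 / TRILLION             # 1 in a trillion

# Probability that *some* number is drawn -- the only event we actually knew
# would happen before looking at the data:
p_some_number = 1.0

drawn = random.randrange(1, TRILLION + 1)
# "drawn" only looks special after the fact; had it been named before the
# draw, matching it really would have been a 1-in-a-trillion event.
print(drawn, p_prespecified, p_some_number)
```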
I give more examples of this here.
I think Nick Bostrom is getting at the same thing in his book (page 125):
...your own existence is not in general a ground for thinking that hypotheses are more likely to be true just by virtue of implying that there is a greater total number of observers. The datum of your existence tends to disconfirm hypotheses on which it would be unlikely that any observers (in your reference class) should exist; but that’s as far as it goes. The reason for this is that the sample at hand—you—should not be thought of as randomly selected from the class of all possible observers but only from a class of observers who will actually have existed. It is, so to speak, not a coincidence that the sample you are considering is one that actually exists. Rather, that’s a logical consequence of the fact that only actual observers actually view themselves as samples from anything at all.
Related arguments are made in this LessWrong post.
[1] For simplicity I'm assuming a uniform prior... the prior isn't the issue here.
Many forms of the anthropic argument just don't hold water. You can bend over backwards to find the fault in the logic, and I applaud your effort here to do that.
I think an easier way to dismiss the set of arguments is to think of two different cases, one in which there are few observers and one in which there are many, and then ask how the subjective observer could use the anthropic argument to distinguish these two. She can't.
Then these arguments can be discounted with the line of reasoning that says if a theory can't tell you which world you're in, then it predicts everything, so it tells you nothing. (Evidence for a given theory is the observation of an event that is more likely to occur if the theory is true than if it is false.)
Consider the argument that, since we're observers at a relatively early time in human technological development, we should update toward a higher probability that humans won't persist for a hugely long time afterward. This argument kind of makes sense when worded exactly as "if humans persisted for billions of years, what is the probability I would be a human in the first .005 billion years?".
But the way to test if that line of reasoning works is to ask, suppose you have two realities, one in which humans persisted for .1 billion years and one in which humans persisted for 100 billion years. How could the set of observers at .005 billion years use the anthropic argument to distinguish between the two? They couldn't. The anthropic argument has no power to select among these two realities; the anthropic principle predicts exactly the same set of observations for the set of observers at time point .005 billion years for the two different realities. Likewise, consider that there are 50 red rooms or 5000 red rooms, and one blue door. The person who wakes in the blue room has no evidence about the number of red rooms, because her observations (a blue room) are exactly the same for both cases.
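One way to phrase the commenter's point in Bayesian terms (my own sketch, not from the comment): if the observation is equally probable under both hypotheses, the likelihood ratio is 1, so the posterior odds equal the prior odds and nothing is learned.

```python
def posterior_odds(prior_odds, p_obs_given_h1, p_obs_given_h2):
    """Posterior odds = prior odds * likelihood ratio (Bayes' rule in odds form)."""
    return prior_odds * (p_obs_given_h1 / p_obs_given_h2)

# The blue-door observer sees exactly the same thing ("I woke up behind the
# blue door") whether there are 50 or 5000 red rooms, so the observation's
# probability is the same under both hypotheses and the likelihood ratio is 1:
p_obs = 1.0
print(posterior_odds(prior_odds=1.0, p_obs_given_h1=p_obs, p_obs_given_h2=p_obs))
# -> 1.0: the posterior odds equal the prior odds; no update.
```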