# Addresses in the Multiverse

*Abstract: If we assume that any universe can be modeled as a computer program which has been running for finitely many steps, then we can assign a multiverse-address to every event by pairing its world-program with the number of steps the world-program has executed when the event occurs. We define a probability distribution over multiverse-addresses called a Finite Occamian Multiverse (FOM). FOMs assign negligible probability mass to being a Boltzmann brain or to being in a universe that implements the Many Worlds Interpretation of quantum mechanics.*

One explanation of existence is the Tegmark level 4 multiverse, the idea that all coherent mathematical structures exist, and our universe is one of them. To make this meaningful, we must add a probability distribution over mathematical structures, effectively assigning each a degree of existence. Assume that the universe we live in can be fully modeled as a computer program, and that both that program and the number of steps it has been running for are finite. (Note that it's not clear whether our universe is finite or infinite; our universe is either spatially infinite, or expanding outwards at a rate greater than or equal to the speed of light, but there's no observation we could make inside the universe that would distinguish these two possibilities.) Call the program that implements our universe a world-program, W. This could be implemented in any programming language; it doesn't really matter which, since we can translate a program from one language to another by prepending an interpreter, which adds only a fixed amount to its length.

Now, suppose we choose a *particular event* in the universe - an atom emitting a photon, say - and we want to find a corresponding operation in the world-program. We could, in principle, run W until it starts working on the part of spacetime we care about, and count the steps. Call the number of steps leading up to this event T. Taken together, the pair (W,T) uniquely identifies a place, not just in the universe, but in the space of all possible universes. Call any such pair (W,T) a multiverse-address.
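To make the (W,T) pair concrete, here is a minimal sketch in Python. It assumes a toy world-program, an elementary cellular automaton (Rule 110, chosen only because it is simple and Turing-complete), in which each single-cell update counts as one step; the names `rule110_world` and `find_step` are hypothetical and not from the post. T is found exactly as described above: run W and count steps until the chosen event.

```python
from typing import Callable, Iterator, Optional, Tuple

State = Tuple[int, int, int]  # (generation, cell_index, new_cell_value)


def rule110_world(width: int = 64) -> Iterator[State]:
    """A toy world-program W: Rule 110 on a ring of `width` cells.

    Each individual cell update is one step, so the step counter T
    advances by `width` per generation.
    """
    cells = [0] * width
    cells[-1] = 1  # initial condition: a single live cell
    generation = 0
    while True:
        new_cells = [0] * width
        for i in range(width):
            left = cells[i - 1]  # index -1 wraps around at i == 0
            centre = cells[i]
            right = cells[(i + 1) % width]
            neighbourhood = (left << 2) | (centre << 1) | right
            new_cells[i] = (110 >> neighbourhood) & 1  # look up the rule bit
            yield generation, i, new_cells[i]
        cells = new_cells
        generation += 1


def find_step(
    world: Iterator[State],
    is_event: Callable[[State], bool],
    max_steps: int = 10**6,
) -> Optional[int]:
    """Return T, the number of steps W performs before the event, or None."""
    for t, state in enumerate(world):
        if t >= max_steps:
            return None
        if is_event(state):
            return t
    return None


# The event "cell 5 is updated in generation 3" sits at T = 3 * 64 + 5 = 197.
T = find_step(rule110_world(64), lambda s: s[0] == 3 and s[1] == 5)
print(T)  # 197
```

The same (W,T) always picks out the same event, which is what lets the pair serve as an address.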

Now, suppose we observe an event. What should be our prior probability distribution over multiverse-addresses for that event? That is, for a given event (W,T), what is P(W=X and T=Y)?

For this question, our best (and pretty much only possible) tool is Occam's Razor. We're after a *prior* probability distribution, so we aren't going to bother including all the things we know about W from observation, except that we have good reason to believe that W is short - what we know of physics seems to indicate that at the most basic level, the rules are simple. So, first factor out W and apply Occam's Razor to it:

P(W=X and T=Y) = P(W=X) * P(T=Y|W=X)

P(W=X and T=Y) = exp(-len(W)) * P(T=Y|W=X)

Now assume independence between T and W. This isn't entirely correct (some world-programs are structured in such a way that all the events we might be looking for happen on even time-steps, for example), but that kind of entanglement isn't important for our purposes. Then apply Occam's Razor to T, getting

P(W=X and T=Y) = exp(-len(W)-len(T))
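As a sketch of how the formula can be evaluated, the Python below computes the unnormalized log-weight -len(W) - len(T) for a candidate address. It assumes that len(W) is the size of W in bits when stored as raw bytes and that len(T) is the number of bits in T's binary representation; both are illustrative choices, not part of the post. It also shows that, with these choices, the weights exp(-len(T)) over all T ≥ 1 sum to 1/(e - 2), about 1.39, rather than 1, so the formula is best read as holding up to a normalizing constant.

```python
import math


def log_weight(world_program: bytes, t: int) -> int:
    """Unnormalized natural-log weight of the address (W, T): -len(W) - len(T).

    Assumption: len(W) is the size of W in bits, len(T) is T's bit length.
    """
    len_w = 8 * len(world_program)
    len_t = t.bit_length()
    return -(len_w + len_t)


def t_normalizer(max_bits: int = 200) -> float:
    """Sum of exp(-len(T)) over all T >= 1, truncated at `max_bits`.

    There are 2**(n - 1) positive integers of bit length n, so the series is
    sum_n 2**(n - 1) * e**-n, a geometric series in 2/e that converges to 1/(e - 2).
    """
    return sum(2 ** (n - 1) * math.exp(-n) for n in range(1, max_bits + 1))


# Two hypothetical addresses sharing the same tiny W: the short T outweighs the long one.
w = b"toy world-program"
print(log_weight(w, 7), log_weight(w, 207798236098322674))  # -139 -194
print(t_normalizer(), 1 / (math.e - 2))  # both about 1.392
```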

Now, applying Occam's razor to T requires some explanation, and there is one detail we have glossed over: we referred to the *length* of W and T (for the number T, the length of its binary representation, which is roughly its logarithm), when we should have referred to their *Kolmogorov complexity*, that is, their length after compression. For example, a world-program that contains 10^10 random instructions is much less likely than one that contains 10^10 copies of the same instruction. Suppose we resolve this by requiring W to be fully compressed, and give it an initialization stage where it unpacks itself before we start counting steps for T.
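Kolmogorov complexity is uncomputable, but any real compressor gives an upper bound on it. A minimal sketch using Python's standard `zlib` module as that stand-in (an assumption; nothing in the argument depends on zlib specifically) shows the gap between a repetitive world-program and a random one. The sizes are scaled down from 10^10 to 10^6 bytes so it runs quickly.

```python
import os
import zlib


def compressed_bits(data: bytes) -> int:
    """Upper bound, in bits, on the description length of `data` (zlib level 9)."""
    return 8 * len(zlib.compress(data, 9))


n = 10**6  # stand-in for the 10^10 instructions in the text
repetitive = b"\x2a" * n        # n copies of the same one-byte "instruction"
random_program = os.urandom(n)  # n random one-byte "instructions"

print(compressed_bits(repetitive), compressed_bits(random_program))
```

The repetitive program shrinks by orders of magnitude while the random one essentially does not shrink at all. A better compressor could only tighten these bounds, never certify the true complexity.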

This lets us transfer bits of complexity from T to W, by having W run itself for a while during the initialization stage. We can also transfer complexity from W to T, by writing W in such a way that it runs a class of programs in order, and T determines which of them it's running. Since we can transfer complexity back and forth between W and T, we can't justify applying Occam's Razor to one but not the other, so it makes sense to apply it to T as well. This also means that we should treat T as compressible; it is more likely that the universe is 3^^^3 steps old than that it is 207798236098322674 steps old.
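To see why a number like 3^^^3 counts as simple, compare the length of a short expression that denotes a number with the length of that number written out in full. This is a hedged sketch using a hypothetical `up_arrow` helper for Knuth's up-arrow notation. Only tiny inputs can actually be evaluated, so it uses 3^^3 and estimates the size of 3^^4 rather than touching 3^^^3.

```python
import math


def up_arrow(a: int, arrows: int, b: int) -> int:
    """Knuth's up-arrow a ↑^arrows b; one arrow is ordinary exponentiation."""
    if arrows == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, arrows - 1, up_arrow(a, arrows, b - 1))


tetration_3_3 = up_arrow(3, 2, 3)  # 3^^3 = 3^(3^3) = 3^27
print(len("3^^3"), len(str(tetration_3_3)))  # 4 characters versus 13 digits

# 3^^4 = 3^(3^^3) is far too big to build, but its digit count is easy to estimate:
digits_3_4 = int(tetration_3_3 * math.log10(3)) + 1
print(digits_3_4)  # roughly 3.6 trillion digits, named by the 4-character "3^^4"

# No description of this number much shorter than its 18 digits is apparent,
# though Kolmogorov complexity is uncomputable, so that cannot be checked mechanically.
print(len(str(207798236098322674)))  # 18
```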

To recap - we started by assuming that the universe is a computer program, W. We chose an event in W, corresponding to a computation that occurs after W has performed T operations. We assume that W and T are both finite. Occam's Razor tells us that W, if fully compressed, should be short. We can trade off complexity between W and T, so we should also apply Occam's Razor to T and expect that T, if fully compressed, should also be short. We had to assume that the universe behaves like a computer program, that that program is finite, and that the probability distribution which Occam's Razor gives us is actually meaningful here.

We then got P(W=X and T=Y) = exp(-len(W)-len(T)). Call this probability distribution a Finite Occamian Multiverse (FOM). We can define this in terms of different programming languages, which reweights the probabilities of different multiverse-addresses somewhat, but all FOMs share some interesting properties.

A Finite Occamian Multiverse avoids the Boltzmann brain problem. A Boltzmann brain is a brain that, rather than living in a simulation with stable physics that allows it to continue to exist as the simulation advances, arises by chance out of randomly arranged particles or other simulation-components, and merely thinks (contains a representation of the claim that) it lives in a universe with stable physics. If you live in a FOM, then the probability that you are a Boltzmann brain is negligible because Boltzmann brains must have extremely complex multiverse-addresses, while evolved brains can have multiverse-addresses that are simple.
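Here is a toy illustration of why a chance fluctuation's address is long, under assumptions that are not in the post: treat the world-program as a seeded stream of pseudo-random bits and a "Boltzmann brain" as a particular k-bit pattern appearing by chance. The step T at which the pattern first appears tends to grow like 2^k, so writing T down costs about k bits. The address has to pay for the structure that chance assembled. The function name `first_occurrence` is hypothetical.

```python
import random
from statistics import mean
from typing import Optional


def first_occurrence(k: int, seed: int, max_steps: int = 10**7) -> Optional[int]:
    """Step at which k consecutive 1-bits (the toy 'brain') first appear in a
    seeded random bit stream, or None if not seen within max_steps."""
    rng = random.Random(seed)
    target = (1 << k) - 1  # all-ones pattern of length k
    window = 0
    for t in range(max_steps):
        window = ((window << 1) | rng.getrandbits(1)) & target  # keep the last k bits
        if window == target:
            return t
    return None


for k in (4, 8, 12, 16):
    steps = [first_occurrence(k, seed) for seed in range(5)]
    avg_len_t = mean(t.bit_length() for t in steps)
    print(k, avg_len_t)  # the average len(T) grows roughly in step with k
```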

If we are in a Finite Occamian Multiverse, then the Many Worlds interpretation of quantum mechanics must be false, because if it were true, then any multiverse-address would have to contain the complete branching history of the universe, so its length would be proportional to the mass of the universe times the age of the universe. On the other hand, if branches were selected according to a pseudo-random process, then multiverse-addresses would be short. This sort of pseudo-random process would slightly increase the length of W, but drastically decrease the length of T. In other words, in this type of multiverse, worldeaters eat more complexity than they contain.
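The contrast between the two kinds of address can be sketched in Python under a simplifying assumption: every branching is a binary, equiprobable quantum outcome. Under many worlds, singling out one branch means listing every outcome, so the address grows by one bit per branching. If W instead chooses outcomes with a seeded pseudo-random generator (here Python's `random.Random`, a stand-in for whatever process W might use), the whole history is fixed by W plus a short seed. `branch_history` is a hypothetical name.

```python
import random
from typing import List


def branch_history(seed: int, n_branchings: int) -> List[int]:
    """Outcomes of n binary 'measurements', chosen by a PRNG seeded once.

    The seed and the PRNG's code (part of W) determine the entire history.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n_branchings)]


n = 100_000
history = branch_history(seed=42, n_branchings=n)

explicit_address_bits = n                # many worlds: one bit per branching
seeded_address_bits = (42).bit_length()  # pseudo-random selection: just the seed
print(explicit_address_bits, seeded_address_bits)  # 100000 6
```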

If we are in a Finite Occamian Multiverse, then we might also expect certain quantities, such as the age and volume of the universe, to have much less entropy than otherwise expected. If, for example, we discovered that the universe had been running for exactly 3^^^3+725 time steps, then we could be reasonably certain that we were inside such a multiverse.

This kind of multiverse also sets an upper bound on the total amount of entropy (number of fully independent random bits) that can be gathered in one place, equal to the total complexity of that place's multiverse-address, since it would be possible to generate all of those bits from the multiverse-address by simulating the universe. However, since simulating the universe is intractable, the universe can still act as a very strong cryptographic pseudorandom number generator.
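The entropy bound and the pseudo-random-generator point can both be seen in a small sketch. Here SHA-256 in counter mode (an assumption; the post names no particular generator, and `expand` is a hypothetical helper) stretches a short address into a long output. The output cannot contain more independent random bits than the address, since anyone holding the address regenerates it exactly, yet a general-purpose compressor finds no structure in it.

```python
import hashlib
import zlib


def expand(address: bytes, n_bytes: int) -> bytes:
    """Deterministically stretch `address` into n_bytes using SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(address + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])


address = b"(W, T)"  # hypothetical 6-byte stand-in for a multiverse-address
stream = expand(address, 100_000)

entropy_upper_bound_bits = 8 * len(address)    # at most 48 independent bits
zlib_bits = 8 * len(zlib.compress(stream, 9))  # yet zlib finds essentially nothing to remove
print(entropy_upper_bound_bits, zlib_bits)
```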

## Comments (20)

I am not convinced by your argument that we are very unlikely to be Boltzmann brains or in an Everett universe. For instance, in an Everett universe, multiverse addresses are indeed very long, which means that your probability of being at a given multiverse address is very small; but that's counterbalanced by the fact that there are many instances of you in identical-looking situations at different addresses. (And what we observe is not the *event* as you've defined it but only *what we are able to observe* of the event.)

Probability doesn't work where indexical uncertainty is at play. You are not allowed to assign probabilities to situations and act as if you are working with one of the possibilities, when it's possible that you are actually orchestrating the outcome from multiple locations at once. Two worlds, for example, could be simulating each other, which makes any independence assumptions go away.

Have you heard of UDASSA? Seems very similar.

I object to using the word 'exist' here. I can't see it referring to any sensible concept when you're talking about such a meta-level to our universe.

I posted some thoughts about this earlier. http://lesswrong.com/lw/1dt/open_thread_november_2009/1kj6

As one rationalist's *modus ponens* is another's *modus tollens*, the quantum nature of our world makes me doubt the Finite Occamian account of the mathematical universe; a Turing machine isn't really the native platform for the evolution of the wavefunction. In fact, the FOM hypothesis should imply that our underlying physics will turn out to be a rather simple finite state automaton or something of the sort, and that doesn't seem likely to me.

Your reasoning about compressing T sounds like some sleight of hand. I think the trick happens when you unpack the program. That really shouldn't allow you to transfer complexity between the two.

Any pair (U, V) still has to refer to a specific point, and you shouldn't have any special expectations that things happening on step one, 3^^3 or on any other point are somehow more likely than others. Unpacking shouldn't change that, and so you shouldn't be able to transfer complexity of the program to the step count.

Can you “formalize” a bit more your reasoning about compressing T and W? I’m generally confused about compressibility: I understand the basics, but I don’t trust myself to judge a hand-wavy discussion about it.

In the particular case of your FOM, the glossing-over of these particulars prevents me from reasoning about the details, from the consequences of simple things like the pigeonhole principle to those of more complex things like the general non-computability of Kolmogorov complexity. (In the latter case, I’m not sure how to justify measuring universes by a non-computable property of theirs, and at the same time considering universes as computing machines.)

I also don’t get your argument for why T is compressible. It feels *ridiculously* unlikely that the age of the universe (actually, the age of an event) be 3^^^3 (± some humanly useful number); I’d expect rather something like “anything between 3^^^3 and 2×3^^^3”, and most numbers in that range should be very incompressible. (BTW, I’m also confused about up-arrows, but I think 3^^^3 is vastly too large a number to count every particle interaction in the history of the visible universe.) (I don’t mean you should alter the article, just to comment with more details.)

I'll repeat myself one more time.

Instead of committing the so-called quantum seppuku, I just have to wait for 100 years. I should be in the "me alive" branch anyway.

If you buy MWI, you should buy this conclusion also.

I don't.

Taboo "I"

Are you invoking the absurdity heuristic? That method often fails outside the Middle World - for example, by failing to predict general relativity.

As a matter of fact, I do. And I don't see where it fails; I see the difficulty you have in answering my question.

Do you honestly expect that this natural Russian quantum roulette ends with you as a very, very old man, twice as old as the next oldest guy? For the sake of the discussion, put the Singularity aside, of course!

Invoking the absurdity heuristic is a really bad idea. The power of absurdity to distinguish truth from falsity is far, far weaker than the power of - to give an amusing example - statistics.

Edit: The Wired article of the second link is "Nov. 4, 1952: Univac Gets Election Right, But CBS Balks" by Randy Alfred 11.04.08 - Wayback Machine link.

http://www.iep.utm.edu/reductio/

A legitimate way of reasoning.

You have to derive an actual contradiction, not just something that seems absurd to you.

If you believe in MWI, but refuse to believe that you will observe a world where everybody is much younger than you, since that is the branch you will end up in, you believe in a contradiction.

You may avert it by abandoning the quantum suicide thing, but not MWI. I don't know if that's possible. So far nobody has claimed such an option.

But some people believe in quantum immortality. There is no contradiction. There isn't even empirical evidence against it. The options, unless I'm missing one, are to find evidence against MWI, evidence against Quantum Immortality for a modus tollens, or change your mind. Quantum immortality sounding crazy isn't evidence against it.

Fine. So you believe that if no Singularity happens, and there is no major medical advance, you'll be 200 and pretty much alone at that age?

It is a logical consequence of MWI. Don't you agree it is?

Probably not, but it isn't a negligible probability at this point. I assign MWI around .25 probability with the remainder going to pilot wave and "shit no one has thought of". But these probabilities are mostly based on instincts from the history of science and things physicists have said; I'm not a physicist and don't consider myself an authority. This is an easily altered set of predictions.

Maybe. We still need to interpret the interpretation. There are definitely some readings of MWI that deal with the Born probability rule by concluding that certain low-probability worlds get mangled and never actualize; this could mean we eventually die after all because the probability of our survival becomes too small.

No, the many worlds interpretation is not sufficient to justify the conclusion. You also have to add the metaphysical assumption that your consciousness always jumps to a branch in which you survive. For my view on that, see http://lesswrong.com/lw/208/the_iless_eye/

"P(W=X and T=Y) = P(W=X) * P(T=Y|W=X); P(W=X and T=Y) = exp(-len(W)) * P(T=Y|W=X)" therefore P(W=X) = exp(-len(W)). I'm trying to find a way to get this to sum to 1 across all W, but failing. Is there something wrong with this prior probability, or am I doing my math wrong?

"For example, a world-program that contains 10^10 random instructions is much less likely than one that contains 10^10 copies of the same instruction." Is that really necessary if a world-program with 1 copy of an instruction is functionally indistinguishable from a world-program with 10^10 copies of that single instruction?

"Since we can transfer complexity back and forth between W and T, we can't justify applying Occam's Razor to one but not the other, so it makes sense to apply it to T. This also means that we should also treat T as compressible; it is more likely that the universe is 3^^^3 steps old than that is 207798236098322674 steps old." I don't think Occam's Razor works that way.