The following is an argument for a weak form of the many-worlds hypothesis. The weak form I mean is that there are many observers in different branches of the wave function. The other branches "actually exist" for anthropic purposes; some observers are observing them. I've written before about difficulties with deriving discrete branches and observers from the Schrödinger equation; I'm ignoring this difficulty for now, instead assuming the existence of a many-worlds theory that specifies discrete branches and observers somehow.
To be clear, I'm not confident in the conclusion; it rests on some assumptions. In general, physics theories throughout history have not been completely correct. It would not surprise me if a superintelligence would consider many-worlds to be a false theory. Rather, I am drawing implications from currently largely accepted physics and computational complexity theory, and plausible anthropic assumptions.
First assumption: P != BQP. That is, there are some decision problems that cannot be decided in polynomial time by a classical computer but can be decided in polynomial time by an idealized quantum computer. This is generally accepted (RSA security depends on it) but not proven. This leaves open the possibility that the classically hardest BQP problems are only slightly harder than polynomial time. Currently, it is known that factorizing a b-bit integer can be done in roughly time where c is a constant greater than 1, while it can be done in polynomial time on an idealized quantum computer. I want to make an assumption that there are decision problems in BQP whose running time is "fast-growing", and I would consider "fast-growing" in this context despite not being truly exponential time. For example, a billion-bit number would require at least time to factorize with known classical methods, which is a sufficiently huge number for the purposes of this post.
Second assumption: The universe supports BQP computation in polynomial physical resources and clock time. That is, it's actually possible to build a quantum computer and solve BQP problems in polynomial clock time with polynomial physical resources (space, matter, energy, etc). This is implied by currently accepted quantum theories (up to a reasonably high limit of how big a quantum computer can be).
Third assumption: A "computational density anthropic prior", combining SIA with a speed prior, is a good prior over observations for anthropic purposes. As background, SIA stands for "self-indicating assumption" and SSA stands for "self-sampling assumption"; I'll assume familiarity with these theories, specified by Bostrom. According to SIA, all else being equal, universes that have more observers are more likely. Both SSA and SIA accept that universes with no observers are never observed, but only SIA accepts that universes with more observers are in general more likely. Note that SSA and SIA tend to converge in large universes (that is, in a big universe or multiverse with many observers, you're more likely to observe parts of the universe/multiverse with more observers, because of sampling). The speed prior implies that, all else being equal, universes that are more efficient to simulate (on some reference machine) are more likely. A rough argument for this is that in a big universe, many computations are run, and cheap computations are run more often, generating more observers. The computational density anthropic prior combines SIA with a speed prior, and says that we are proportionally more likely to observe universes that have a high ratio of observer-moments to required computation time. We could imagine aliens simulating many universes in parallel, re-running computationally inexpensive universes repeatedly when they "halt", and selecting out observer-moments uniformly at random; they're more likely to select universes that produce many observers relative to required computation time. I realize this assumption is contentious, but spelling out the argument might make it clear whether weaker assumptions would suffice.
The speed prior (and therefore the computational density prior) leaves open the question of what the reference machine is. While any polynomial time random-access-machine computation can be done on a Turing machine in polynomial time, the polynomials may be different, and this matters to the speed prior. For now I'll ignore differences between different polynomials, because the argument is about polynomial vs. non-polynomial time.
There is also the question of whether the reference machine is classical or quantum. A priori, classical computation is simpler, and more likely on Occamian grounds. Classical computation seems likely to be attended to by intelligent observers across a wide range of universes with different laws of physics, while we pay attention to quantum computation mainly for empirical reasons that depend on the laws of physics of our universe. It seems to me that baking quantum computation into the speed prior is awkward, because a prior is supposed to not depend on empirical observations.
This leads to assumption 4: If the probability of our observations are roughly similar (within a factor of 1,000,000,000, let's say) between a classical computational density prior and a quantum computational density prior, we should prefer the classical computational density prior. If we receive overwhelming evidence that the quantum computational density prior produces better predictions, we should eventually prefer it (at the meta-Bayesian level, imagining a prior over reference machines), but quantum computation is more complex to specify than classical computation, so in the absence of overwhelming evidence, the classical computational density prior is preferred.
That's enough assumptions for now. We can now consider a 2x2 table of hypotheses, each of which predict quantum computation; we only consider these due to assumption 2. Either the reference machine is classical or quantum. And either only one branch of the wave function contains anthropic observers (roughly Copenhagen), or many do in proportion to the number of branches (roughly many worlds). (I realize Copenhagen and many worlds have more details than this, I'm ignoring those other details for simplicity). Let's consider the probability of seeing roughly what we see under these different hypotheses. As a simplifying assumption, let's assume 10^15 observers in each branch of the wave function (so, 10^15 observers total in Copenhagen, and that multiplied by the number of branches in many worlds).
First, classical reference machine and Copenhagen. We get 10^15 observers, and the time to compute them is super-polynomial in the number of observers, by assumption 1. Computational density implies this is unlikely, because there are few observers per computation step.
Second, classical reference machine and many worlds. The number of observers is 10^15 times the number of branches. The time to compute this is also polynomial in the number of observers times the number of branches. The computational density of observers is reasonably high, so computational density implies this is reasonably likely.
Third, quantum reference machine and Copenhagen. We get 10^15 observers, and the time to compute them is polynomial in the number of observers. The computational density of observers is reasonably high, so computational density implies this is reasonably likely.
Fourth, quantum reference machine and many worlds. The number of observers is 10^15 times the number of branches. Since we're computing all the branches anyway, the quantum reference machine doesn't make a difference. So the logic is the same as with a classical reference machine and many worlds. The computational density of observers is reasonably high, so computational density implies this is reasonably likely.
So far, we have a decisive argument against classical reference machine and Copenhagen. Simulating a quantum universe classically and only picking out observers from one branch is just a really inefficient way to derive observer-moments from computation. Now we leverage assumption 4: if a quantum reference machine isn't very helpful for simulating the sort of observer-moments we see (that is, ones with experiences implying quantum computation is possible), we should prefer a classical reference machine. This implies that a classical reference machine and many worlds is the preferred hypothesis.
Intuitively, quantum reference machine and many worlds is quite implausible: the quantum reference machine is not helpful for simulating the many branches, so there is no reason to prefer a quantum reference machine in the speed prior, as it is more complex to specify. Quantum reference machine and Copenhagen is intuitively more plausible, since the quantum machine is at least being used. If there were strong reasons to think the universe contained few observers, that could be a reason to prefer quantum reference machine and Copenhagen over classical reference machine and many worlds. But in the absence of such reasons, classical reference machine and many worlds is preferred.
This concludes the main argument. To summarize, the only way to use a classical reference machine to efficiently simulate observer-moments in a universe supporting quantum computation is to simulate many branches and derive observers from many of them. A quantum reference machine doesn't have to directly simulate all the branches, but is less plausible on Occamian grounds; it's awkward to bake empirical physics theories into the speed prior.
I assume assumption 3 is the most contentious, so it might be worth re-visiting. One of the main arguments for SSA over SIA is the presumptuous philosopher argument: that is, SIA overwhelmingly prefers large universes a priori. The computational density prior (combining a speed prior with SIA) does not have this problem, because larger universes require more computation to simulate. Combining a speed prior with SSA seems to overly penalize large universes: they require more computation, but do not become more likely on account of containing more observers.
I am intuitively drawn to computational density in part because it is scale-invariant. It doesn't particularly care if there are one or many parallel copies of the same universe; replicating the universe generates proportionally more observers and costs proportionally more computation. I am not particularly motivated to try to make SSA speed priors work for this reason. However, I would be interested in the views of those who think SSA can be reasonably combined with a speed prior.
Content warning: info-hazard
Interesting question here is what happens with the total measure across multiverse in time:
(1) If it remains =1 then each branch has smaller and smaller measure and it compensates the increase in the number of observers. In that case, the argument for MWI is not working, but as we assume here MWI=true for branch splitting it, doesn't affect the proof. I am also should be very early (but I am not very early – there were anthropically-aware people before me).
(2) If total measure increases, the total number of observers is growing many orders of magnitude each second, and thus - based on SIA - I am now in the last moment of the existence of the universe before Big Rip (which is compensated by quantum immortality, so not a real problem).
The idea (2) is more probable than (1) as I know that I am not early, but can't know that I am in the last moment of the universe existence. (Ups, seems to be infohazardous idea)