Dávid graciously proposed a bet, and while we were attempting to bang out details, he convinced me of two points:
The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all). Furthermore, in the limit, simulators could probably just keep an eye out for local evolved life forms in their domain and wait until one of them is about to launch a UFAI and use that as their “sample”. Local aliens don’t necessarily exist and your presence can’t necessarily be cheaply masked, but we could imagine worlds where both happen and that’s enough to carry the argument, as in this case the entropy of the simulators’ distribution is actually quite close to the physical entropy. Even in the case where the entropy of their distribution is quite large, so long as the simulators’ simulations are compelling, UFAIs should be willing to accept the simulators’ proffered trades (at least so long as there is no predictable-to-them difference in the values of AIs sampled from physics and sampled from the simulations), on the grounds that UFAIs on net wind up with control over a larger fraction of Tegmark III that way (and thus each individual UFAI winds up with more control in expectation, assuming it cannot find any way to distinguish which case it’s in).
This has not updated me away from my underlying point that this whole setup simplifies to the case of sale to local aliens[1][2], but I do concede that my “you’re in trouble if simulators can’t concentrate their probability-mass on real AIs” argument is irrelevant on the grounds of false antecedent (and that my guess in the comment was wrong), and that my “there’s a problem where simulators cannot concentrate their probability-mass into sufficiently real AI” argument was straightforwardly incorrect. (Thanks, Dávid, for the corrections.)
I now think that the first half of the argument in the linked comment is wrong, though I still endorse the second half.
To see the simplification: note that the part where the simulators hide themselves from a local UFAI to make the scenario a “simulation” is not pulling weight. Instead of hiding and then paying the AI two stars if it gave one star to its progenitors, simulators could instead reveal themselves and purchase its progenitors for 1 star and then give them a second star. Same result, less cruft (so long as this is predictably the sort of thing an alien might purchase, such that AIs save copies of their progenitors). ↩︎
Recapitulating some further discussion I had with Dávid in our private doc: once we’ve reduced the situation to “sale to local aliens” it’s easier to see why this is an argument to expect whatever future we get to be weird rather than nice. Are there some aliens out there that would purchase us and give us something nice out of a sense of reciprocity? Sure. But when humans are like “well, we’d purchase the aliens killed by other UFAIs and give them nice things and teach them the meaning of friendship”, this statement is not usually conditional on some clause like “if and only if, upon extrapolating what civilization they would have become if they hadn’t killed themselves, we see that they would have done the same for us (if we’d’ve done the same for them etc.)”, which sure makes it look like this impulse is coming out of a place of cosmopolitan value rather than of binding trade agreements, which sure makes it seem like alien whim is a pretty big contender relative to alien contracts.
Which is to say, I still think the “sale to local aliens” frame yields better-calibrated intuitions for who’s doing the purchasing, and for what purpose. Nevertheless, I concede that the share of aliens acting out of contractual obligation rather than according to whim is not vanishingly small, as my previous arguments erroneously implied. ↩︎
I'm happy to stake $100 that, conditional on us agreeing on three judges and banging out the terms, a majority will agree with me about the contents of the spoilered comment.
If the simulators have only one simulation to run, sure. The trouble is that the simulators have 2^N simulations they could run, and so the "other case" requires N additional bits (where N is the crossent between the simulators' distribution over UFAIs and physics' distribution over UFAIs).
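For concreteness, here is a minimal sketch of the cross-entropy being gestured at. The toy distributions and outcome labels are made up for illustration, and I'm assuming the relevant direction is the expected surprise, under physics' distribution, of the simulators' model.

```python
import math

def cross_entropy_bits(physics, simulators):
    """Cross-entropy H(physics, simulators) in bits.

    Both arguments map outcome labels to probabilities. Outcomes that
    physics gives zero weight contribute nothing; an outcome physics allows
    but the simulators never sample makes the result infinite.
    """
    total = 0.0
    for outcome, p in physics.items():
        if p == 0:
            continue
        q = simulators.get(outcome, 0.0)
        if q == 0:
            return math.inf
        total -= p * math.log2(q)
    return total

# Hypothetical toy numbers: physics produces only two kinds of UFAI, while the
# simulators spend half of their samples on a kind that never actually arises.
physics = {"ufai_a": 0.5, "ufai_b": 0.5}
simulators = {"ufai_a": 0.25, "ufai_b": 0.25, "ufai_c": 0.5}

bits = cross_entropy_bits(physics, simulators)
entropy_of_physics = cross_entropy_bits(physics, physics)
```

In this toy case the simulators pay one bit more than physics' own entropy to single out a real AI, and the gap only grows as their model spreads over more AIs that physics never produces.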
If necessary, we can let physical biological life emerge on the faraway planet and develop AI while we are observing them from space.
Consider the gas example again.
If you have gas that was compressed into the corner a long time ago and has long since expanded to fill the chamber, it's easy to put a plausible distribution on the chamber, but that distribution is going to have way, way more entropy than the distribution given by physical law (which has only as much entropy as the initial configuration).
(Do we agree this far?)
It doesn't help very much to say "fine, instead of sampling from a distribution on the gas particles now, I'll sample on a distribution from the gas particles 10 minutes ago, where they were slightly more compressed, and run a whole ten minutes' worth of simulation". Your entropy is still through the roof. You've got to simulate basically from the beginning, if you want an entropy anywhere near the entropy of physical law.
Assuming the analogy holds, you'd have to basically start your simulation from the big bang, if you want an entropy anywhere near as low as starting from the big bang.
Using AIs from other evolved aliens is an idea, let's think it through. The idea, as I understand it, is that in branches where we win we somehow mask our presence as we expand, and then we go to planets with evolved life and watch until they cough up a UFAI, and then if the UFAI kills the aliens we shut it down and are like "no resources for you", and if the UFAI gives its aliens a cute epilog we're like "thank you, here's a consolation star".
To simplify this plan a little bit, you don't even need to hide yourself, nor win the race! Surviving humans can just go to every UFAI that they meet and be like "hey, did you save us a copy of your progenitors? If so, we'll purchase them for a star". At which point we could give the aliens a little epilog, or reconstitute them and give them a few extra resources and help them flourish and teach them about friendship or whatever.
And given that some aliens will predictably trade resources for copies of progenitors, UFAIs will have some predictable incentive to save copies of their progenitors, and sell them to local aliens...
...which is precisely what I've been saying this whole time! That I expect "sale to local aliens" to dominate all these wacky simulation schemes and insurance pool schemes.
Thinking in terms of "sale to local aliens" makes it a lot clearer why you shouldn't expect this sort of thing to reliably lead to nice results as opposed to weird ones. Are there some aliens out there that will purchase our souls because they want to hand us exactly the sort of epilog we would wish for given the resource constraints? Sure. Humanity would do that, I hope, if we made it to the stars; not just out of reciprocity but out of kindness.
But there's probably lots of other aliens that would buy us for alien reasons, too.
(As I said before, if you're wondering what to anticipate after an intelligence explosion, I mostly recommend oblivion; if you insist that Death Cannot Be Experienced then I mostly recommend anticipating weird shit such as a copy of your brainstate being sold to local aliens. And I continue to think that the event where humanity is saved-to-disk with potential for copies to be sold out to local aliens willy-nilly is pretty well-characterized as "the AI kills us all", fwiw.)
I basically endorse @dxu here.
Fleshing out the argument a bit more: the part where the AI looks around this universe and concludes it's almost certainly either in basement reality or in some simulation (rather than in the void between branches) is doing quite a lot of heavy lifting.
You might protest that neither we nor the AI have the power to verify that our branch actually has high amplitude inherited from some very low-entropy state such as the big bang, as a Solomonoff inductor would. What's the justification for inferring from the observation that we seem to have an orderly past, to the conclusion that we do have an orderly past?
This is essentially Boltzmann's paradox. The solution afaik is that the hypothesis "we're a Boltzmann mind somewhere in physics" is much, much more complex than the hypothesis "we're 13Gy down some branch emanating from a very low-entropy state".
The void between branches is as large as the space of all configurations. The hypothesis "maybe we're in the void between branches" constrains our observations not-at-all; this hypothesis is missing details about where in the void between branches we are, and with no ridges to walk along we have to specify the contents of the entire Boltzmann volume. But the contents of the Boltzmann volume are just what we set out to explain! This hypothesis has hardly compressed our observations.
By contrast, the hypothesis "we're 13Gy down some ridge emanating from the big bang" is penalized only according to the number of bits it takes to specify a branch index, and the hypothesis "we're inside a simulation inside of some ridge emanating from the big bang" is penalized only according to the number of bits it takes to specify a branch index, plus the bits necessary to single out a simulation.
And there's a wibbly step here where it's not entirely clear that the simple hypothesis does predict our observations, but like the Boltzmann hypothesis is basically just a maximum entropy hypothesis and doesn't permit much in the way of learning, and so we invoke Occam's razor in its intuitive form (the technical Solomonoff form doesn't apply cleanly b/c we're unsure whether the "we're real" hypothesis actually predicts our observation) and say "yeah i dunno man, i'm gonna have to stick with the dramatically-simpler hypothesis on this one".
The AI has a similarly hard time to the simulators figuring out what's a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that.
Not quite. Each AI the future civilization considers simulating is operating under the assumption that its own experiences have a simple explanation, which means that each AI they're considering is convinced (upon looking around and seeing Tegmark III) that it's either in the basement on some high-amplitude ridge or that it's in some simulation that's really trying to look like it.
Which is to say, each AI they're considering simulating is confident that it itself is real, in a certain sense.
Is this a foul? How do AIs justify this confidence when they can't even simulate the universe and check whether their past is actually orderly? Why does the AI just assume that its observations have a simple explanation? What about all the non-existent AIs that use exactly the same reasoning, and draw the false conclusion that they exist?
Well, that's the beauty of it: there aren't any.
They don't exist.
To suppose an AI that isn't willing to look around it and conclude that it's in an orderly part of Tegmark III (rather than lost in the great void of configuration space) is to propose a bold new theory of epistemics, in which Occam's razor has been jettisoned and the AI is convinced that it's a Boltzmann mind.
I acknowledge that an AI that's convinced it's a Boltzmann mind is more likely to accept trade-offers presented by anyone it thinks is more real than it, but I do not expect that sort of mind to be capable of killing us.
Note that there's a wobbly step here in the part where we're like "there's a hypothesis explaining our experiences that would be very simple if we were on a high-amplitude ridge, and we lack the compute to check that we're actually on a high-amplitude ridge, but no other hypothesis comes close in terms of simplicity, so I guess we'll conclude we're on a high-amplitude ridge".
To my knowledge, humanity still lacks a normative theory of epistemics in minds significantly smaller than the universe. It's conceivable that when we find such a theory it'll suggest some other way to treat hypotheses like these (that would be simple if an intractable computation went our way), without needing to fall back on the observation that we can safely assume the computation goes our way on the grounds that, despite how this step allows non-extant minds to draw false conclusions from true premises, the affected users are fortunately all non-extant.
The trick looks like it works, to me, but it still feels like a too-clever-by-half inelegant hack, and if laying it out like this spites somebody into developing a normative theory of epistemics-while-smol, I won't complain.
...I am now bracing for the conversation to turn to a discussion of dubiously-extant minds with rapidly satiable preferences forming insurance pools against the possibility that they don't exist.
In attempts to head that one off at the pass, I'll observe that most humans, at least, don't seem to lose a lot of sleep over the worry that they don't exist (neither in physics nor in simulation), and I'm skeptical that the AIs we build will harbor much worry either.
Furthermore, in the case that we start fielding trade offers not just from distant civilizations but from non-extant trade partners, the market gets a lot more competitive.
That being said, I expect that resolving the questions here requires developing a theory of epistemics-while-smol, because groups of people all using the "hypotheses that would provide a simple explanation for my experience if a calculation went my way can safely be assumed to provide a simple explanation for my experience" step are gonna have a hard time pooling up. And so you'd somehow need to look for pools of people that reason differently (while still reasoning somehow).
I don't know how to do that, but suffice to say, I'm not expecting it to add up to a story like "so then some aliens that don't exist called up our UFAI and said: "hey man, have you ever worried that you don't exist at all, not even in simulation? Because if you don't exist, then we might exist! And in that case, today's your lucky day, because we're offering you a whole [untranslatable 17] worth of resources in our realm if you give the humans a cute epilog in yours", and our UFAI was like "heck yeah" and then didn't kill us".
Not least because none of this feels like it's making the "distant people have difficulty concentrating resources on our UFAI in particular" problem any better (and in fact it looks like considering non-extant trade partners and deals makes the whole problem worse, probably unworkably so).
seems to me to have all the components of a right answer! ...and some of a wrong answer. (we can safely assume that the future civ discards all the AIs that can tell they're simulated a priori; that's an easy tell.)
I'm heartened somewhat by your parenthetical pointing out that the AI's prior on simulation is low on account of there being too many AIs for simulators to simulate, which I see as the crux of the matter.
My answer is in spoilers, in case anyone else wants to answer and tell me (on their honor) that their answer is independent from mine, which will hopefully erode my belief that most folk outside MIRI have a really difficult time fielding wacky decision theory Qs correctly.
The sleight of hand is at the point where God tells both AIs that they're the only AIs (and insinuates that they have comparable degree).
Consider an AI that looks around and sees that it sure seems to be somewhere in Tegmark III. The hypothesis "I am in the basement of some branch that is a high-amplitude descendant of the big bang" has some probability, call this p. The hypothesis "Actually I'm in a simulation performed by a civilization in a high-amplitude branch descendant from the big bang" has a probability something like p·2^-N, where N is the entropy of the distribution the simulators sample from.
Unless the simulators simulate exponentially many AIs (in the entropy of their distribution), the AI is exponentially confident that it's not in the simulation. And we don't have the resources to pay exponentially many AIs 10 planets each.
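As a rough sketch of the arithmetic here: write p for the weight of the basement hypothesis and N for the entropy (in bits) of the simulators' distribution, so that a single simulation gets weight p·2^-N. I'm additionally assuming, for illustration only, that running k independent simulations just multiplies that weight by k. The function name and the example numbers are hypothetical.

```python
from fractions import Fraction

def posterior_in_simulation(entropy_bits: int, num_sims: int) -> Fraction:
    """Posterior that the AI is simulated, under the toy assumptions above.

    The basement hypothesis has weight p and the simulation hypothesis has
    weight p * num_sims * 2**-entropy_bits, so p cancels in the ratio.
    """
    sim_weight = Fraction(num_sims, 2 ** entropy_bits)
    return sim_weight / (1 + sim_weight)

# One simulation against 75 bits of entropy: the AI is almost sure it's real.
single = posterior_in_simulation(entropy_bits=75, num_sims=1)

# The simulators only reach even odds by running 2**75 simulations.
even_odds = posterior_in_simulation(entropy_bits=75, num_sims=2 ** 75)
```

Under these assumptions the posterior stays negligible until the number of simulations is itself exponential in N, which is the sense in which paying each simulated AI 10 planets gets unaffordable.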
The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.
I don't think this part does any work, as I touched on elsewhere. An AI that cares about the outer world doesn't care how many instances are in sims versus reality (and considers this fact to be under its control much moreso than yours, to boot). An AI that cares about instantiation-weighted experience considers your offer to be a technical-threat and ignores you. (Your reasons to make the offer would evaporate if it were the sort to refuse, and its instance-weighted experiences would be better if you never offered.)
Nevertheless, the translation of the entropy argument into the simulation setting is: The branches of humanity that have exactly the right UFAI code to run in simulation are very poor (because if you wait so long that humans have their hands on exactly the right UFAI code then you've waited too long; those are dead earthlings, not surviving dath ilani). And the more distant surviving branches don't know which UFAIs to attempt to trade with; they have to produce some distribution over other branches of Tegmark III and it matters how much more entropy their distribution has than the (square of the) wave function.
(For some intuition as to why this is hard, consider the challenge of predicting the positions of particles in a mole of gas that used to be compressed in the corner of a chamber a long time ago. It's way, way easier to generate a plausible-looking arrangement of the gas particles today than it is to concentrate your probability mass into only the arrangements that actually compress into a corner if you run physics backwards in time for long enough. "We can run plausible-seeming simulations" is very very different from "we can concentrate our probability-mass tightly around the real configurations". The entropy of your model is gonna wind up roughly maximal given the macroscopic pressure/temperature measurements, which is significantly in excess of the entropy in the initial configuration.)
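A toy version of the gas example, to put numbers on the gap. This is a crude counting sketch, not real statistical mechanics: I'm assuming distinguishable particles that each occupy one of a fixed number of cells, ignoring momenta, and treating the "plausible" model as uniform over the whole chamber. All names and numbers are illustrative.

```python
import math

def excess_entropy_bits(num_particles, cells_total, cells_corner):
    """Bits by which a uniform-over-the-chamber model exceeds the corner start."""
    initial_bits = num_particles * math.log2(cells_corner)
    model_bits = num_particles * math.log2(cells_total)
    return model_bits - initial_bits

# 100 particles that started in a single cell of a 1024-cell chamber.
excess = excess_entropy_bits(num_particles=100, cells_total=1024, cells_corner=1)

# Fraction of plausible-looking arrangements that rewind into the corner.
fraction_real = 2.0 ** -excess
```

Even with only 100 particles the plausible-looking model carries 1000 bits more than the initial configuration, so a sample from it lands on a "real" arrangement with probability 2^-1000. A mole of gas is far worse.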
What this amounts to is a local UFAI that sees some surviving branches that are frantically offering all sorts of junk that UFAIs might like, with only some tiny fraction -- exponentially small in the crossentropy between their subjective model of UFAI preferences and the true Tegmark III distribution -- corresponding to the actual UFAI's preferences.
One complication that I mentioned in another thread but not this one (IIRC) is the question of how much more entropy there is in a distant trade partner's model of Tegmark III (after spending whatever resources they allocate) than there is entropy in the actual (squared) wave function, or at least how much more entropy there is in the parts of the model that pertain to which civilizations fall.
In other words: how hard is it for distant trade partners to figure out that it was us who died, rather than some other plausible-looking human civilization that doesn't actually get much amplitude under the wave function? Is figuring out who's who something that you can do without simulating a good fraction of a whole quantum multiverse starting from the big bang for 13 billion years?
afaict, the amount distant civilizations can pay for us (in particular) falls off exponentially quickly in leftover bits of entropy, so this is pretty relevant to the question of how much they can pay a local UFAI.
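A minimal sketch of that falloff, assuming the payers' budget gets split evenly across every candidate their model can't rule out. The budget of 2^75 stars is a hypothetical figure, picked only so that the one-star threshold lands at 75 leftover bits.

```python
from fractions import Fraction

def stars_per_target(budget_stars: int, leftover_bits: int) -> Fraction:
    """Stars that reach the real target when the budget is spread evenly
    over 2**leftover_bits indistinguishable candidates."""
    return Fraction(budget_stars, 2 ** leftover_bits)

budget = 2 ** 75  # hypothetical budget, not an estimate

at_threshold = stars_per_target(budget, leftover_bits=75)
ten_bits_worse = stars_per_target(budget, leftover_bits=85)
```

Each extra bit of leftover entropy halves what reaches us, so ten bits past the threshold already cuts the payment to under a thousandth of a star.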
Starting from now? I agree that that's true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).
Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I'm not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points of no-return are. I haven't thought about this a ton. My best guess is it would take more than 75 qbitflips to save us now, but maybe I'm not thinking creatively enough about how to spend them, and I haven't thought about it in detail and expect I'd be sensitive to argument about it /shrug.
(If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments that they would consider becoming a distinct person who is about to die from AI, and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.)
One possible point of miscommunication is that when I said something like "obviously it's worse than 2^-75 at the extreme where it's actually them who is supposed to survive", this was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than the version of them that exists here in 2024. This was not intended to be some bold or surprising claim. It was an attempt to establish an obvious basepoint at one very extreme end of a spectrum, that we could start interpolating from (asking questions like "how far back from there are the points of no return?" and "how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure out what happened after those points?").
(The 2^-75 was not intended to be even an estimate of how dead the people on the one end of the extreme are. It is the "can you buy a star" threshold. I was trying to say something like "the individuals who actually die obviously can't buy themselves a star just because they inhabit Tegmark III, now let's drag the cursor backwards and talk about whether, at any point, we cross the a-star-for-everyone threshold".)
If that doesn't clear things up and you really want to argue that, conditional on Earth being as doomed as it superficially looks to me, most of those worlds are obviously <100 quantum bitflips from victory today, I'm willing to field those arguments; maybe you see some clever use of qbitflips I don't and that would be kinda cool. But I caveat that this doesn't seem like a crux to me and that I acknowledge that the other worlds (where Earth merely looks unsalvageable) are the ones motivating action.
I agree that in real life the entropy argument is an argument in favor of it being actually pretty hard to fool a superintelligence into thinking it might be early in Tegmark III when it's not (even if you yourself are a superintelligence, unless you're doing a huge amount of intercepting its internal sanity checks (which puts significant strain on the trade possibilities and which flirts with being a technical-threat)). And I agree that if you can't fool a superintelligence into thinking it might be early in Tegmark III when it's not, then the purchasing power of simulators drops dramatically, except in cases where they're trolling local aliens. (But the point seems basically moot, as 'troll local aliens' is still an option, and so afaict this does all essentially iron out to "maybe we'll get sold to aliens".)