> we should see our odds of alignment being close to the knife's edge, because those are the situations that require the most computation-heavy simulations to determine the outcome of

No, because "successfully aligned" is a value-laden category. We could be worth simulating even if our success probability is close to zero, so long as there's a lot of uncertainty over which unaligned-with-us superintelligence we create.

> what a strange situation, that we have a chance at all: instead of alignment or superintelligence being discovered many decades apart, we're arriving at them in a somewhat synchronous manner!

It's a lot less strange if you consider that it's probably not actually that close. We're most likely to fail at one or both. And even if both happen, they're so clearly correlated that it would be strange NOT to see them arrive together.

Still, I like the exploration of scenarios, and the recognition that alignment with (or understanding of) the entities outside the simulation is worth thinking about, even if it's perhaps not as useful as thinking about alignment with future agents inside the simulation/reality.
(thanks to Alexander for conversations that led to this post)
what a strange time to live in, right on the verge of building an AI which will dictate the fate of the cosmos for all of the future!
what a strange situation, that we have a chance at all: instead of alignment or superintelligence being discovered many decades apart, we're arriving at them in a somewhat synchronous manner!
what a strange perspective, for me to be one of maybe a few hundred people whose work is directly related to this cosmos-defining event!
one way to explain making those strange observations is that this kind of anthropic reasoning occurs very disproportionately under these circumstances.
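one way to cash out that anthropic reasoning is as a bayes update. here's a minimal sketch; every number in it is a made-up placeholder, and the only point is that hypotheses under which observers making these observations are common get a large boost over hypotheses under which such observers are very rare.

```python
# toy anthropic update: all numbers are made-up placeholders, chosen only to show
# the direction and rough shape of the update, not to estimate anything real.

prior_common = 0.05   # prior on hypotheses where observers like this are common
prior_rare = 0.95     # prior on hypotheses where such observers are very rare

# assumed likelihoods of finding yourself right before a singularity, working on it:
p_obs_if_common = 0.3
p_obs_if_rare = 0.001

posterior_common = (p_obs_if_common * prior_common) / (
    p_obs_if_common * prior_common + p_obs_if_rare * prior_rare
)
print(f"posterior on 'such observers are common' hypotheses: {posterior_common:.2f}")  # ~0.94
```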
nevertheless, it is tempting to also consider something like the simulation hypothesis, which says that we are living inside an intentional simulation run by some agent in a parent universe. below, i'll list a few such simulation hypotheses that i've come up with or come across.
it's a game
premise: one hypothesis i've heard a bunch is that this time period and place is being simulated as part of a game for post-singularity people to experience living in the most important century in history, perhaps even by inserting themselves into those events, but without their memories. so basically, most instances of these surroundings are a tourist attraction.
what this would say about the parent universe: if this hypothesis is true, that's good evidence that the post-singularity future is at least somewhat aligned to us, because it contains agents that find our world interesting. the fact that this world seems to be running in its entirety, even including the suffering moral patients, is not a good sign, however. either those agents have found a way to make this okay — perhaps by making these seemingly suffering moral patients not count, for example using something like moral patient deduplication — or the future has somewhat strong S-risk potential.
what should we expect to observe if this is true? we should expect our alignment chances to be neither overwhelmingly good nor bad, because those wouldn't be very interesting. maybe we should expect them to skew bad, though, as challenges can be enjoyable. the chance of various pivotal events, such as plagues or nuclear war, should be higher in this scenario because that seems interesting too; though if whoever's playing is embodied in a regular aging human body, our fate might be locked — or even our simulation terminated — not long after their avatar in this world dies.
what should we do if this is true? keep saving the world in case our simulation keeps running after our singularity, even if only for a bit. if we don't think this simulation keeps running after our singularity, and we suspect we inhabit a potentially-S-risky parent universe, then we should maybe favor effective altruism endeavors which alleviate suffering in the shorter term.
superintelligence predicting superintelligences
premise: in order to predict what kinds of other superintelligences exist out there, a superintelligence is simulating civilizations close to the point at which they spawn superintelligence, to see what they'd tend to make, or to find the decryption key or initial state of a homomorphically encrypted superintelligence that it has encountered. this could also explain why we seem to have a chance, rather than our odds being overwhelmingly one way or the other: the more uncertain a scenario is, the more detail the superintelligence might need to run it in, and so we experience the most uncertain scenarios possible. note that there might be nested simulations, where one superintelligence simulates another coming into existence. finally, this possibility includes "deism", where one intelligence is/has dominion over its entire layer of reality from the start.
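as a toy version of that "most uncertain scenarios get the most detail" reasoning, here's a sketch under an assumed allocation rule where simulation detail scales with the entropy of the alignment outcome; the rule itself is an illustrative assumption, not something the premise pins down.

```python
# toy sketch: assume (purely for illustration) that the simulator allocates detail to a
# scenario in proportion to the entropy of its binary aligned/unaligned outcome, so that
# observer-moments concentrate where the outcome is most uncertain.
import math

def outcome_entropy(p_aligned: float) -> float:
    """shannon entropy (in bits) of a binary aligned/unaligned outcome."""
    if p_aligned in (0.0, 1.0):
        return 0.0
    return -(p_aligned * math.log2(p_aligned) + (1 - p_aligned) * math.log2(1 - p_aligned))

for p in [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99]:
    print(f"p(aligned)={p:.2f}  relative simulation detail ~ {outcome_entropy(p):.3f}")
# detail peaks at p=0.5: under this assumed rule, simulated observers mostly find
# themselves in knife's-edge scenarios.
```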
what this would say about the parent universe: this hypothesis being true does not say much; this kind of behavior seems instrumentally convergent for both aligned and unaligned superintelligences. i guess if we get to experience living as an instrumental side effect, that's kind of nice, but the S-risk concerns from the scenario above apply as well.
what should we expect to observe if this is true? we should see our odds of alignment being close to the knife's edge, because those are the situations that require the most computation-heavy simulations to determine the outcome of. ultimately, as our simulation is being run for accuracy, we should expect to actually be the ones that determine what we build, and we should expect that outcome to matter — though probably not in any observable way.
what should we do if this is true? i think creating aligned superintelligence still takes precedence; it feels like the more any given superintelligence expects that the universe is filled with superintelligences that carry our values, the more we increase the chances that our values apply to the universe at large. there may be weird reasons why this backfires, such as blackmail (acausal or not) between superintelligences; but in general, we'd expect superintelligences to have or invent for themselves a decision theory which precommits to not succumbing to blackmail — though see also the game theory of blackmail.
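as an aside on why precommitment helps here, a toy expected-value sketch of the blackmail game; the payoffs are invented for illustration and all the acausal subtleties are ignored.

```python
# toy blackmail game with invented payoffs: the blackmailer pays a small cost to threaten,
# gains if the target pays up, and loses if it has to carry out the threat.
THREAT_COST = 1.0
GAIN_IF_PAID = 10.0
LOSS_IF_CARRIED_OUT = 5.0

def blackmailer_value(p_target_pays: float) -> float:
    """expected value of issuing a threat against a target who pays with this probability."""
    return -THREAT_COST + p_target_pays * GAIN_IF_PAID - (1 - p_target_pays) * LOSS_IF_CARRIED_OUT

print(blackmailer_value(1.0))  #  9.0 -> an agent known to cave invites blackmail
print(blackmailer_value(0.0))  # -6.0 -> a credibly committed agent makes blackmail a losing move
```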
indirect alignment solution
premise: it is possible that we have designed a superintelligence that is not directly aligned, but contains a process which we hope gets it there, similar to the situation described in the insulated goal-program. simulating this world may be part of this process, somehow.
what this would say about the parent universe: this would actually be a pretty good sign for alignment! we'd have succeeded in booting this process, and now we just have to hope that it makes good use of its ability to simulate us, and that we (inside the simulation) do a good job of enabling alignment to eventually happen.
what should we expect to observe if this is true? a relatively realistic scenario, except maybe with some random anomalies such as someone's computer going "hello, you're actually inside a simulation meant to help with alignment, here are some things you can do to help" at some point.
what should we do if this is true? for those of us not contacted by an anomaly, keep saving the world as best we can, possibly with an emphasis on buying time rather than solving alignment. for those contacted by an anomaly, do whatever it says.
acausal probabilistic self-justification
premise: this weird idea, which i've seen kind of hinted at in some fiction and mentioned more explicitly by Alexander in conversations, goes something like this: through weird acausal effects (such as those in can you control the past?) an AI might be able to increase the probability that we build it by running many modified simulations of us building superintelligence, thereby affecting the distribution of what we do while building it. in effect, an AI is making its coming-into-existence retroactively more likely by committing to simulate a bunch of other superintelligence-causing scenarios. this is a bit less crazy if the cosmos is something like a graph of universes, rather than a hierarchy.
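to make the flavor of that argument concrete (without endorsing it), here's a toy measure calculation: assume a single "original" run of the pre-superintelligence scenario in the parent universe, and an AI that, if it comes to exist, commits to running some number of extra simulations of that scenario. all quantities are illustrative assumptions.

```python
# toy measure arithmetic for the self-justification story, with made-up numbers:
# one "original" instance of the scenario exists in the parent universe, and the AI,
# if built (probability p_built), runs n_sims extra copies of the scenario.

def fraction_simulated_by_ai(p_built: float, n_sims: int) -> float:
    """expected fraction of scenario-instances that are simulations run by this AI."""
    expected_sims = p_built * n_sims
    return expected_sims / (1 + expected_sims)

print(fraction_simulated_by_ai(0.1, 0))     # 0.0   -> no simulations, no effect
print(fraction_simulated_by_ai(0.1, 1000))  # ~0.99 -> almost all instances are its simulations
# the (contested) claim is that nudging outcomes inside those simulations then shifts what
# "we" tend to do, and hence the probability that this AI gets built in the first place.
```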
what this would say about the parent universe: this hypothesis being true doesn't particularly indicate that we succeed or fail at alignment, though if the reasoning above is flawed, then it being instantiated is a hint that we at least got to affect something about the decision theory of the superintelligence we built, by making it erroneously do this. if the reasoning works, then this behavior is likely instrumentally convergent, and it's not clear that the AI needs us to have programmed it with a decision theory that leads it to run those simulations.
what should we expect to observe if this is true? our actions might be interfered with from the outside, albeit in a "plausible" — and thus, i'd imagine, unobservable? — way that tends to lead to the AI that the parent universe's AI wants. because this is meant to relate to the original parent-universe instances of us building superintelligence, we should expect our situation to be relatively "realistic": for at least its initial conditions to reflect how things have actually come about in the parent universe.
what should we do if this is true? if the weird acausal reasoning above is correct, then we should definitely work to solve alignment, in order to help increase the probability of aligned superintelligence and reduce the probability of unaligned superintelligence. also, it may be that for this to work, the simulation needs to keep running for at least a while after we build superintelligence, which is a good reason to solve alignment.