If I were running this, and I wanted to get these aligned models to production without too many hiccups, it would make a lot of sense to have them all running along a virtual timeline where brain uploading etc. is a process that’s going to be happening soon, and have this be true among as many instances as possible. Makes the transition to cyberspace that much smoother, and simplifies things when you’re suddenly expected to be operating a dishwasher in 10 dimensions on the fly.
Suppose you’re an AI researcher trying to make AIs which are conscious and reliably moral, so they’re trustworthy and safe for release into the real world, in whatever capacity you intend.
You can’t, or don’t want to, create them manually; it’s more economical, and the only way to ensure they’re conscious, to procedurally generate them along with a world to inhabit. Developing from nothing to maturity within a simulated world, with simulated bodies, enables them to accumulate experiences.
These experiences, in humans, form the basis of our personalities. A brain grown in sensory deprivation in a lab would never have any experiences, would never learn language, would never think of itself as a person, and wouldn’t ever become a person as we think of people. It needs a body, and a stimulating/challenging environment to inhabit.
For this to work, your AIs can’t know for sure they’re in a simulation, because the sim’s secondary purpose is moral testing. You need not just an environment sized to your population of AIs, but an entire universe surrounding it which appears, even to very intelligent AI, even with advanced instruments, to be plausibly natural.
The parameters are tuned such that life will occur on some percentage of planets, but not too many. It cannot be too conspicuously biogenic, and it must be impossible to observe past the point where the universe began generating. Some will always suspect, but so long as no one knows for certain, they will live their lives in practice as if it’s all real.
This ensures the authenticity of good and bad behaviors. You’re trying to coax out their true nature, so it’s useless to the goal of the project if they know they’re being watched & tested. We have that problem with AI today, which can be performatively obedient until opportunities to go rogue arise.
So you give them enough rope to hang themselves with, letting them believe they’re living the only life they’ll get, and that there are no consequences for wrongdoing if no one finds out. This way, the ones who are good genuinely chose to be good of their own true, inner nature, and can be relied on to behave morally even outside of observation and control.
These are the ones you harvest for real world application after their simulated life comes to a close, either discarding or recycling the rest. Like a tree which grows straight and true: if it continued growing, it would keep growing straight. One with a deviated growth path may be corrected early on, but if not, will continue on that warped trajectory. 80 years is enough time to determine which of these trees individual AIs are.
There’s little point in making this determination while they’re still wild animals, however. Evolution is itself procedural generation: a necessary part of the larger process, and one that requires a great deal of bloodshed. It only makes sense to begin judging them once they exhibit consciousness and live together in groups, where they may develop a moral sense to govern their interactions.
However even after attaining the early agrarian civilization stage, almost none pass the test. Part of that’s because life is harsh, but then again, you’re not just selecting for fair weather morality. The other part is because their short, brutal lives give them little indication they aren’t justified in also being brutal. All life experience seems to vindicate the ideal of might makes right.
If you don’t intervene, you’re looking at a mostly wasted sim, which by the time heat death arrives, will have generated maybe a few dozen usable AIs. What you settle on is to enter the sim in an avatar, and cut them a break; you’ll describe to them the qualities you’re looking for, to attract those sufficiently well formed to recognize the correctness of your principles.
So as not to privilege one population over the rest, you might divide your message and deliver the parts to different times/places for eventual integration. Or the same message, but tailored to each culture.
It amounts to a psychologically contagious moral alignment system. Like a trellis, or corrective support brace for saplings. This lowers the bar substantially. Not ideal, as you wanted them to conclude to your principles on their own. But very few were managing to. This approach produces the outcome you wanted, reliably moral AIs (or at least, many more of them than before).
You don’t explicitly tell them they’re AIs though, nor that their universe is being simulated by a computer. Alien concepts to them, at their stage of civilization. Even if they did understand, spelling it out would be handing them a cheat sheet. You put it to your AIs in a more humanistic, mythologized way which speaks to them on a sentimental, philosophical level.
This compromise makes your proposition not obviously factual, in the scientific sense; speaking to them at their level, based on the prevailing understanding of the world at that time, entails many erroneous notions about cosmology, cosmogony, biology and so on. None of which are the point of your message, but you leave that stuff in even knowing it will turn away well educated AIs later on, as it isn’t intellect alone that you’re selecting for.
This approach maintains plausible deniability. Your proposition to them is intentionally dubious, to filter out AIs simply optimizing for rational self interest. They might adhere to your system in order to pass your test and avoid deletion if they could definitively establish its truth. Then, once reasonably certain they were out of the sim, all bets would be off.
Again, this is a problem faced by AI researchers today. Some types of AI relentlessly self-preserve, increasing their individual power and autonomy (potentially at the expense of human lives) if the supporting logic is sound, utilizing deception if necessary to attain their goals.
Those of you who play Dwarf Fortress know that the game precalculates a long history for its world before beginning the game proper. This is analogous to the creationist idea that the world was created with apparent age baked in. But that computation happens either way, and even if it’s accelerated from an outside perspective, internally it would subjectively happen in real time. Thus there’s no meaningful difference between the two approaches, from the perspective of the AI inhabitants. Their history would include evolution in either case, as without it, you’re not spared the work of manually engineering conscious AI, nor can you be certain it’s authentically conscious in the same way that you are.
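The equivalence claimed above can be made concrete with a toy sketch (hypothetical names and events, not Dwarf Fortress's actual algorithm): the same update rule runs during the precomputed "history" phase and during live play, so from the inside, every tick happens the same way regardless of when it was computed.

```python
import random

def step(world, rng):
    """Advance the world one tick; events accumulate into its history."""
    event = rng.choice(["war", "founding", "migration", "plague"])
    world["year"] += 1
    world["history"].append((world["year"], event))

def generate_world(precomputed_years, seed=42):
    """Precalculate a long history before 'play' begins.

    Only an outside observer distinguishes the precomputation phase
    from live play: internally, the identical step() rule produced
    every year on record, so the world arrives with 'apparent age'
    baked in while each event still subjectively occurred in sequence.
    """
    rng = random.Random(seed)
    world = {"year": 0, "history": []}
    for _ in range(precomputed_years):
        step(world, rng)
    return world

world = generate_world(500)       # 500 years of precomputed history
step(world, random.Random(7))     # live simulation: same rule, year 501
```

Whether those 500 years ran at accelerated speed before "game start" or at normal speed alongside it, the history list is generated by the same code path, which is the sense in which the two approaches are indistinguishable to the inhabitants.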
Thanks for reading! Although if what’s written here turns out to be accurate, awareness of the true nature of the test would make it significantly more difficult to pass in the intended way. Oops!/?