Wanted: backup plans for "seed AI turns out to be easy"

Wei Dai

Earlier, I argued that instead of working on FAI, a better strategy is to pursue an upload- or IA-based Singularity. In response, some argue that we still need to work on FAI/CEV, because what if it turns out that seed AI is much easier than brain emulation or intelligence amplification, and we can't stop or sufficiently delay others from building one? If we had a solution to CEV, we could rush to build a seed AI ourselves, or convince others to make use of the ideas.

But CEV seems a terrible backup plan for this contingency, since it involves lots of hard philosophical and implementation problems and is therefore likely to arrive too late if seed AI turns out to be easy. (Searching for whether Eliezer or someone else had addressed the issue of implementation problems before, I found just a couple of sentences, in the original CEV document: "The task of construing a satisfactory initial dynamic is not so impossible as it seems. The satisfactory initial dynamic can be coded and tinkered with over years, and may improve itself in obvious and straightforward ways before taking on the task of rewriting itself entirely." That does not make any sense to me: why can't every other AGI builder make the same argument, that their code can be "tinkered with" over many years, and therefore is safe? Why aren't we risking the "initial dynamic" FOOMing while it's being tinkered with? Actually, it seems to me that an AI cannot begin to extrapolate anyone's volition until it's already more powerful than a human, so I have no idea how the tinkering is supposed to work at all.)

So, granting that "seed AI is much easier than brain emulation or intelligence amplification" is a very real possibility, I think we need better backup plans. This post is a bit similar to The Friendly AI Game, in that I'm asking for a utility function for a seed AI. The goal here, though, is not necessarily to build an FAI directly, but to somehow make an eventual positive Singularity more likely, while keeping the utility function simple enough that there's a good chance it can be specified and implemented correctly within a relatively short amount of time. Also, the top entry in that post is an AI that can answer formally specified questions with minimal side effects, apparently with the idea that we can use such an AI to advance many kinds of science and technology. But I agree with Nesov that such an AI doesn't help if the goal is an eventual positive Singularity:

We can do lots of useful things, sure (this is not a point where we disagree), but they don't add up towards "saving the world". These are just short-term benefits. Technological progress makes it easier to screw stuff up irrecoverably, advanced tech is the enemy. One shouldn't generally advance the tech if distant end-of-the-world is considered important as compared to immediate benefits [...]

To give an idea of the kind of "backup plan" I have in mind, one idea I've been playing with is to have the seed AI make multiple simulations of the entire Earth (i.e., with different "random seeds"), for several years or decades into the future, and have a team of humans pick the best outcome to be released into the real world. (I say "best outcome" but many of the outcomes will probably be incomprehensible or dangerous to directly observe, so they should mostly judge the processes that lead to the outcomes instead of the outcomes themselves.) This is still quite complex if you think about how to turn this "wish" into a utility function, and lots of things could still go wrong, but to me it seems at least the kind of problem that a team of human researchers/programmers can potentially solve within the relevant time frame.
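As a toy illustration of just the selection structure (not the utility function, which is the hard part), here is a minimal Python sketch. Everything in it is made up for illustration: `simulate`, `process_score`, and `pick_rollout` are hypothetical names, a seeded random walk stands in for a simulated Earth, and a one-line scoring rule stands in for the human team's judgment of the process.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    seed: int
    process_log: List[float]  # stand-in for the record of how the outcome came about


def simulate(seed: int, steps: int = 10) -> Rollout:
    """Hypothetical stand-in for one simulated future: a seeded random walk."""
    rng = random.Random(seed)
    return Rollout(seed, [rng.gauss(0.0, 1.0) for _ in range(steps)])


def process_score(rollout: Rollout) -> float:
    """Placeholder for human review. Assumption: the reviewers look at the
    process (the log), not the end state, and prefer the least turbulent one."""
    return -max(abs(x) for x in rollout.process_log)


def pick_rollout(seeds: List[int]) -> Rollout:
    """Run one simulation per seed and return the one the reviewers rate highest."""
    rollouts = [simulate(s) for s in seeds]
    return max(rollouts, key=process_score)


chosen = pick_rollout(list(range(5)))
print("seed selected for release:", chosen.seed)
```

The point of the sketch is only that the selection step depends on `process_log` rather than on a final state, and that fixing the seed makes each candidate reproducible for review.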

Do others have any ideas in this vein?

If you're smart enough to understand the tragedy of the commons, why wouldn't they be?

That the AIs would not understand the tragedy of the commons is not remotely implied. In fact, the thoughts going through the minds of the AIs could be something along the lines of:

"They can't be serious? Why on earth would they create millions and millions of seed AIs and put us in this situation? Are they insane? Sure, it is theoretically possible to establish a competition between millions of superintelligences with conflicting goals that doesn't end in disaster but doing so is orders of magnitude more difficult and dangerous than creating one of us and having it end positively."

There are just so many more things you need to prove (or just hope) are safe. You inherit all the problems of getting one AI to behave itself according to an extrapolated volition then add all sorts of other things that you need to prove that depend on the actual preferences of the individuals that the AIs are representing. What's that? OsamaBot concluded that the best possible negotiated agreement was going to be worse than just blowing up the planet and killing them all?

Then you need to have a solid understanding of all the physics and engineering that could possibly influence the payoff matrices the bots have. Just how difficult is it to launch a spore cloud despite the objection of others? How difficult is it to prevent all spore cloud attempts from other bots if they try? Could a bot or alliance of bots launch a spore then blow up the planet in order to prevent countermeasures? What are the relevant physics and engineering considerations that the evidently reckless AI creators just didn't even consider?

How does having millions of approximate equals impact the recursive self improvement cycle? That requires a whole heap more understanding and intervention.

Basically if you want to create this kind of system you need to already be a superintelligence with a full model of all the individuals and an obscenely advanced model of decision theory and physics. Otherwise you can more or less count on it ending in disaster.

The clearly superior alternative is to create a single superintelligence that emulates the values of all of the individuals, allocates an equal slice of the universe to each one, and then grants them equal processing power with which they can trade with each other. That may not be the best system but it is a firm lower bound against which to measure. It gets all the benefit of "create millions of AIs and let 'em at it" without being outright suicidal.

Sure, it is theoretically possible to establish a competition between millions of superintelligences with conflicting goals that doesn't end in disaster

What do you mean by "competition"? The millions are each trying to maximize their own goals, but usually don't care to suppress others' goals. Cooperation in situations of limited resources rather than expending resources fighting is, I think, universal - in general game theory would apply to smarter and stronger beings as it does to us, with differences being of the type "AIs can merge as...
