Wanted: backup plans for "seed AI turns out to be easy"

Wei Dai

Earlier, I argued that instead of working on FAI, a better strategy is to pursue an upload- or IA-based Singularity. In response, some argue that we still need to work on FAI/CEV, because what if it turns out that seed AI is much easier than brain emulation or intelligence amplification, and we can't stop or sufficiently delay others from building one? If we had a solution to CEV, we could rush to build a seed AI ourselves, or convince others to make use of the ideas.

But CEV seems a terrible backup plan for this contingency, since it involves lots of hard philosophical and implementation problems and therefore is likely to arrive too late if seed AI turns out to be easy. (Searching for whether Eliezer or someone else addressed the issue of implementation problems before, I found just a couple of sentences, in the original CEV document: "The task of construing a satisfactory initial dynamic is not so impossible as it seems. The satisfactory initial dynamic can be coded and tinkered with over years, and may improve itself in obvious and straightforward ways before taking on the task of rewriting itself entirely." Which does not make any sense to me—why can't every other AGI builder make the same argument, that their code can be "tinkered with" over many years, and therefore is safe? Why aren't we risking the "initial dynamic" FOOMing while it's being tinkered with? Actually, it seems to me that an AI cannot begin to extrapolate anyone's volition until it's already more powerful than a human, so I have no idea how the tinkering is supposed to work at all.)

So, granting that "seed AI is much easier than brain emulation or intelligence amplification" is a very real possibility, I think we need better backup plans. This post is a bit similar to The Friendly AI Game, in that I'm asking for a utility function for a seed AI, but the goal here is not necessarily to build an FAI directly, but to somehow make an eventual positive Singularity more likely, while keeping the utility function simple enough that there's a good chance it can be specified and implemented correctly within a relatively short amount of time. Also, the top entry in that post is an AI that can answer formally specified questions with minimal side effects, apparently with the idea that we can use such an AI to advance many kinds of science and technology. But I agree with Nesov—such an AI doesn't help, if the goal is an eventual positive Singularity:

We can do lots of useful things, sure (this is not a point where we disagree), but they don't add up towards "saving the world". These are just short-term benefits. Technological progress makes it easier to screw stuff up irrecoverably, advanced tech is the enemy. One shouldn't generally advance the tech if distant end-of-the-world is considered important as compared to immediate benefits [...]

To give an idea of the kind of "backup plan" I have in mind, one idea I've been playing with is to have the seed AI make multiple simulations of the entire Earth (i.e., with different "random seeds"), for several years or decades into the future, and have a team of humans pick the best outcome to be released into the real world. (I say "best outcome" but many of the outcomes will probably be incomprehensible or dangerous to directly observe, so they should mostly judge the processes that lead to the outcomes instead of the outcomes themselves.) This is still quite complex if you think about how to turn this "wish" into a utility function, and lots of things could still go wrong, but to me it seems at least the kind of problem that a team of human researchers/programmers can potentially solve within the relevant time frame.

Do others have any ideas in this vein?

I don’t think we’re discussing quite the same thing.

I was talking about an AI that attempts to simulate the entire Earth, including itself, in faster than real time (see the quote). Note that this means the AI has to simulate the behavior of the rest of the world in response to the behavior of the simulated AI, which is somewhat messy even if you ignore the fact that, in a faithful simulation, the simulated AI would in turn simulate the whole world including itself, and so on.

When I wrote the original comment I was in fact partly confusing emulating with simulating, as far as I can tell from what I wrote (I can’t quite recall, and I wouldn’t trust the memory if I did). Now, of course an AI can simulate the entire world including itself in faster than real time. It doesn’t need to be an AI: humans do it all the time.

I’m pretty sure that, in the general case, and barring some exotic physics, no system can emulate itself (nor something containing itself) in faster than real time.

Also, I’m pretty sure that if we carefully discussed what we mean by “emulation” and “simulation” we’d generally agree.

My confusion stemmed from the fact that generally on LessWrong, in the context of really powerful AIs, AI simulations can be trusted. (Either it’s smart enough to only pick simplifications that really don’t affect the result, or it’s so powerful that it can in fact emulate the process, meaning it doesn’t simplify it at all and it can still do it faster than real time. Or it’s Omega and it’s just right by definition. Or maybe it can find fixed-points in functions as complex as the future history of the Earth.) I wasn’t careful enough about language.

But in the context of a seed AI, i.e. something much smarter/faster than us but not “godly” yet, and one we don’t trust to pick the best outcome of its possible actions upon the world, I can’t think of any reason we’d trust it to simulate such outcomes well enough for humans to pick among them, as the post I was replying to suggested.

(I mean, it could work for very limited purposes. It might be reasonable to try to change the weather based on simulations not much better than what we can do now, for periods of one or a few weeks, but that’s a context where a small mistake would not destroy life on Earth. But look at climate change research and try to extrapolate to people deciding on matters of theogenesis based on simulations from seed AI...)
