That's my point. If they do care about that, then the AI will do it. If it doesn't, then it's not working right.
So care about other people how? And to what extent? That's the point of things like CEV.
It's only really a contradiction to us. Either the AI has a goal to make sure that there is always a democracy, or it has a goal to simply build a democracy, in which case it can abolish itself if it decides to do so.
Insufficient imagination. What if, for example, we tell the AI to try the first one, and it decides that the solution is to kill the people who don't support a democracy? That's the point: even when you've got something resembling a rough goal, you are assuming your AI will accomplish it the way a human would.
To get some idea of how easily something can go wrong, it might help to, say, read about the stamp collecting device for starters. There's a lot that can go wrong with an AI. Even dumb optimizers often arrive at answers that are highly unexpected; smart optimizers have the same problems, but more so.
Bad AIs can, sure. If it's bad, though, what does it matter whose orders it's trying to follow? It will ultimately try to turn them into paperclips as well.
What matters is that an unfriendly AI will make things bad for everyone. If someone screws up just once and makes a very smart paperclipper then that's an existential threat to humanity.
You're right. Sorry. There are a lot of variables to consider; it is one likely scenario among many. Currently, the internet isn't interfaced with the physical world enough that you could control everything from it, and I can't see any possible way an entity could take over. That doesn't mean it can't happen, but it's also wrong to assume it will.
Well, no one is assuming that it will. But some people assign the scenario a high probability, and even a very tiny probability is enough when the scenario is this bad. Note, incidentally, that there's a lot a very smart entity could do with basic internet access alone. For example, consider what happens if the AI finds a fast way to factor numbers: suddenly, many secure communication channels over the internet are vulnerable. And that's aside from the more plausible but less dramatic problem of an AI finding flaws in programs that we haven't yet noticed. Even if our AI just decided to take over most of the world's computers to increase its processing power, that's a pretty unpleasant scenario for the rest of us. And that's on the lower end of problems. Consider how often a bad hacking incident occurs because some system that should never have been online is accessible online. Now think about how many automated or nearly fully automated plants there are (for cars, for chemicals, for 3D printing). And that situation will only get worse over the next few years.
Worse, a smart AI can likely get people to release it from its box and allow it a lot more free rein; see the AI-box experiment. Even if the AI has trouble with that, an AI with internet access (which you seem to think wouldn't be that harmful) might have no trouble finding someone sympathetic to it if it portrayed itself sympathetically. These are only some of the most obvious failure modes. It may well be that some of the sneakiest things such an AI could do won't even occur to us, because they are so far beyond anything humans would think of. For this sort of thing it helps not only to have a minimally restricted imagination, but also to realize that even such an imagination is likely too small to encompass all the possible ways things can go wrong.
If I understand Houshalter correctly, then his idea can be presented using the following story:
Suppose you worked out the theory of building self-improving AGIs with stable goal systems. The only problem left now is to devise an actual goal system that will represent what is best for humanity. So you spend the next several years engaged in deep mo...
It’s the year 2045, and Dr. Evil and the Singularity Institute have been in a long and grueling race to be the first to achieve machine intelligence, thereby controlling the course of the Singularity and the fate of the universe. Unfortunately for Dr. Evil, SIAI is ahead in the game. Its Friendly AI is undergoing final testing, and Coherent Extrapolated Volition is scheduled to begin in a week. Dr. Evil learns of this news, but there’s not much he can do, or so it seems. He has succeeded in developing brain scanning and emulation technology, but the emulation speed is still way too slow to be competitive.
There is no way to catch up with SIAI's superior technology in time, but Dr. Evil suddenly realizes that maybe he doesn’t have to. CEV is supposed to give equal weighting to all of humanity, and surely uploads count as human. If he had enough storage space, he could simply upload himself, and then make a trillion copies of the upload. The rest of humanity would end up with less than 1% weight in CEV. Not perfect, but he could live with that. Unfortunately he only has enough storage for a few hundred uploads. What to do…
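The weighting arithmetic here is easy to check. A quick sketch (the biological population figure is my assumption; the story only specifies "a trillion copies"):

```python
# Head-count-weighted CEV under Dr. Evil's plan. The 8 billion
# figure for biological humans is an assumed round number, not
# anything stated in the story.
uploads = 10**12           # one trillion copies of Dr. Evil
biological = 8 * 10**9     # approximate biological human population

humanity_weight = biological / (uploads + biological)
print(f"{humanity_weight:.2%}")  # well under 1%
```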
Aha, compression! A trillion identical copies of an object would compress down to only a little larger than one copy. But would CEV count compressed identical copies as separate individuals? Maybe, maybe not. To be sure, Dr. Evil gives each copy a unique experience before adding it to the giant compressed archive. Since they still share almost all of the same information, a trillion copies, after compression, just manages to fit inside the available space.
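The compression trick relies on an ordinary property of deduplicating compressors: near-identical copies cost almost nothing beyond the first. A minimal sketch with Python's `zlib` (the 16 KB size and the per-copy tag are illustrative assumptions, not details from the story):

```python
import os
import zlib

# A stand-in "upload": 16 KB of random (incompressible) data,
# deliberately smaller than zlib's 32 KB match window so each
# copy can be matched against the previous one.
base = os.urandom(16 * 1024)

# A hundred near-identical copies, each with a short unique tag
# appended (standing in for the copy's "unique experience").
copies = b"".join(base + str(i).encode() for i in range(100))

one = len(zlib.compress(base))
many = len(zlib.compress(copies))

# The hundred copies compress to a small fraction of their raw
# size, nowhere near a hundred times one compressed copy.
print(one, many, len(copies))
```

A real deduplicating store would scale far beyond zlib's 32 KB window, but the principle is the same: only the unique experiences, not the shared bulk, add to the archive's size.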
Now Dr. Evil sits back and relaxes. Come next week, the Singularity Institute and rest of humanity are in for a rather rude surprise!