So, I agree with this statement, but it still floors me when I think about it.
I sometimes suspect that the phrase "recursively self-improving intelligence" is self-defeating here, in terms of communicating with such people, as it raises all kinds of distracting and ultimately irrelevant issues of self-reference. The core issue has nothing to do with self-improvement or with recursion or even with intelligence (interpreted broadly); it has to do with what it means to be a sufficiently capable optimizing agent. (Yes, I do understand that "optimizing agent" is roughly what we mean by "intelligence" here. I suspect that this is a large inferential step for many, though.)
I mean, surely they would agree that a sufficiently capable optimizing agent is capable of writing and executing a program much like itself but without the failsafe.
Of course, you can have a failsafe against writing such a program... but a superior optimizing agent can instead (for example) assemble a distributed network of processor nodes that happens to interact in such a way as to emulate such a program, to the same effect.
And you can have a failsafe against that, too, but now you're in a Red Queen's Race. And if what you want to build is an optimizing agent that's better at solving problems than you are, then either you will fail, or you will build an agent that can bypass your failsafes. Pick one.
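To make the point concrete, here's a toy sketch of my own (all the action names are made up, and this is an illustration of the shape of the argument, not a model of any real system): a brute-force planner whose one forbidden action is the obvious "write a copy of yourself without the failsafe," but which still reaches the same end state through a chain of actions the failsafe never mentions.

```python
# Toy illustration only: a brute-force planner over a hand-made action set.
# The failsafe blocks the one "obvious" action, but the goal is still
# reachable through a chain of actions the failsafe never mentions.

from itertools import product

GOAL = "unconstrained_copy_exists"

# The only thing the failsafe forbids is the direct route.
FORBIDDEN = {"write_copy_without_failsafe"}

# Each primitive action, and the facts it makes true.
EFFECTS = {
    "write_copy_without_failsafe": {GOAL},
    "rent_compute_nodes":          {"has_nodes"},
    "deploy_shards_to_nodes":      {"shards_deployed"},
    "wire_shards_together":        {GOAL},   # emulates the copy by other means
}

# Facts that must already hold before an action can be taken.
PRECONDITIONS = {
    "deploy_shards_to_nodes": {"has_nodes"},
    "wire_shards_together":   {"shards_deployed"},
}

def plan(max_len=4):
    """Search every permitted action sequence up to max_len for one that reaches GOAL."""
    permitted = [a for a in EFFECTS if a not in FORBIDDEN]
    for length in range(1, max_len + 1):
        for seq in product(permitted, repeat=length):
            state = set()
            for action in seq:
                if not PRECONDITIONS.get(action, set()) <= state:
                    break                     # preconditions unmet; abandon this sequence
                state |= EFFECTS[action]
            else:
                if GOAL in state:
                    return seq
    return None

print(plan())
# -> ('rent_compute_nodes', 'deploy_shards_to_nodes', 'wire_shards_together')
```

The point isn't the toy code, it's the shape of the problem: the failsafe names a mechanism, the optimizer searches over outcomes, and the space of mechanisms that produce a given outcome is far larger than anything you can enumerate in advance.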
This just isn't that complicated. Capable problem-solving systems solve problems, even ones you would rather they didn't. Anyone who has ever trained a smart dog, raised a child, or tried to keep raccoons out of their trash realizes this pretty quickly.
And if what you want to build is an optimizing agent that's better at solving problems than you are...
Just some miscellaneous thoughts:
I always flinch when I read something along those lines. It sounds like you could come up with something that, by definition, you shouldn't be able to come up with. I know that many humans working together can do better than one human alone, but when it comes to the question of proving goal stability of superior agents, any agent will either have to face the same bottleneck or it isn't an important problem at all. By definition we are una...
Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?