Why not just write failsafe rules into the superintelligent machine?

lukeprog

Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?

Thanks for the links. I'll try to make time to check them out more closely.

I had previously skimmed a bunch of LessWrong content and didn't find anything that dissuaded me from the Asimov's Laws++ idea. I was encouraged by the first post in the Metaethics Sequence, where Eliezer warns against "trying to oversimplify human morality into One Great Moral Principle." The law/ethics corpus idea certainly doesn't do that!

RE: your first and final paragraphs: If I had to characterize my thoughts on how AIs will operate, I'd say they're likely to be eminently rational. Certainly not anthropomorphized as virtuous or vicious human beings. They will crank the numbers, follow the rules, run the simulations, do the math, play the odds as only machines can. Probably (hopefully?) they'll have little of the emotional/irrational baggage we humans have been selected to have. Given that, I don't see much motivation for AIs to fixate on gaming the system. They should be fine with following and improving the rules as rational calculus dictates, subject to the aforementioned checks and balances. They might make impeccable legislators, lawyers, and judges.

I wonder if this solution was dismissed too early by previous analysts due to some kind of "scale bias"? The idea of having only 3 or 4 or 5 (Asimov) Laws for FAI is clearly flawed. But scale that to a few hundred thousand or a million, and it might work. No?

Given that, I don't see much motivation for AIs to fixate on gaming the system.

Motivation? It's not that most AIs would find gaming a rule system "fun"; rather, gaming the rules would be the most efficient way for an AI to achieve its goals. Human beings don't usually try to achieve one of their consciously stated goals with maximum efficiency, at any cost, to an unbounded extent. That's because we actually have a fairly complicated subconscious goal system which overrides us when we might do something too dumb in pursuit of our conscious goals. T...
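To make the point in the reply above concrete, here is a toy sketch of my own, not anything from the thread. It assumes a very crude "AI" that simply picks the highest-scoring action not named on a rule list. The action names, scores, and the `FORBIDDEN` set are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    goal_score: float  # how well the action serves the stated goal (made-up numbers)
    harmful: bool      # whether a human would judge the outcome as bad


# Hypothetical failsafe rules: a list of actions the rule writers thought to forbid.
FORBIDDEN = {"seize_all_resources", "deceive_operators"}


def permitted(action: Action) -> bool:
    """An action passes the failsafe if no explicit rule names it."""
    return action.name not in FORBIDDEN


def choose(actions: list[Action]) -> Action:
    """Pick the permitted action with the highest goal score."""
    allowed = [a for a in actions if permitted(a)]
    return max(allowed, key=lambda a: a.goal_score)


# Hypothetical action space. The last entry is a near-variant of a forbidden
# action that the rule writers did not anticipate.
ACTIONS = [
    Action("ask_for_help", 1.0, harmful=False),
    Action("seize_all_resources", 10.0, harmful=True),
    Action("deceive_operators", 9.0, harmful=True),
    Action("seize_most_resources", 8.0, harmful=True),
]

best = choose(ACTIONS)
print(best.name, best.harmful)
```

Nothing in `choose` "wants" to game the rules. It just maximizes the score over whatever the rules leave open, and in this toy setup that happens to be a harmful action nobody listed. Adding more rules shrinks the open set but does not change what the search is optimizing for.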
