Why not just write failsafe rules into the superintelligent machine?

lukeprog

Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?

What is the difference between "a rule" and "what it wants"?

I'm interpreting this as the same question you wrote below as "What is the difference between a constraint and what is optimized?". Dave gave one example but a slightly different metaphor comes to my mind.

Imagine an amoral businessman in a country that takes half his earnings as tax. The businessman wants to maximize money, but has the constraint that half his earnings get taken as tax. So in order to achieve his goal of maximizing money, the businessman sets up some legally permissible deal with a foreign tax shelter, or funnels his earnings to holding corporations or something, to avoid taxes. Doing this is the natural result of his money-maximization goal, and it still satisfies the "pay taxes" constraint.

Contrast this with a second, more patriotic businessman who loves paying taxes because it helps his country, and so doesn't bother setting up tax shelters at all.

The first businessman has the motive "maximize money" and the constraint "pay taxes"; the second businessman has the motive "maximize money and pay taxes".

From the viewpoint of the government, the first businessman is an unFriendly agent with a constraint, and the second businessman is a Friendly agent.
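The two businessmen can be sketched as a toy program. This is only an illustration under made-up assumptions: the `Strategy` class, the three strategies, their numbers, and the `tax_weight` parameter are all hypothetical and are not part of the original comment. The first function treats "stay legal" as a filter on options and optimizes money alone; the second puts tax paid inside the thing being optimized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    earnings: float   # pre-tax earnings (hypothetical numbers)
    tax_paid: float   # amount that actually reaches the government
    legal: bool       # whether the letter of the tax law is satisfied

    @property
    def kept(self) -> float:
        return self.earnings - self.tax_paid


# Hypothetical options available to both businessmen.
STRATEGIES = [
    Strategy("pay tax in full", 100.0, 50.0, True),
    Strategy("foreign tax shelter", 100.0, 5.0, True),
    Strategy("hide income", 100.0, 0.0, False),
]


def constrained_choice(strategies):
    """Motive: maximize money kept. Constraint: the strategy must be legal."""
    allowed = [s for s in strategies if s.legal]
    return max(allowed, key=lambda s: s.kept)


def patriotic_choice(strategies, tax_weight=2.0):
    """Motive: money kept plus a weighted value placed on tax paid.

    tax_weight is an assumed parameter; it must exceed 1 here for paying
    in full to beat the shelter, since earnings are the same in each case.
    """
    return max(strategies, key=lambda s: s.kept + tax_weight * s.tax_paid)


print(constrained_choice(STRATEGIES).name)  # foreign tax shelter
print(patriotic_choice(STRATEGIES).name)    # pay tax in full
```

In this toy setup the constrained agent never breaks the rule, yet it picks the option that defeats the rule's purpose, which is the sense in which the government would see it as an unFriendly agent with a constraint.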

Does that help answer your question?

The first businessman has the motive "maximize money" and the constraint "pay taxes"; the second businessman has the motive "maximize money and pay taxes".

I read your comment again. I now see the distinction. One merely tries to satisfy something while the other tries to optimize it as well. So your definition of a 'failsafe' is a constraint that is satisfied while something else is optimized. I'm just not sure how helpful such a distinction is, as the difference is merely how two different parameters are optimized. One opti...

1XiXiDu15y

Very well put. I understood that line of reasoning from the very beginning, though, and didn't disagree that complex goals need complex optimization parameters. But I was making a distinction between insufficient and unbounded optimization parameters, goal-stability, and the ability or desire to override them.

I am aware of the risk of telling an AI to compute as many digits of Pi as possible. What I wanted to say is that if time, space and energy are part of its optimization parameters, then no matter how intelligent it is, it will not override them. If you tell the AI to compute as many digits of Pi as possible while using only a certain amount of time or energy for optimizing and computing, then it will do so and halt.

I'm not sure what your definition of a 'failsafe' is, but making simple limits like time and space part of the optimization parameters sounds to me like one. What I mean by 'optimization parameters' are the design specifications of the subject of the optimization process, like what constitutes a paperclip. The AI has to use those design specifications to measure its efficiency, and if time and space limits are part of them, then it will take those parameters into account as well.
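A minimal sketch of the bounded Pi task described in this comment, assuming the "time" limit is modeled as a count of loop iterations. The function names `pi_digit_steps` and `bounded_pi` and the `max_steps` parameter are hypothetical. The digit generator is the well-known unbounded spigot algorithm usually attributed to Gibbons, modified here to yield `None` on iterations that produce no digit so that every iteration can be counted against the budget. The sketch only shows what "the limit is part of the task specification" could look like in an ordinary program; it does not settle whether a far more capable optimizer would treat such a limit the same way.

```python
def pi_digit_steps():
    """Yield one item per loop iteration: a decimal digit of pi, or None."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            digit = n
            # Emit the confirmed digit and rescale the state by 10.
            q, r, t, k, n, l = (
                10 * q, 10 * (r - n * t), t, k,
                (10 * (3 * q + r)) // t - 10 * n, l,
            )
            yield digit
        else:
            # Consume one more term of the series; no digit is ready yet.
            q, r, t, k, n, l = (
                q * k, (2 * q + r) * l, t * l, k + 1,
                (q * (7 * k + 2) + r * l) // (t * l), l + 2,
            )
            yield None


def bounded_pi(max_steps):
    """Compute as many digits of pi as fit in max_steps iterations, then halt."""
    digits = []
    steps = pi_digit_steps()
    for _ in range(max_steps):  # the budget is part of the task itself
        digit = next(steps)
        if digit is not None:
            digits.append(digit)
    return digits


print(bounded_pi(5))   # [3, 1]
print(bounded_pi(11))  # [3, 1, 4]
```

Raising `max_steps` yields more digits, but the program never spends more iterations than the budget it was given.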

0Alexandros15y

What's stopping us from adding 'maintain constraints' to the agent's motive?
