Oscar_Cunningham comments on Why not just write failsafe rules into the superintelligent machine? - Less Wrong

Post author: lukeprog 08 March 2011 09:07PM

Comment author: Oscar_Cunningham 08 March 2011 09:29:10PM 18 points

The space of possible AI behaviours is large; you can't succeed by ruling parts of it out. It would be like a cake recipe that went:

  1. Don't use avocados.
  2. Don't use a toaster.
  3. Don't use vegetables. ...

Clearly the list can never be long enough. Chefs have instead settled on the technique of actually specifying what to do. (Of course the analogy doesn't stretch very far: AI is less like trying to bake a cake, and more like trying to build a chef.)
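The contrast drawn above is essentially blacklisting versus whitelisting. A minimal, purely illustrative sketch (all names and actions here are hypothetical, invented for the example) shows why the default matters:

```python
# Hypothetical sketch of the comment's point: a blacklist can never
# enumerate the space of bad actions, while a positive specification
# (like a recipe) permits only what is explicitly named.

FORBIDDEN = {"use_avocados", "use_toaster", "use_vegetables"}  # never complete

def blacklist_allows(action: str) -> bool:
    # Default-allow: anything not explicitly ruled out slips through.
    return action not in FORBIDDEN

RECIPE = ["cream_butter_and_sugar", "add_eggs", "fold_in_flour", "bake"]

def whitelist_allows(action: str) -> bool:
    # Default-deny: only the specified steps are permitted.
    return action in RECIPE

print(blacklist_allows("use_flamethrower"))  # True: the list wasn't long enough
print(whitelist_allows("use_flamethrower"))  # False
```

The same default-deny principle shows up in security engineering, where enumerating badness is a well-known anti-pattern.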

Comment author: jimrandomh 08 March 2011 10:00:07PM 5 points

To extend the analogy a bit further, a valid, fully specified cake recipe may be made safer by appending "if cake catches fire, keep oven door closed and turn off heat." The point of safeguards would not be to tell the AI what to do, but rather to mitigate the damage if the instructions were incorrect in some unforeseen way.
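The distinction between instructions and damage mitigation can be sketched in code. This is a hypothetical toy (the `Oven` class and steps are invented for the example), assuming the safeguard runs unconditionally after the main procedure, whatever went wrong:

```python
# Hypothetical sketch: the safeguard never chooses what to do; it only
# bounds the damage when the main instructions fail in an unforeseen way.

class Oven:
    def __init__(self):
        self.heat = 200
        self.on_fire = False
        self.door_closed = True

def bake(steps, oven):
    """Run the fully specified recipe, with a damage-mitigating safeguard."""
    try:
        for step in steps:
            step(oven)
    finally:
        # "If cake catches fire, keep oven door closed and turn off heat."
        if oven.on_fire:
            oven.door_closed = True
            oven.heat = 0

def faulty_step(oven):
    oven.on_fire = True
    raise RuntimeError("cake caught fire")

oven = Oven()
try:
    bake([faulty_step], oven)
except RuntimeError:
    pass
print(oven.heat, oven.door_closed)  # 0 True
```

The `finally` clause is the failsafe: it says nothing about how to bake, only what to do when the recipe has already gone wrong.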

Comment author: DavidAgain 08 March 2011 10:08:26PM 2 points

Presumably, if a very powerful AI went wrong, we'd have seriously massive problems. So the best failsafe would be retrospective, relying on a Chinese Wall within the AI so that it was banned from working around the failsafe.

So when the AI hits certain prompts (realising that humans must all be destroyed), this sets off the hidden ('subconscious') failsafe that switches it off. Or possibly makes it sing 'Daisy, Daisy' while slowly sinking into idiocy.

To clarify, I know nothing about AI or these 'genie' debates, so sorry if this has already been discussed and doesn't work at all. At the London meetup I tried out the idea of an AI which only cared about a small geographical area to limit risk: someone pointed out that it would happily eat the rest of the universe to help its patch. Oh well.

Comment author: XiXiDu 09 March 2011 10:37:10AM 1 point

  At the London meetup I tried out the idea of an AI which only cared about a small geographical area to limit risk: someone pointed out that it would happily eat the rest of the universe to help its patch. Oh well.

You have to understand that the basic argument is the mere possibility that AI might be dangerous, and the high risk associated with it. Even if it were unlikely to happen, the vast amount of negative utility involved outweighs its low probability.

Comment author: DavidAgain 10 March 2011 08:45:32AM 2 points

I got that! The problem was more that I was thinking as if the world could be divided up into sealable boxes. In practice, we can do a lot of focusing on one area with 'no effect' on anything else. But this is only because the sorts of actions we take are limited, because we can't detect the low-level impact on things outside those boxes, and because we have certain unspoken understandings about what sort of thing might constitute an unacceptable effect elsewhere. (If I only care about looking at pictures of LOLcats, I might be 'neutral to the rest of the internet' except for taking a little bandwidth. A superpowerful AI might realise that it would slightly increase the upload speed, and thus maximise the utility function, if vast swathes of other internet users were dead.)
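The failure mode here is that a utility function scored only over a small "patch" assigns zero weight to everything outside it. A hypothetical sketch (the region names and values are invented for the example):

```python
# Hypothetical sketch: utility restricted to a patch makes catastrophic
# side effects elsewhere cost nothing, so they may even be instrumentally
# favoured if they help the patch slightly.

def patch_utility(world: dict) -> float:
    # world maps region -> welfare; only "my_patch" counts.
    return world.get("my_patch", 0.0)

before = {"my_patch": 10.0, "rest_of_universe": 100.0}
after  = {"my_patch": 10.5, "rest_of_universe": 0.0}  # patch helped, rest eaten

print(patch_utility(after) > patch_utility(before))  # True
```

From the agent's point of view the second world is strictly better, which is the "it would happily eat the rest of the universe to help its patch" objection in miniature.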

Comment author: lessdazed 02 July 2011 02:19:22PM 0 points

  (If I only care about looking at pictures of LOLcats, I might be 'neutral to the rest of the internet' except for taking a little bandwidth. A superpowerful AI might realise that it would slightly increase the upload speed, and thus maximise the utility function, if vast swathes of other internet users were dead.)

Dead and hilariously captioned, possibly.

I bet such an AI could latch onto a meme in which the absence of cats in such pictures was "lulz".