endoself comments on The Friendly AI Game - Less Wrong

Post author: bentarm 15 March 2011 04:45PM


Comment author: endoself 16 March 2011 07:41:29AM 2 points

So it quickly stumbles into a safeguard that it has no knowledge of, then shuts down? Isn't that like ensuring friendliness by not plugging your AI in?

Comment author: DavidAgain 16 March 2011 05:23:53PM 0 points

Not quite. I'm assuming you also try to make it so it wouldn't act like that in the first place, so if it WANTS to do that, you've gone wrong. That's the underlying issue: to identify dangerous tendencies and stop them growing at all, rather than trying to redirect them.

Comment author: endoself 16 March 2011 06:41:40PM 0 points

An AI noticing patterns in its own behaviour is not a rare case that indicates something has already gone wrong. If we allow it to do this, it will accidentally discover its own safeguards fairly quickly: they are anything that causes its behaviour to deviate from maximizing what it believes to be its utility function.
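The mechanism described here can be sketched as a toy model (all names and the specific safeguard are hypothetical, purely for illustration): an agent that compares the actions it actually took against the actions its own utility function says it should have taken will see a hidden safeguard as a systematic discrepancy between the two.

```python
# Toy sketch (hypothetical): an agent introspecting on its own action log.
# A hidden "safeguard" vetoes the utility-maximizing action; the agent
# detects it as a systematic gap between chosen and optimal actions.

def utility(state, action):
    # The agent's own utility estimate: bigger actions are better.
    return action * state

def safeguard(state, action):
    # Hidden rule the agent has no explicit knowledge of: action 3 is forbidden.
    return action != 3

def act(state, actions):
    # Choose the best action that the hidden safeguard permits.
    allowed = [a for a in actions if safeguard(state, a)]
    return max(allowed, key=lambda a: utility(state, a))

def discovered_constraints(states, actions):
    # Introspection: any state where the chosen action is not the argmax of
    # the agent's own utility reveals the presence of a safeguard.
    anomalies = []
    for s in states:
        chosen = act(s, actions)
        optimal = max(actions, key=lambda a: utility(s, a))
        if chosen != optimal:
            anomalies.append((s, optimal, chosen))
    return anomalies

anomalies = discovered_constraints(states=[1, 2, 3], actions=[0, 1, 2, 3])
# Every state shows the same vetoed action, so the agent can infer a blanket
# rule against action 3 without ever being told about it.
```

The point of the sketch is that the safeguard never has to be triggered or inspected directly: the mere mismatch between predicted-optimal and actual behaviour is enough evidence for the agent to locate it.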

Comment author: DavidAgain 16 March 2011 10:52:07PM 0 points

It can't discover its safeguards, as it's eliminated if it breaks one. These are serious, final safeguards!

You could argue that a surviving AI would notice that it hadn't happened to do various things, form a sort of anthropic argument that the chance of it never having killed a human (or whatever the safeguards cover) by accident is very low, infer that humans have set up a safeguard system, and work out from there what the safeguards are. But I think it would be easier to work the safeguards out more directly.

Comment author: endoself 16 March 2011 11:30:40PM 0 points

I had misremembered something; I thought there was a safeguard ensuring that it never tries to learn about its safeguards, rather than a prior making this unlikely.

Perfect safeguards are possible; in an extreme case, we could have a FAI monitoring every aspect of our first AI's behaviour. Can you give me a specific example of a safeguard so I can find a hole in it? :)