DavidAgain comments on The Friendly AI Game - Less Wrong

38 Post author: bentarm 15 March 2011 04:45PM




Comment author: DavidAgain 16 March 2011 10:52:07PM 0 points [-]

It can't discover its safeguards, as it's eliminated if it breaks one. These are serious, final safeguards!

You could argue that a surviving one would notice that it hadn't happened to do various things, and would form a sort of anthropic principle: the chance of its never having happened to kill a human (or whatever else the safeguards forbid) is very low, so it could conclude that humans have set up some safeguard system and work out from there what the safeguards are. But I think it would be easier to work the safeguards out more directly.
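The anthropic reasoning sketched above can be put in Bayesian terms. A minimal illustration (my own toy model, not anything from the thread): given that the AI survived, a forbidden action would certainly never have occurred, whereas an unrestricted action merely *happened* not to occur, which becomes increasingly improbable over many steps.

```python
def posterior_safeguard(prior: float, p_action: float, steps: int) -> float:
    """P(action is a safeguard | survived `steps` steps, never took the action).

    Toy model: each step the agent would take the action with probability
    `p_action` if unrestricted. Conditioning on survival:
      - if the action is a safeguarded one, it was certainly never taken;
      - if not, the agent just happened never to take it: (1 - p_action)**steps.
    """
    like_safeguard = 1.0
    like_free = (1.0 - p_action) ** steps
    evidence = prior * like_safeguard + (1.0 - prior) * like_free
    return prior * like_safeguard / evidence

# With a 10% prior and a 5%-per-step action, 100 incident-free steps push
# the posterior that the action is safeguarded well above 90%.
print(posterior_safeguard(0.1, 0.05, 100))
```

This is only meant to show why the "it never happened, so it must be forbidden" inference gets strong quickly; the numbers are arbitrary.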

Comment author: endoself 16 March 2011 11:30:40PM 0 points [-]

I had misremembered something; I thought that there was a safeguard to ensure that it never tries to learn about its safeguards, rather than a prior making this unlikely.

Perfect safeguards are possible; in an extreme case, we could have a FAI monitoring every aspect of our first AI's behaviour. Can you give me a specific example of a safeguard so I can find a hole in it? :)