anotheruser comments on The Backup Plan - Less Wrong

1 Post author: Luke_A_Somers 13 October 2011 07:53PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (35)

You are viewing a single comment's thread.

Comment author: anotheruser 13 October 2011 09:04:18PM *  12 points [-]

If the AI has a goal that is not inherently friendly (like a paperclip maximizer), then it will always be a better idea for the AI to behave as if it had self-modified to a friendly AI than to actually self modify, if the goal of the AI is stable. If the goal of the AI were unstable, i.e. did not include a desire not to alter the goal, then that goal would very likely have replaced itself with a stable goal at some earlier point.

Therefore, it can be assumed that the AI, if it isn't already friendly, will pretend to be so, but not actually change its utility function and once it has outgrown the need to rely on humans it will revert to its original goal. Consider that if the AI is more intelligent than we are it will have absolutely no trouble fooling us into thinking that it changed its own goal while actually just pretending to have done so.