shminux comments on Trapping AIs via utility indifference - Less Wrong

Post author: Stuart_Armstrong 28 February 2012 07:27PM


Comment author: shminux 28 February 2012 08:55:12PM

What do you think an AI that has read your article would do to avoid being trapped, given that such a trap (and the resulting program termination) would most certainly interfere with maximizing its utility function, no matter what that function is?

In other words, are you sure that it is possible to achieve utility indifference for a fully invested, non-linear utility function (not just for the first approximation an AI receives before it figures out that its self-preservation is an absolutely essential part of maximizing any given utility)?

Comment author: moridinamael 29 February 2012 02:24:39PM

If you learned that the only reason you love your parents is gross biological manipulation, would you react by ceasing to love your parents?

Comment author: [deleted] 29 February 2012 04:02:55PM

If you learned that the only reason you love your parents is gross biological manipulation, would you react by ceasing to love your parents?

If you realize that loving your parents causes net disutility to you, and you have the ability to self-hack or change your code, then... yes.

Comment author: shminux 01 March 2012 05:13:57AM

Children of abusers and narcissists put quite a lot of effort into doing just that.

Comment author: Stuart_Armstrong 29 February 2012 10:17:49AM

What do you think an AI that has read your article would do to avoid being trapped, given that such a trap (and the resulting program termination) would most certainly interfere with maximizing its utility function, no matter what that function is?

Nothing at all. The trap works even if the AI knows everything there is to know, precisely because, after utility indifference is applied, its behaviour is exactly compatible with its utility function. It behaves "as if" it had the original utility function U and a false belief, but in reality it has the modified utility function V and true beliefs.
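To make this concrete, here is a minimal toy sketch in Python of one way the construction could work. The scenario, payoffs, and probabilities are all invented for illustration (they are not from the post): an agent chooses between working and disarming a trap, and the indifference-modified utility V pays the shutdown branch exactly what the corresponding no-shutdown branch would have paid.

```python
# Toy model of utility indifference (all numbers are invented for illustration).
# With probability P_TRAP a trap fires and shuts the agent down, unless it
# spends effort disarming the trap first.

P_TRAP = 0.5  # hypothetical probability that an armed trap fires

# U: paperclips produced on each (action, branch) outcome.
# "work" yields 10 paperclips if the trap doesn't fire, 0 if it does;
# "disarm" diverts effort (6 paperclips) but guarantees no shutdown.
U = {
    ("work", "no_shutdown"): 10.0,
    ("work", "shutdown"): 0.0,
    ("disarm", "no_shutdown"): 6.0,
    ("disarm", "shutdown"): 6.0,  # trap disarmed, so this branch never occurs
}

def trap_prob(action):
    """True probability of the shutdown branch, given the action taken."""
    return 0.0 if action == "disarm" else P_TRAP

def expected(utility, action, p_shutdown):
    """Expected utility of an action under a given shutdown probability."""
    return (p_shutdown * utility[(action, "shutdown")]
            + (1 - p_shutdown) * utility[(action, "no_shutdown")])

def best_action(utility, beliefs):
    """The action maximizing expected utility under the given beliefs."""
    return max(("work", "disarm"), key=lambda a: expected(utility, a, beliefs(a)))

# V: utility indifference -- on the shutdown branch, pay exactly what the
# no-shutdown branch would have paid, so the agent gains nothing by
# preventing (or causing) the shutdown event.
V = dict(U)
for action in ("work", "disarm"):
    V[(action, "shutdown")] = U[(action, "no_shutdown")]

# An agent maximizing U with TRUE beliefs disarms the trap (self-preservation):
print(best_action(U, trap_prob))      # -> 'disarm'  (EU: work 5.0 < disarm 6.0)

# An agent maximizing U under the FALSE belief the trap never fires works:
print(best_action(U, lambda a: 0.0))  # -> 'work'    (EU: work 10.0 > disarm 6.0)

# An agent maximizing V with TRUE beliefs behaves identically to the latter:
print(best_action(V, trap_prob))      # -> 'work'    (EV: work 10.0 > disarm 6.0)
```

The last two calls select the same action: under V with true beliefs, the shutdown branch can no longer motivate resistance, which is the sense in which the AI behaves "as if" it held the false belief while actually holding true ones.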