
KatjaGrace comments on Superintelligence 20: The value-loading problem - Less Wrong Discussion

4 Post author: KatjaGrace 27 January 2015 02:00AM



Comment author: KatjaGrace 26 January 2015 07:59:24PM 2 points

Bostrom says a human doesn't try to disable its own goal accretion (though that process alters its values) in part because it is not well described as a utility maximizer (p190, footnote 11). Why assume AI will be so much better described as a utility maximizer that this characteristic will cease to hold?

Comment author: William_S 09 February 2015 07:00:40PM *  0 points

I can think of a few reasons why it might seem like humans don't try to disable goal accretion:

*Humans can't easily perform reliable self-modifications, so they usually don't even consider disabling goal accretion as a possibility.

*When a human believes something strongly enough to want to fix it as a goal, mechanisms kick in to hold it in place without the human ever consciously framing "disable value accretion" as a goal: confirmation bias and other cognitive biases, or making costly commitments to join a group of people who share that goal (which makes the goal harder to take away, etc.).

*Cognitive biases lead us to underestimate how much our values have shifted in the past, and to wildly underestimate how they might shift in the future.

*Humans believe that all value accretion is good, because it led to the present set of values, which are good and right. Also, humans believe that their values will not change in the future, because those values feel objectively good and right (subjectively objective).

*Our final goals are inaccessible, so we don't really know what it is we would want to fix as our goals.

*Our actual final goals (if something like that can even be meaningfully specified) include keeping the goal accretion mechanism running.

It seems likely that an AI system which humans understand well enough to design might have fewer of these properties.