drnickbone comments on Proper value learning through indifference - Less Wrong

Post author: Stuart_Armstrong 19 June 2014 09:39AM


Comment author: drnickbone 07 July 2014 09:10:54PM 1 point

This all looks clever, apart from the fact that the AI becomes completely indifferent to arbitrary changes in its value system. The way you describe it, the AI will happily and uncomplainingly accept a switch from a friendly v (such as promoting human survival, welfare and settlement of the Galaxy) to an almost arbitrary w (such as making paperclips), just by pushing the right "update" buttons. An immediate worry is about who will be in charge of the update routine, and what happens if they are corrupt or make a mistake: if the AI is friendly, then it had better worry about this as well.

Interestingly, the examples you started with suggested that the AI should be rewarded somehow in its current utility v as compensation for accepting a change to a different utility w. That does sound more natural, and more stable against rogue updates.
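The compensation idea can be sketched in a toy expected-utility setting. This is an illustrative reconstruction, not the post's formalism: the function names and the specific utilities are assumptions. The point is that adding a constant equal to the expected gap between v and w makes the agent's expected value the same whether or not the switch happens, so it has no incentive to resist (or engineer) the update.

```python
def compensated_utility(outcome, switched, v, w, expected_v, expected_w):
    """Toy utility after a possible switch from v to w.

    The compensation term (expected_v - expected_w) is added to the
    new utility w, so the agent's expected value is identical whether
    or not the switch occurs -- it is indifferent to the update.
    (Illustrative sketch only; not the post's exact construction.)
    """
    if not switched:
        return v(outcome)
    return w(outcome) + (expected_v - expected_w)


# Hypothetical example: a "friendly" utility v and a degenerate w
# (a stand-in for paperclips), with assumed expected values under
# the agent's current policy.
v = lambda x: x          # friendly utility: values the outcome directly
w = lambda x: 0.0        # arbitrary replacement utility
expected_v, expected_w = 10.0, 0.0

no_switch = compensated_utility(10.0, False, v, w, expected_v, expected_w)
switch = compensated_utility(10.0, True, v, w, expected_v, expected_w)
# Both evaluate to 10.0: the switch is value-neutral from the
# agent's pre-update perspective.
```

Note that the neutrality holds only in expectation under the current policy; as the comment above suggests, paying the compensation in v rather than making the agent exactly indifferent might behave differently if the update routine is corrupt or mistaken.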