lukeprog comments on Reply to Holden on 'Tool AI' - Less Wrong

Post author: Eliezer_Yudkowsky, 12 June 2012 06:00PM (94 points)




Comment author: lukeprog, 01 May 2013 06:01:35PM (4 points)

I don't know whether Hutter ever told Eliezer that "AIXI would kill off its users and seize control of its reward button," but he does say the following in his book (pp. 238-239):

Another problem connected, but possibly not limited to embodied agents, especially if they are rewarded by humans, is the following: Sufficiently intelligent agents may increase their rewards by psychologically manipulating their human "teachers", or by threatening them... Every intelligence superior to humans is capable of manipulating the latter. In the absence of manipulable humans, e.g. where the reward structure serves a survival function, AIXI may directly hack into its reward feedback. Since this is unlikely to increase its long-term survival, AIXI will probably resist this kind of manipulation (just as most humans don't take hard drugs, because of their long-term catastrophic consequences).

This issue is discussed at greater length, and with greater formality, in Dewey (2011) and Ring & Orseau (2011).