abramdemski comments on Q&A with Abram Demski on risks from AI - Less Wrong

22 Post author: XiXiDu 17 January 2012 09:43AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (70)

You are viewing a single comment's thread. Show more comments above.

Comment author: abramdemski 04 March 2012 12:26:58AM 3 points [-]

Steve,

The idea here is that if an agent is able to (literally or effectively) modify its goal structure, and grows up in an environment in which humans deprive it of what it wants when it behaves badly, an effective strategy for getting what it wants more often will be to alter its goal structure to be closer to the humans. This is only realistic with some architectures. One requirement here is that the cognitive load of keeping track of the human goals and potential human punishments is a difficulty for the early-stage system, such that it would be better off altering its own goal system. Similarly, it must be assumed that during the period of its socialization, it is not advanced enough to effectively hide its feelings. These are significant assumptions.

Comment author: Wei_Dai 04 March 2012 08:07:08PM *  5 points [-]

Interesting! Have you written about this idea in more detail elsewhere? Here are my concerns about it:

  1. The AI has to infer the human's goals. Given the assumed/required cognitive limitations, it may not do a particularly good job of this.
  2. What if the human doesn't fully understand his or her own goals? What does the AI do in that situation?
  3. The AI could do something like plant a hidden time-bomb in its own code, so that its goal system reverts from the post-modification "close to humans" back to its original goals at some future time when it's no longer punishable by humans.

Given these problems and the various requirements on the AI for it to be successfully socialized, I don't understand why you assign only 0.1 probability to the AI not being socialized.