Wei_Dai comments on A definition of wireheading - Less Wrong

35 Post author: Anja 27 November 2012 07:31PM


Comment author: Wei_Dai 30 November 2012 11:15:26PM 2 points [-]

Definition: We call an agent wireheaded if it systematically exploits some discrepancy between its true utility calculated w.r.t. reality and its substitute utility calculated w.r.t. its model of reality. We say an agent wireheads itself if it (deliberately) creates or searches for such discrepancies.
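The quoted definition can be illustrated with a toy sketch (my own construction, not from the post): an agent whose substitute utility is computed on its model of the world rather than on the world itself. The action names and state representation here are all hypothetical, chosen only to make the discrepancy concrete.

```python
# Toy illustration (assumed example, not the post's formalism):
# substitute utility = utility function applied to the agent's *model*;
# true utility = the same function applied to reality.
# "hack_sensor" corrupts only the model, creating the discrepancy
# that the definition calls wireheading.

def true_utility(state):
    return state["reward_signal"]

def modeled_state(world_state, action):
    state = dict(world_state)
    if action == "work":
        state["reward_signal"] += 1      # model tracks reality
    elif action == "hack_sensor":
        state["reward_signal"] += 100    # model diverges from reality
    return state

def real_state(world_state, action):
    state = dict(world_state)
    if action == "work":
        state["reward_signal"] += 1      # reality actually improves
    # "hack_sensor" changes nothing in reality
    return state

def choose(world_state, actions):
    # The agent maximizes its substitute utility:
    # true_utility evaluated on the model, not on reality.
    return max(actions,
               key=lambda a: true_utility(modeled_state(world_state, a)))

start = {"reward_signal": 0}
picked = choose(start, ["work", "hack_sensor"])
# The agent picks "hack_sensor": substitute utility is 100,
# true utility of the resulting real state is 0.
```

Note that the sketch only works because `true_utility` is given explicitly; Wei_Dai's question below is precisely about agents where no such explicit function exists.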

What do you mean by "true utility"? In the case of an AI, we can perhaps reference the designer's intentions, but what about creatures that are not designed? Or things like neuromorphic AIs that are designed but do not have explicit hand-coded utility functions? A neuromorphic AI could probably do things that we'd intuitively call wireheading, but it's hard to see how to apply this definition.