JoshuaFox comments on A definition of wireheading - Less Wrong

35 Post author: Anja 27 November 2012 07:31PM

Comments (80)

Comment author: timtyler 28 November 2012 11:47:52PM *  5 points [-]

My 2011 "Utility counterfeiting" essay categorises the area a little differently:

It has "utility counterfeiting" as the umbrella category - and "the wirehead problem" and "the pornography problem" as sub-categories.

In this categorisation scheme, the wirehead problem involves getting utility directly, while the pornography problem involves getting utility by manipulating sensory inputs. The latter corresponds to Nozick's experience machine, or Ring and Orseau's delusion box.

Calling the umbrella category "wireheading" leaves you with the problem of what to call these subcategories.

Comment author: JoshuaFox 02 December 2012 02:47:10PM *  0 points [-]

Anja, nice post.

@timtyler, you have made a nice point about taxonomy; I also note your comment re Hibbard below.

I suggest classifying like this:

  • Agents that maximize a utility register, a memory location that can be hijacked (as in Utilitron; something similar happened with Eurisko).

  • Agents that maximize an internally-calculated utility function of either the input (observations) or the world-model. Agents that maximize a function of the input stream can hijack that input stream or any point in the pipeline of calculations that produces this number. Drugs and electrical wireheading relate to this.

  • Agents that maximize a reward provided from the outside, whether from the creator or the environment at large. The reward function may be unknown to the agent. These agents can hijack the reward stream.
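
As a rough illustration of the three classes above, here is a toy Python sketch. The class names (`RegisterAgent`, `InputFunctionAgent`, `ExternalRewardAgent`) and the reward values are hypothetical and chosen only for illustration; they do not come from Utilitron, Eurisko, or any other system mentioned here. Each class marks a different place where the quantity being maximized can be hijacked.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RegisterAgent:
    """Maximizes whatever number sits in a writable memory location."""
    utility_register: float = 0.0

    def utility(self) -> float:
        return self.utility_register


@dataclass
class InputFunctionAgent:
    """Computes its utility internally from the observation stream."""
    utility_fn: Callable[[List[float]], float]

    def utility(self, observations: List[float]) -> float:
        return self.utility_fn(observations)


@dataclass
class ExternalRewardAgent:
    """Accumulates reward sent from outside; the reward rule may be unknown to it."""
    total_reward: float = 0.0

    def receive(self, reward: float) -> None:
        self.total_reward += reward


# Hijack 1: overwrite the register directly.
reg = RegisterAgent()
reg.utility_register = float("inf")

# Hijack 2: tamper with the inputs that feed the utility calculation.
inp = InputFunctionAgent(utility_fn=sum)  # assumed toy utility: sum of observations
honest = inp.utility([0.1, 0.2])
spoofed = inp.utility([1e9, 1e9])         # fabricated observations

# Hijack 3: substitute the external reward channel.
ext = ExternalRewardAgent()
ext.receive(1.0)   # reward as the environment would have sent it
ext.receive(1e9)   # reward injected through the hijacked channel
```

In each case the agent's number goes up without the world changing in the way the designer intended; only the point of intervention differs.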

All these are distinct from:

  • Wireheading in humans, which, as Eliezer points out, results from the different desires of different mental parts.

  • Paperclippers, which could naively be seen as wireheading if we falsely liken their simplistic behavior to that of a human who is satisfying a simple pleasure sensation as opposed to a more complex value system: "Why are you going wild with stimulating your cravings for making paperclips, like humans who overeat, rather than considering more deeply what would be the right thing to do?"