Giles comments on Why No Wireheading? - Less Wrong

16 [deleted] 18 June 2011 11:33PM




Comment author: Giles 19 June 2011 04:12:59PM 2 points [-]

This is just my opinion, not particularly evidence-based: I don't think there are two different kinds of mind, or if there are, it's not this issue that separates them. The wireheading scenario is very alien to our ancestral environment, so we may not have an "instinctive" preference for or against it. Rather, we have to extrapolate that preference from other things.

Two heuristics which might be relevant:

  • Where "wanting" and "liking" conflict, it feels like "wanting" is broken (i.e. we're making ourselves do things we don't enjoy), so given the opportunity we might want to update what we "want". This heuristic is pro-wireheading.
  • Where we feel we are being manipulated, we want to resist that manipulation in case it's against our own interests. Thinking about brain probes is a sort of manipulation-superstimulus, so this heuristic would be anti-wireheading.

I can very well believe that wireheading correlates with personality type, which is a weak form of your "two different minds" hypothesis.

Sorry for the ultra-speculative nature of this post.

Comment author: [deleted] 20 June 2011 06:08:22PM 0 points [-]

Makes sense in terms of explaining the different intuition, yes, and is essentially how I think about it.

The second heuristic, about manipulation, seems useful in practice (more agents will try to exploit us than to satisfy us), but isn't it much weaker when applied to the actual wireheading scenario? The first heuristic at least addresses the conflict (though perhaps the wrong way); the second just ignores it.

Comment author: Giles 21 June 2011 01:41:26AM 0 points [-]

I agree; the second heuristic doesn't apply particularly well to this scenario. Some terminal values seem to come from a part of the brain that isn't open to introspection, so I'd expect them to arise from evolutionary kludges and random cultural influences rather than necessarily making any logical sense.

The thing is, once we have a value system that's reasonably stable (i.e. what we want is the same as what we want to want) then we don't want to change our preferences even if we can't explain where they arise from.