drethelin comments on Open Thread, October 16-31, 2012 - Less Wrong

Post author: OpenThreadGuy 16 October 2012 10:43PM 5 points




Comment author: ciphergoth 17 October 2012 09:56:18PM 3 points

Is there a word for a person, or an agent, that self-modifies to find something more painful, in order to change someone else's incentives, as described here? Obviously there are some choice phrases we might like to use about such a person, but most of them (e.g. "moral blackmail") seem insufficiently precise. Is there a term that captures specifically this, and not other behaviour we don't like? If not, what might be a good, specific term?

Comment author: drethelin 17 October 2012 10:19:59PM 1 point

Subset of utility monster, I think.

Comment author: ciphergoth 17 October 2012 10:22:43PM 0 points

I have used that term for this, but it's not very precise: the Wikipedia entry has the monster absorbing positive utility rather than threatening negative utility, and there's no mention of self-modification.

Comment author: wgd 19 October 2012 06:41:32PM 0 points

The self-modification isn't in itself the issue, though, is it? It seems to me that just about any sort of agent would be willing to self-modify into a utility monster if it expected that strategy to be more likely to achieve its goals. And the pleasure/pain distinction simply adds a constant (negative) offset to all utilities, which is meaningless, since utility functions are generally assumed to be invariant under positive affine transformations.
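The offset point can be checked with a small expected-utility chooser. This is only a sketch, assuming a von Neumann-Morgenstern-style agent that picks the lottery with the highest expected utility; the lotteries, outcome names and utility numbers are hypothetical and purely illustrative. Shifting every utility by a constant, or rescaling by a positive factor, leaves the choice unchanged, while a negative scale factor (which is not a positive affine transformation) can flip it.

```python
# Hypothetical lotteries: each is a list of (probability, outcome) pairs.
LOTTERIES = {
    "safe": [(1.0, "ok")],
    "gamble": [(0.5, "great"), (0.5, "bad")],
}

# Hypothetical base utilities for each outcome.
BASE_UTILITY = {"bad": 0.0, "ok": 6.0, "great": 10.0}


def expected_utility(lottery, utility):
    """Probability-weighted sum of outcome utilities."""
    return sum(p * utility[outcome] for p, outcome in lottery)


def best_choice(utility):
    """Name of the lottery with the highest expected utility."""
    return max(LOTTERIES, key=lambda name: expected_utility(LOTTERIES[name], utility))


def affine(utility, scale, offset):
    """Apply u -> scale * u + offset to every outcome's utility."""
    return {outcome: scale * u + offset for outcome, u in utility.items()}


if __name__ == "__main__":
    print(best_choice(BASE_UTILITY))                       # safe
    print(best_choice(affine(BASE_UTILITY, 1.0, -100.0)))  # safe: constant "pain" offset
    print(best_choice(affine(BASE_UTILITY, 3.0, -50.0)))   # safe: positive rescale plus shift
    print(best_choice(affine(BASE_UTILITY, -1.0, 0.0)))    # gamble: negative scale flips it
```

Here the safe option has expected utility 6 against 5 for the gamble, and subtracting 100 from everything gives -94 against -95, so the ranking is preserved.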

I don't even think it's a subset of utility monster; it's just a straight-up case of "agent deciding to become a utility monster because that furthers its goals".