drethelin comments on Open Thread, October 16-31, 2012 - Less Wrong
Is there a word for a person, or an agent, that self-modifies to find something more painful, in order to change someone else's incentives, as described here? Obviously there are some choice phrases we might like to use about such a person, but most of them, e.g. "moral blackmail", seem insufficiently precise. Is there a term that captures specifically this, and not other behaviour we don't like? If not, what might be a good, specific term?
Subset of utility monster, I think.
I have used that term for this, but it's not very precise: the Wikipedia entry has the monster absorbing positive utility rather than threatening negative, and there's no mention of self-modification.
The self-modification isn't in itself the issue, though, is it? It seems to me that just about any sort of agent would be willing to self-modify into a utility monster if it expected that strategy to be more likely to achieve its goals. The pleasure/pain distinction simply adds a constant (negative) offset to all utilities, which is meaningless because utility functions are generally assumed to be invariant under positive affine transformations.
I don't even think it's a subset of utility monster, it's just a straight up "agent deciding to become a utility monster because that furthers its goals".
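To make the affine-invariance point concrete, here is a minimal Python sketch. The outcomes, utilities, and action names are all made up for illustration, and the helper functions are hypothetical. It only shows that adding a constant offset to every utility, or rescaling by a positive factor, leaves a single agent's expected-utility ranking of actions unchanged.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)


def best_action(actions, utility):
    """Name of the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


def positive_affine(utility, scale, offset):
    """Return a new utility table u' = scale * u + offset, with scale > 0."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve preferences")
    return {outcome: scale * u + offset for outcome, u in utility.items()}


# Hypothetical outcomes and utilities, chosen only for illustration.
utility = {"win": 10.0, "draw": 4.0, "lose": 0.0}

# Hypothetical actions, each a lottery over the outcomes above.
actions = {
    "safe": [(1.0, "draw")],
    "gamble": [(0.5, "win"), (0.5, "lose")],
}

# A constant negative offset: every outcome now "hurts" by 100 more.
shifted = positive_affine(utility, scale=1.0, offset=-100.0)

# A general positive affine transformation.
rescaled = positive_affine(utility, scale=3.0, offset=-7.0)

choice_original = best_action(actions, utility)
choice_shifted = best_action(actions, shifted)
choice_rescaled = best_action(actions, rescaled)

# The preferred action is the same under all three utility tables.
assert choice_original == choice_shifted == choice_rescaled
```

The sketch covers only one agent's own choices. It says nothing about how utilities are compared between agents, which is where the utility monster question arises.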