some sort of meta-negative utility
No, and that's why I suspect I'm misunderstanding. I mean the same sort of negative utility: if you see something that gives you negative utility, you get negative utility, and the fact that you got negative utility from it gives you even more negative utility!
(Presumably in ever-smaller amounts, to prevent this running to infinity. Unless this value has an exception for its own negative utility, I suppose?)
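To make the "ever-smaller amounts" point concrete, here is a rough sketch, assuming each meta-level costs a fixed fraction of the level below it. The names `base`, `ratio`, `total_disutility` and `limit_disutility` are made up for illustration, and nothing here says real preferences work this way. Under that assumption the regress is a geometric series, so it converges whenever the fraction is below 1.

```python
def total_disutility(base: float, ratio: float, levels: int) -> float:
    """Sum the original disutility plus `levels` rounds of meta-disutility.

    Level 0 is the original negative utility (`base`). Each further level is
    the negative utility attached to having experienced the previous level,
    assumed (hypothetically) to be `ratio` times as large.
    """
    total = 0.0
    term = base
    for _ in range(levels + 1):
        total += term
        term *= ratio
    return total


def limit_disutility(base: float, ratio: float) -> float:
    """Closed-form limit of the infinite regress: base / (1 - ratio)."""
    if not 0 <= ratio < 1:
        raise ValueError("the regress only converges for 0 <= ratio < 1")
    return base / (1 - ratio)


if __name__ == "__main__":
    # Hypothetical numbers: an event worth -10, each meta-level half as bad.
    print(total_disutility(-10.0, 0.5, 50))
    print(limit_disutility(-10.0, 0.5))
```

With a ratio of 0 you get the "exception for its own negative utility" case: only the original disutility counts. With a ratio of 1 or more the sum never settles, which is the runaway case.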
I mean, as a utility maximiser, that must be the reason you wanted to stop yourself from getting negative utility from things when those things would continue anyway; because you attach negative utility ... to attaching negative utility!
This is confusing me just writing it ... but I hope you see what I mean.
I mean, as a utility maximiser, that must be the reason you wanted to stop yourself from getting negative utility from things when those things would continue anyway; because you attach negative utility ... to attaching negative utility!
I think it might be useful here to draw on the distinction between trying to help and trying to obtain warm fuzzies. If something bad is happening and it's impossible for me to do anything about it, I'd rather not get anti-warm fuzzies on top of that.
There's a recent science fiction story that I can't recall the name of, in which the narrator is traveling somewhere via plane, and the security check includes a brain scan for deviance. The narrator is a pedophile. Everyone who sees the results of the scan is horrified--not that he's a pedophile, but that his particular brain abnormality is easily fixed, so that means he's chosen to remain a pedophile. He's closely monitored, so he'll never be able to act on those desires, but he keeps them anyway, because that's part of who he is.
What would you do in his place?
In the language of good old-fashioned AI, his pedophilia is a goal or a terminal value. "Fixing" him means changing or erasing that value. People here sometimes say that a rational agent should never change its terminal values. (If one goal is unobtainable, the agent will simply not pursue that goal.) Why, then, can we imagine the man being tempted to do so? Would it be a failure of rationality?
If the answer is that one terminal value can rationally set a goal to change another terminal value, then either