timtyler comments on AI indifference through utility manipulation - Less Wrong
Comments (53)
After backtracking - to try and understand what it is that you think we are talking about - I think I can see what is going on here.
When you wrote:
...you were using "utility" as an abbreviation for "utility function"!
That would result in a changing utility function, and - in that context - your comments make sense.
However, that represents a simple implementation mistake. You don't implement indifference by using a constantly-changing utility function. What changes - in order to make the utility of being switched off track the utility of being switched on - is just the utility associated with being switched off.
The utility function just has a component which says: "the expected utility of being stopped is the same as if not stopped". The utility function always says that - and doesn't change, regardless of sensory inputs or whether the stop button has been pressed.
What changes is the utility - not the utility function. That is what you wrote - but was apparently not what you meant - thus the confusion.
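To make the distinction concrete, here is a minimal Python sketch of a single, fixed utility function whose value for "stopped" tracks the expected utility of "not stopped". Everything in it is a hypothetical illustration rather than anything from the original post: the names `make_utility`, `expected_utility` and `base`, the two toy outcomes, and the representation of beliefs as a dict of outcome probabilities are all assumptions. It also deliberately ignores the subtlety about extra information raised in the reply.

```python
from typing import Callable, Dict

# Assumption: beliefs map each outcome to its probability if the agent is NOT stopped.
Beliefs = Dict[str, float]


def expected_utility(base: Callable[[str], float], beliefs: Beliefs) -> float:
    """Expected base utility over outcomes, given the current beliefs."""
    return sum(p * base(outcome) for outcome, p in beliefs.items())


def make_utility(base: Callable[[str], float]) -> Callable[[str, bool, Beliefs], float]:
    """Build one fixed utility function with an indifference component."""

    def utility(outcome: str, stopped: bool, beliefs: Beliefs) -> float:
        if stopped:
            # Being stopped is worth exactly what not being stopped is expected to be worth.
            return expected_utility(base, beliefs)
        return base(outcome)

    return utility


def base(outcome: str) -> float:
    """Toy base utility over two hypothetical outcomes."""
    return {"win": 10.0, "lose": 0.0}[outcome]


u = make_utility(base)  # defined once and never modified afterwards

# The beliefs change between the two calls; the function `u` does not.
stopped_early = u("lose", True, {"win": 0.5, "lose": 0.5})
stopped_late = u("lose", True, {"win": 0.9, "lose": 0.1})
```

In this sketch the number returned for the stopped case moves as the beliefs move, while `u` itself is the same object throughout: the utility changes, the utility function does not.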
Yes, I apologise for the confusion. But what I showed in my post was that implementing "the expected utility of being stopped is the same as if not stopped" has to be done in a cunning way (the whole thing about histories having the same stem) or else extra information will get rid of indifference.