
Comment author: Stuart_Armstrong | 02 March 2012 10:41:09AM | 0 points

> This strikes me as a plausible problem and a good solution. I reward you, as is traditional, with a nitpicky question.

Ah, but of course :-)

I like your k idea, but my more complicated setup is more robust in most situations where the AI is capable of modifying k (it fails only in situations that essentially amount to "I will reward you for modifying k").
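
For concreteness, here is a minimal sketch of the indifference construction over a binary event X; the toy world model and names such as `expected_u` and `indifferent_utility` are my own illustration, not the post's exact formalism:

```python
# A minimal sketch of utility indifference over a binary event X, assuming a
# toy world model given as (probability, x, utility) triples. The names and
# the world model are illustrative, not the post's formalism.

def expected_u(outcomes, x):
    """Expected utility of U conditional on the binary event X = x."""
    mass = sum(p for p, ev, _ in outcomes if ev == x)
    if mass == 0:
        return 0.0  # no probability mass on this branch; a convention only
    return sum(p * u for p, ev, u in outcomes if ev == x) / mass

def indifferent_utility(outcomes, x, u):
    """U'(w): add a compensating constant on the X = 1 branch so expected
    utility is equal across the two branches of X."""
    c = expected_u(outcomes, 0) - expected_u(outcomes, 1)
    return u + c if x == 1 else u

# With the correction applied, E[U' | X=0] == E[U' | X=1], so the agent has
# no instrumental reason to push X either way.
world = [(0.5, 0, 10.0), (0.25, 1, 2.0), (0.25, 1, 4.0)]
e0 = sum(p * indifferent_utility(world, ev, u) for p, ev, u in world if ev == 0) / 0.5
e1 = sum(p * indifferent_utility(world, ev, u) for p, ev, u in world if ev == 1) / 0.5
assert abs(e0 - e1) < 1e-9
```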

> Therefore there is a danger under self-improvement. Consider a seed AI with your indifferent utility function that believes with certainty that no iteration of itself can influence X, a binary quantum event. It has no reason to bother conserving its indifference concerning X, since it anticipates behaving identically if it simply sets U'(X=1) = 0. Since that is simpler than the normalized function, it adopts it. Then, several iterations down the line, it begins to suspect that, just maybe, it can influence quantum events, so it converts the universe into a quantum-event-influencing device.

But is this not a general objection to AI utility functions? If the AI has a false belief, it may store its utility function in a compressed form that later turns out not to be equivalent. It seems we would simply want the AI not to compress its utility function in ways that might prove detrimental.
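
As a toy illustration of that failure mode, here is a sketch in which "compression" under a false certainty drops the indifference correction and only later comes apart from the original; the setup and names are hypothetical, not from the thread:

```python
# Hedged sketch of the compression failure: an agent certain that it cannot
# influence X treats the correction term as dead weight and "compresses" U'
# down to plain U. The two functions are behaviorally equivalent only under
# that (false) belief.

CORRECTION = 7.0  # the indifference-preserving constant on the X = 1 branch

def u_prime(x, u):
    """The full indifferent utility function."""
    return u + CORRECTION if x == 1 else u

def compress(believes_x_influenceable):
    """Return the utility function a simplicity-favoring agent keeps."""
    if not believes_x_influenceable:
        # Certain that no action affects X, the agent expects identical
        # behavior from the simpler function, and so drops the correction.
        return lambda x, u: u
    return u_prime

compressed = compress(believes_x_influenceable=False)

# A later iteration discovers it *can* influence X; the compressed function
# now values forcing X = 1 differently from the original indifferent one.
print(u_prime(1, 10.0), compressed(1, 10.0))  # 17.0 vs 10.0 -- not equivalent
```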