Stuart_Armstrong comments on Trapping AIs via utility indifference - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (32)
Ah, but of course :-)
I like your k idea, but my more complicated setup is more robust to most situations where the AI is capable of modifying k (it fails in situations that are essentially "I will reward you for modifying k").
But is this not a general objection to AI utility functions? If it has a false belief, it can store its utility in a compressed form, that then turns out to be not equivalent. It seems we would simply want the AI not to compress its utility function in ways that might be detrimental.