Stuart_Armstrong comments on Proper value learning through indifference - Less Wrong
I will only allow v to change if that change triggers the "U adaptation" (the adding and subtracting of constants). You have to specify which processes count as U adaptations (e.g. certain types of conversations with certain people) and which don't.
Oh, I see. So the AI simply losing the memory that v was stored in and replacing it with random noise shouldn't count as something it will be indifferent about? How would you formalize this such that arbitrary changes to v don't trigger the indifference?
By specifying what counts as an allowed change in U, and making the agent into a U-maximiser. Then, just as standard maximisers defend their utilities, it should defend U (including the update, and only that update).
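The mechanism described above can be sketched in code. This is a minimal, hypothetical illustration (the class, the toy expectation, and all names are my own assumptions, not from the post): an allowed update to v triggers the "U adaptation" by adding a compensating constant so expected utility is unchanged, while any change to v that bypasses this channel would shift U and so be resisted by a U-maximiser.

```python
class IndifferentAgent:
    """Toy sketch of value learning through indifference (illustrative only)."""

    def __init__(self, v, expected_utility):
        self.v = v                    # current value function over outcomes
        self.constant = 0.0           # compensating constant from U adaptations
        self._expected = expected_utility  # E[v] under the agent's model

    def U(self, outcome):
        # The agent maximises U = v(outcome) + constant.
        return self.v(outcome) + self.constant

    def allowed_update(self, new_v):
        # An allowed change to v triggers U adaptation: add a constant so
        # expected utility is unchanged, making the agent indifferent to
        # this update, and only to updates routed through this channel.
        self.constant += self._expected(self.v) - self._expected(new_v)
        self.v = new_v

    # Note: overwriting self.v directly (e.g. memory corruption) bypasses
    # the adaptation, changes expected U, and so a U-maximiser defends
    # against it, just as a standard maximiser defends its utility.
```

For instance, with a toy expectation that evaluates v at a representative outcome, an `allowed_update` from `v(x) = x` to `v(x) = 2x` leaves the agent's expected U exactly where it was, whereas a raw overwrite of `v` would not.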