Stuart_Armstrong comments on Proper value learning through indifference - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (50)
By specifying what counts as an allowed change in U, and making the agent in to a U maximiser. Then, just as standard maximises defend their utilities, it should defend U(un clubbing the update, and only that update)