JoshuaZ comments on Closest stable alternative preferences - Less Wrong Discussion
Comments (1)
I know you aren't trying to list all the caveats, but I think there are other important ways this can go wrong. An agent may not be able to tell whether a self-modification will succeed, yet the modification may still have high expected utility even though it carries some risk of changing the agent's preferences.