Vaniver comments on Open Thread, Feb 8 - Feb 15, 2016 - Less Wrong
Check out the Cake or Death value loading problem, as Stuart Armstrong puts it.
There's a rough similarity to the 'resist blackmail' problem: in both cases, you need to be able to tell the difference between someone delivering bad news and someone doing bad things. If the AI is mistaken about what is right, we want to be able to correct it without being interpreted as villains out to destroy potential utility.
(Also, "correctable" is not really a low-level distinction in reality: because time passes, nothing is truly correctable.)