Vaniver comments on Open Thread, Feb 8 - Feb 15, 2016 - Less Wrong

4 Post author: Elo 08 February 2016 04:47AM




Comment author: Vaniver 09 February 2016 12:40:15PM 1 point

However, wouldn't it research further and correct itself (and before that, have care to not do something un-correctable)?

Check out the "Cake or Death" framing of the value loading problem, as Stuart Armstrong puts it.

There's a rough similarity to the 'resist blackmail' problem: the AI needs to be able to tell the difference between someone delivering bad news and someone doing bad things. If the AI is mistaken about what is right, we want to be able to correct it without being treated as villains out to destroy potential utility.

(Also, "correctable" is not really a low-level separation in reality, since the passage of time means nothing is truly correctable.)