jimrandomh comments on Introducing Corrigibility (an FAI research subfield) - LessWrong
Yes. It's not a sure-fire safeguard and it doesn't work against all UFAIs, but corrigibility, if done correctly, can be thought of as granting a saving throw. Note, though, that while this paper is a huge step forward, "how to do corrigibility correctly" is far from a solved problem.
(Corrigibility was a topic at the second MIRIx Boston workshop, and we have results that build on this paper which we are working on writing up.)