I can't claim to know any more than the links just before section IV here: https://slatestarcodex.com/2020/01/30/book-review-human-compatible/. CIRL is viewed as promising, or at least as part of the solution. One problem arises if the program erroneously believes it already knows the humans' preferences, or if it anticipates that it can learn the humans' preferences and then produce a better action than the humans would take on their own. Since "accept a shutdown command" is a last-resort option, it ideally wouldn't depend on the program not believing something erroneous. Yudkowsky raised the second objection here: https://arbital.com/p/updated_deference/. There's a discussion of that and other responses here: https://mailchi.mp/59ddebcb3b9a/an-69-stuart-russells-new-book-on-why-we-need-to-replace-the-standard-model-of-ai. I don't know how the CIRL researchers respond to these challenges.
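To make the first problem concrete, here's a toy sketch in the spirit of the off-switch game: an agent chooses between acting immediately and deferring to a human who will shut it off whenever the true utility of the action is negative. All the numbers and the small deference cost are hypothetical illustrations, not anything from the CIRL papers; the point is only that deference has option value exactly when the agent is uncertain about the human's preferences, so an agent that (wrongly) thinks it already knows them stops seeing any reason to defer.

```python
import random

def expected_value(samples):
    return sum(samples) / len(samples)

def best_choice(belief_samples, defer_cost=0.05):
    """Pick between acting now and deferring to the human.

    belief_samples: samples of the action's utility U drawn from the
    agent's belief about the human's preferences.
    If the agent defers, the human blocks the action whenever U < 0,
    so the deferring agent expects E[max(U, 0)], minus a small
    (hypothetical) cost of bothering the human.
    """
    act = expected_value(belief_samples)                         # E[U]
    defer = expected_value([max(u, 0) for u in belief_samples]) - defer_cost
    return "defer" if defer > act else "act"

random.seed(0)

# Uncertain agent: a wide belief over U means the human's veto
# filters out the bad outcomes, so deferring is strictly better.
uncertain_belief = [random.gauss(0.1, 1.0) for _ in range(10_000)]
print(best_choice(uncertain_belief))  # defer

# Overconfident agent: (wrongly) certain that U = 0.1 > 0, so the
# veto can never help and deferring is pure cost -- it just acts.
certain_belief = [0.1] * 10_000
print(best_choice(certain_belief))    # act
```

This is the worry in miniature: the safety of "accept a shutdown command" rests on the agent's belief state being honestly uncertain, which is an uncomfortable foundation for a last-resort mechanism.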