So8res comments on Introducing Corrigibility (an FAI research subfield) - LessWrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (28)
Yep! In fact, this is exactly the problem discussed in section 4.1 and described in Theorem 6, is it not?
Section 4.1 frames the problem in terms of the agent creating a sub-agent or successor. My point is that the issue is more general, as there are manipulative actions that don't involve creating other agents.
Theorem 6 seems to address the general case, although I would remark that even if epsilon == 0 (that is, even UN is indifferent to manipulation) you aren't safe.