V_V comments on Introducing Corrigibility (an FAI research subfield) - LessWrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (28)
Section 4.1 frames the problem in terms of the agent creating a sub-agent or successor. My point is that the issue is more general, as there are manipulative actions that don't involve creating other agents.
Theorem 6 seems to address the general case, although I would remark that even if epsilon == 0 (that is, even UN is indifferent to manipulation) you aren't safe.