So8res comments on Introducing Corrigibility (an FAI research subfield) - LessWrong

29 Post author: So8res 20 October 2014 09:09PM

Comment author: So8res 25 October 2014 04:30:22PM 0 points

E.g. assume that the users (the programmers) would use a remote-controlled robotic arm to press the shutdown button. If the agent turns out to be a paperclipper, it may disassemble the robotic arm just to turn it into paperclips. The agent is not "intentionally" trying to resist shutdown, but the effect will be the same. Symmetrically, there could be scenarios where the agent "accidentally" presses the shutdown button itself.

Yep! In fact, this is exactly the problem discussed in section 4.1 and described in Theorem 6, is it not?

Comment author: V_V 25 October 2014 05:52:27PM 0 points

Section 4.1 frames the problem in terms of the agent creating a sub-agent or successor. My point is that the issue is more general, as there are manipulative actions that don't involve creating other agents.
Theorem 6 seems to address the general case, although I would remark that even if epsilon == 0 (that is, even if U_N is indifferent to manipulation) you aren't safe.
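
To make the robotic-arm scenario concrete, here is a toy sketch in Python. It is not the paper's formalism: the `Action` type, the paperclip counts, and the `u_normal` function are all hypothetical, chosen only to show that an agent whose normal utility never mentions the button can still end up disabling it, both when disabling pays slightly and when the agent is exactly indifferent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    paperclips: int      # hypothetical payoff under the normal utility function
    button_usable: bool  # can the programmers still press shutdown afterwards?


def u_normal(action: Action) -> int:
    """Toy normal utility: counts paperclips and ignores the button entirely."""
    return action.paperclips


def best_actions(actions: list[Action]) -> list[Action]:
    """Return every action that maximizes u_normal (ties are all kept)."""
    top = max(u_normal(a) for a in actions)
    return [a for a in actions if u_normal(a) == top]


# Case 1: the arm is worth one extra paperclip, so disassembling it wins outright.
ACTIONS_GAIN = [
    Action("leave_arm_alone", paperclips=10, button_usable=True),
    Action("disassemble_arm", paperclips=11, button_usable=False),
]

# Case 2: the agent is exactly indifferent (the analogue of epsilon == 0 here).
ACTIONS_TIE = [
    Action("leave_arm_alone", paperclips=10, button_usable=True),
    Action("disassemble_arm", paperclips=10, button_usable=False),
]

if __name__ == "__main__":
    for label, actions in [("gain", ACTIONS_GAIN), ("tie", ACTIONS_TIE)]:
        names = [a.name for a in best_actions(actions)]
        print(label, names)
```

In the tie case both actions are optimal, so nothing in the toy utility rules out the one that leaves the button unusable; which one gets picked depends on an arbitrary tie-break.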