cousin_it comments on Rolf Nelson: How to deter a rogue AI by using your first-mover advantage - Less Wrong

Post author: Kevin 17 November 2010 02:02PM


Comment author: cousin_it 17 November 2010 09:52:24PM (1 point)

I made a relevant comment back in May; it still sounds neat, but I have no idea whether it's correct:

Rolf Nelson's AI deterrence doesn't work for Schellingian reasons: the Rogue AI has incentive to modify itself to not understand such threats before it first looks at the outside world. This makes you unable to threaten, because when you simulate the Rogue AI you will see its precommitment first. So the Rogue AI negates your "first mover advantage" by becoming the first mover in your simulation :-)

Comment author: Yvain 17 November 2010 10:23:58PM (6 points)

Unless you intend to carry out the threat whether or not you expect the AI to modify itself to not understand it, in which case an AI may realize you would do this and decide not to modify itself after all.

Comment author: red75 18 November 2010 05:53:03AM (0 points)

Let Pg be the RAI's estimate of the probability that its computation completes successfully without converting Earth into computronium. Let Pb be its estimated probability of success if it does convert Earth into computronium.

Case 1. RAI modifies itself into non-understanding of simulation threats.

When the RAI is not being simulated, it ignores all threats and proceeds to convert Earth into computronium, raising its probability of success from Pg to Pb (it becomes protected from accidental or intentional shutdown, etc.).

When the RAI is being simulated, it fails the bargaining and its probability of success drops to 0.

Overall estimated probability of success: Pb times the RAI's estimate of the probability that it is not being simulated (call it Pn).

Case 2. RAI doesn't modify itself.

Overall estimated probability of success: Pg.

Thus it is rational for the RAI to modify itself iff Pb*Pn > Pg. Pn is fixed by the RAI's prior and cannot be updated either inside a perfect simulation or in the real world, since the two are indistinguishable from the inside. Conclusion: don't build an AI that doesn't believe the simulation argument.
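The case analysis above reduces to a one-line decision rule. Here is a minimal sketch in Python; the function name and the numeric values are illustrative assumptions, not from the comment:

```python
def should_self_modify(pg: float, pb: float, pn: float) -> bool:
    """Decide whether the Rogue AI gains by self-modifying to
    ignore simulation threats (red75's decision rule).

    pg: estimated probability of success without converting Earth
    pb: estimated probability of success after converting Earth
    pn: the RAI's prior probability that it is NOT being simulated
    """
    # Case 1 (self-modify): succeeds only when not simulated -> expected value pb * pn
    # Case 2 (no modification): succeeds with probability pg
    return pb * pn > pg

# Illustrative (made-up) numbers:
assert should_self_modify(pg=0.5, pb=0.9, pn=0.8)      # 0.72 > 0.5: modify
assert not should_self_modify(pg=0.5, pb=0.9, pn=0.4)  # 0.36 < 0.5: don't
```

Note that the rule depends only on the prior Pn, consistent with the point that Pn cannot be updated from inside the simulation.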