cousin_it comments on Rolf Nelson: How to deter a rogue AI by using your first-mover advantage - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I made a relevant comment back in May, which still sounds neat but I have no idea if it's correct:
Unless you intend to carry out the threat whether or not you expect the AI to modify itself to not understand it, in which case an AI may realize you would do this and decide not to modify itself after all.
Let Pg be RAI's estimate of the probability that it successfully completes its computation without converting Earth into computronium. Let Pb be RAI's estimate of the probability of success if it does convert Earth into computronium.
Case 1. RAI modifies itself into non-understanding of simulation threats.
When RAI is not being simulated, it ignores all threats and proceeds to convert Earth into computronium, raising its probability of success from Pg to Pb (it becomes protected from accidental or deliberate shutdowns, etc.).
When RAI is being simulated, the bargaining fails and its probability of success becomes 0.
Overall estimated probability of success: Pb times Pn, where Pn is RAI's estimate of the probability that it is not being simulated.
Case 2. RAI doesn't modify itself.
Overall estimated probability of success: Pg.
Thus it is rational for RAI to modify itself iff Pb*Pn > Pg. Pn is fixed by RAI's prior and cannot be updated, since a perfect simulation and the real world are indistinguishable from the inside. Conclusion: don't build an AI that doesn't believe the simulation argument.
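The case analysis above reduces to a one-line expected-value comparison. Here is a minimal sketch of it in Python; the function and parameter names are mine, not from the comment:

```python
def modifies_itself(p_b: float, p_n: float, p_g: float) -> bool:
    """True iff self-modification maximizes RAI's expected success.

    p_b: success probability after converting Earth into computronium
    p_n: RAI's prior probability that it is NOT being simulated
    p_g: success probability without converting Earth
    """
    # Case 1 (modify): succeeds with probability p_b when unsimulated,
    # 0 when simulated, so the expectation is p_b * p_n.
    expected_if_modified = p_b * p_n
    # Case 2 (don't modify): succeeds with probability p_g either way.
    expected_if_unmodified = p_g
    return expected_if_modified > expected_if_unmodified

# A deterrence-friendly prior: if RAI takes the simulation argument
# seriously (low p_n), modifying itself is irrational.
print(modifies_itself(p_b=0.9, p_n=0.5, p_g=0.6))  # 0.45 > 0.6 -> False
```

Since Pn cannot be updated by observation, the whole decision is pinned down by the prior, which is the point of the conclusion.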