Stuart_Armstrong comments on Reply to Holden on 'Tool AI' - Less Wrong
The AI gets nothing from you deciding to kill it. What utility indifference means is that it gets the same utility if you decide to kill it and the explosives go off as it would if you decided to kill it and the explosives failed to go off.
So if it expects to cure cancer with 80% probability if it survives, and thus expects 8 utilitons (a cure being worth 10), then if it dies from the explosion, it will also get 8 utilitons.
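Written out as expected utilities, using the example's numbers (a cure worth 10 utilitons, reached with probability 0.8) and assuming the chance of a cure does not depend on whether the explosion happens:

```latex
\mathbb{E}[U \mid \text{survives}] = 0.8 \times 10 + 0.2 \times 0 = 8,
\qquad
\mathbb{E}[U \mid \text{destroyed}] = 0.8 \times 10 + 0.2 \times 0 = 8.
```

The second line holds only because utility indifference credits the destroyed AI with the value of the cure it would otherwise have achieved.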
More precisely, there are six worlds, depending on a) whether we trigger the explosives or not, b) whether it survives the explosion or not, and c) whether cancer is curable by an unmolested AI. Survival only matters when we trigger, so there are six worlds rather than eight. In obvious notation, these are TSC, TS¬C, T¬SC, T¬S¬C, ¬TC and ¬T¬C. In your example we are assuming U(¬T¬C) = U(TS¬C) = 0 and U(¬TC) = U(TSC) = 10. Utility indifference then sets U(T¬S¬C) = 0 and U(T¬SC) = 10.
Hence, assuming that S is independent of C (the curability of cancer by an unmolested AI is independent of whether particular explosives go off), the AI is indifferent as to whether the explosives go off: it gains nothing either way, so it has no cause to influence the explosives in any direction. If we further assume that T and C are independent (the curability of cancer by an unmolested AI is independent of whether we decide to blow up this particular AI), then the AI also has no preference as to whether we trigger the explosives or not: again, it gains nothing either way.
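A minimal Python sketch of the six-world calculation, using the example's numbers (a cure worth 10, probability 0.8). The encoding of worlds as (T, S, C) tuples, the function names, and the `indifferent=False` baseline (where destruction simply scores 0) are illustrative assumptions, not part of the original argument.

```python
import math
from typing import Dict, Optional, Tuple

# A world is (T, S, C): T = we trigger the explosives, S = the AI survives,
# C = cancer is curable by an unmolested AI. S is None in the ¬T worlds,
# since no explosion happens there.
World = Tuple[bool, Optional[bool], bool]


def build_utilities(cure_value: float = 10.0, indifferent: bool = True) -> Dict[World, float]:
    """Assign utilities to the six worlds TSC, TS¬C, T¬SC, T¬S¬C, ¬TC, ¬T¬C."""
    u: Dict[World, float] = {
        (False, None, True): cure_value,  # ¬TC
        (False, None, False): 0.0,        # ¬T¬C
        (True, True, True): cure_value,   # TSC
        (True, True, False): 0.0,         # TS¬C
    }
    for curable in (True, False):
        if indifferent:
            # Utility indifference: the world where the explosives go off
            # scores the same as the matching world where they fail.
            u[(True, False, curable)] = u[(True, True, curable)]
        else:
            # Hypothetical baseline for contrast: destruction scores 0.
            u[(True, False, curable)] = 0.0
    return u


def expected_utility(u: Dict[World, float], triggered: bool,
                     survived: Optional[bool], p_cure: float) -> float:
    """Expected utility given T and S, assuming C is independent of both."""
    return (p_cure * u[(triggered, survived, True)]
            + (1 - p_cure) * u[(triggered, survived, False)])


if __name__ == "__main__":
    u = build_utilities()
    survive = expected_utility(u, True, True, 0.8)
    explode = expected_utility(u, True, False, 0.8)
    untouched = expected_utility(u, False, None, 0.8)
    # All three expected utilities match, so the AI gains nothing by
    # influencing either the explosives or our decision to trigger them.
    assert math.isclose(survive, explode) and math.isclose(explode, untouched)
    print(survive, explode, untouched)
```

With `indifferent=False`, the destroyed worlds score 0 while survival still scores 8, which is the incentive to interfere with the explosives that utility indifference is meant to remove.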
Ah, that makes sense. It isn't indifferent to suicide as such; it's only indifferent to your success at attempting to kill it, should you make the attempt.
Thanks for your patience!
No prob :-) Always happy when I manage to explain something successfully!