The AI will never have any active desire to push the detonator (barring other reasons like someone saying "push the detonator, and I will donate $1 to the AI).
And this will not stop the AI from defecting, not at all. It will, however, ensure that while defecting, the detonator will not be a priority - it's effectively just an inert lump from the AI's persepective. So the AI will try and grab the nuclear missiles, or hack the president, or whatever, but the guy in the shed by the explosives is low down on the list. Maybe low down enough that they'll be able to react on time.
The AI will never have any active desire to push the detonator (barring other reasons like someone saying "push the detonator, and I will donate $1 to the AI).
To reiterate what I said, if defection is of positive value to the AI and pushing the detonator is == defection as it seems to be, then pushing the detonator is of equal positive value.
I just noticed that LessWrong has not yet linked to FHI researcher Stuart Amstrong's brief technical report, Utility Indifference (2010). It opens: