It's clear that in such a system, the 'weak point' would be the people in control of the private key.
If the AI is out of the box, I don't think humans are the weak point.

Humans physically do something when they reward the AI. To get a reward, the AI has only to figure out what the humans would physically do and mimic that itself. If the human rewards the AI by pressing a big red button, then the AI can just kill the human and press the big red button itself. It wouldn't matter if the big red button uses 512-bit elliptic curve cryptography -- the AI just has to find a paperweight and put it on the button.
If humans can perform RSA encryption silently in their heads, then you might be on to something. A human could memorize a private key and produce a cryptographically signed reward for the AI when the human deemed the AI worthy. The AI would not know the private key, would not be able to produce signed rewards, and would not be able to mimic humans. This setup works because it is cryptographically difficult to mimic a human doing RSA. But if the human did not perform the cryptography silently in their head, it would not be cryptographically difficult to mimic their rewarding behavior.
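To make the signed-reward idea concrete, here is a rough sketch in Python of what I have in mind. Everything in it is my own illustration, not a real design: the tiny primes, the names `sign_reward` and `verify_reward`, and the `"reward:1"` message format are all hypothetical, and this is unpadded textbook RSA that would be trivially breakable at this key size. The point is only that producing a valid reward requires the private exponent, while checking one requires only public values.

```python
import hashlib

# Toy textbook-RSA parameters (hypothetical, far too small to be secure).
P, Q = 61, 53
N = P * Q                            # public modulus, known to the AI
E = 17                               # public exponent, known to the AI
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, held only in the human's head


def digest(message: str) -> int:
    """Hash the message and reduce it into the range [0, N)."""
    raw = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(raw, "big") % N


def sign_reward(message: str, private_exponent: int) -> int:
    """What the human computes silently: digest^d mod N."""
    return pow(digest(message), private_exponent, N)


def verify_reward(message: str, signature: int) -> bool:
    """What the AI's reward channel checks, using only public values."""
    return pow(signature, E, N) == digest(message)


# The human signs a reward; the message carries a counter so that each
# reward is distinct (an assumption of this sketch).
reward = "reward:1"
signature = sign_reward(reward, D)
accepted = verify_reward(reward, signature)
```

If the signing step happens on a machine instead of in a head, the AI does not need `D` at all: it only needs to operate the machine, which is the paperweight-on-the-button problem again.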
But I doubt that humans could perform RSA or elliptic curve cryptography mentally. Unless humans can compute trapdoor functions silently in their heads, I don't see how public key cryptography could buy you anything over a reward button.
For the same reason that Linus Torvalds is not able to hack every single Linux system in existence, it is reasonable to assume that the probability of 'backdoor'-type attacks could be reduced or eliminated.
If you're not talking about a back door, then I'm not sure what you're trying to say here. Are you implying that the AI will find the 'reward producing machine' and somehow use it to produce rewards? It doesn't work that way because the machine would still need the key to produce a cryptographically-valid reward, and the key would not be stored on the ma...