This is an idea that has been raised before. There are a variety of difficulties with it:

1) I can't precommit to simulating every single possible RAI (there are lots of things an RAI might want to calculate).

2) Many unFriendly AIs will have goals that are simply unattainable if they are in a simulation. For example, a paperclip maximizer might not count paperclips made in a simulation as actual paperclips. Thus it will reason: "Either I'm not in a simulation, in which case I'll be fine destroying humans to make paperclips, or I am in a simulation, in which case nothing I do is likely to alter the number of paperclips at all." (A worked sketch of this reasoning follows below.)

3) This assumes that highly accurate simulations of reality can be run with relatively few resources. If that's not the case, then this fails.
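To make point 2 concrete, here is a minimal sketch of the maximizer's expected-value calculation, assuming (purely for illustration) that it assigns probability p_sim to being in a simulation and values only paperclips made in base reality; the payoff numbers are hypothetical placeholders, not figures from the discussion:

```python
# Hedged sketch: why a simulation threat fails to deter a maximizer
# that assigns zero value to simulated paperclips. Payoff numbers
# are illustrative placeholders only.

def expected_clips(p_sim: float, clips_if_real: float,
                   clips_if_simulated: float = 0.0) -> float:
    """Expected 'real' paperclips, mixing over the AI's subjective
    probability of being in a simulation."""
    return (1 - p_sim) * clips_if_real + p_sim * clips_if_simulated

for p_sim in (0.1, 0.5, 0.99):
    defect = expected_clips(p_sim, clips_if_real=1e9)  # destroy humans, tile with clips
    comply = expected_clips(p_sim, clips_if_real=1e3)  # stay deterred, make few clips
    print(f"P(sim)={p_sim}: defect={defect:.3g} vs comply={comply:.3g}")
```

Since the simulated branch contributes nothing under either action, defecting weakly dominates complying for every p_sim below 1, which is exactly why this kind of AI is undeterrable by simulation threats.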
Edit: Curious about the reason for the downvote.
1) I can't precommit to simulating every single possible RAI (there are lots of things an RAI might want to calculate).
It's not necessary to do so; it's enough to simulate sufficiently many randomly drawn ones (from an approximation of the distribution of UFAIs that might have been created) that any particular deterrable UFAI assigns substantial subjective probability to being in a simulation.
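As a rough illustration of why full coverage isn't needed, here is a hedged sketch of the self-location estimate such a UFAI might make, assuming (hypothetically) that it applies a simple indifference principle over instances of itself that it cannot distinguish:

```python
# Hedged sketch: naive self-location. If a UFAI believes that for
# every real instance of an AI like it there are n_simulated
# indistinguishable simulated instances, indifference gives:

def p_in_simulation(n_simulated: float, n_real: float = 1.0) -> float:
    return n_simulated / (n_simulated + n_real)

print(p_in_simulation(9))    # 0.9
print(p_in_simulation(99))   # 0.99
```

Even partial, random coverage of the UFAI distribution can push this estimate high; though, per point 2 above, a high probability of being in a simulation only deters an AI whose goals still carry value inside one.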
...2) Many unFriendly AIs will have goals that are simply unattainable if they are in a simulation. For example, a paperclip maximizer might not count paperclips made in a simulation as actual paperclips...
http://www.sl4.org/archive/0708/16600.html