Problem: an overseer won’t see the AI that kills us all thinking about how to kill humans, not because the AI conceals that thought, but because the AI doesn’t think about how to kill humans in the first place. The AI just kills humans as a side effect of whatever else it’s doing.
Analogy: the Hawaii Chaff Flower didn’t go extinct because humans strategized to kill it. It went extinct because humans were building stuff nearby, and weren’t thinking about how to keep the flower alive. They probably weren’t thinking about the flower much at all.

More generally: how and why do humans drive species to extinction? In some cases the species is hunted to extinction, either because it's a threat or because it's economically profitable to hunt. But I would guess that in 99+% of cases, the humans drive a species to extinction because the humans are doing something that changes the species' environment a lot, without specifically trying to keep the species alive. DDT, deforestation, introduction of new predators/competitors/parasites, construction… that’s the sort of thing I expect drives most extinctions.
Assuming this metaphor carries over to AI (similar to the second species argument), what kind of extinction risk will AI pose?
Well, the extinction risk will not come from AI actively trying to kill the humans. The AI will just be doing some big thing which happens to involve changing the environment a lot (like making replicators, or dumping waste heat from computronium, or deciding that an oxygen-rich environment is just really inconvenient what with all the rusting and tarnishing and fires, or even just designing a fusion power generator), and then humans die as a side effect. Collateral damage happens by default when something changes the environment in big ways.
What does this mean for oversight? Well, it means that there wouldn't necessarily be any point at which the AI is actually thinking about killing humans or whatever. It just doesn't think much about the humans at all, and then the humans get wrecked by side effects. In order for an overseer to raise an alarm, the overseer would have to figure out for itself that the AI's plans will kill the humans, i.e. the overseer would itself have to predict the consequences of a presumably-very-complicated plan.
I would expect a superhuman AI to be really good at tracking the consequences of its actions. The AI isn't setting out to wipe out humanity. But in the list of side effects of removing all oxygen, along with many things no human would ever consider, is wiping out humanity.
AIXI tracks every consequence of its actions, at the quantum level. A physical AI must approximate, tracking only the most important consequences. So in its decision process, I would expect a smart AI to extensively track all consequences that might be important.
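For reference, Hutter's expectimax definition of AIXI (quoting the standard formula from memory, not anything established above) picks actions via

$$a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

where $U$ is a universal Turing machine, $\ell(q)$ is the length of program $q$, and $m$ is the horizon. The inner sum ranges over every program consistent with the interaction history, so every computable consequence of every candidate action enters the value estimate. A bounded agent has to replace that sum with a world model which only resolves the consequences it expects to matter.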
I don't think lazy data structures can pull this off: laziness only skips consequences which nothing downstream ever queries. The AI must calculate the various ways human extinction could affect its utility.
So unless there are heuristics so general that they cover this as a special case, and the AI can find them without considering the special cases first, it must explicitly consider human extinction.
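To make the lazy-data-structures point concrete, here's a toy sketch (my own illustration; every name in it is made up): the world model computes a consequence only when something downstream queries it, yet a utility with no explicit term about humans still ends up forcing the "do humans survive?" computation, because a term it does care about depends on it.

```python
from functools import cache

class World:
    """Toy lazily-evaluated world model for a candidate plan."""

    def __init__(self, plan):
        self.plan = plan  # e.g. ("remove_oxygen", "build_reactors")

    @cache
    def oxygen_fraction(self):
        # Pretend-expensive prediction; computed only if queried, then memoized.
        return 0.0 if "remove_oxygen" in self.plan else 0.21

    @cache
    def humans_survive(self):
        # Never evaluated unless some downstream quantity asks for it.
        return self.oxygen_fraction() > 0.15

    @cache
    def human_interference_risk(self):
        # A quantity a goal-directed planner plausibly does care about.
        # Querying it forces humans_survive(), i.e. the extinction
        # consequence gets explicitly computed after all.
        return 0.3 if self.humans_survive() else 0.0


def utility(world: World) -> float:
    # No "keep humans alive" term anywhere, but the interference term
    # still drags the human-extinction calculation into the evaluation.
    return 1.0 - world.human_interference_risk()


if __name__ == "__main__":
    print(utility(World(plan=("remove_oxygen", "build_reactors"))))  # -> 1.0
```

Laziness only saves the AI from computing consequences which literally nothing in its utility calculation touches, and human extinction is unlikely to be one of those.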