Problem: an overseer won't see the AI that kills us all thinking about how to kill humans. Not because the AI conceals that thought, but because the AI never thinks about how to kill humans in the first place. The AI just kills humans as a side effect of whatever else it's doing.
Analogy: the Hawaii Chaff Flower didn’t go extinct because humans strategized to kill it. It went extinct because humans were building stuff nearby, and weren’t thinking about how to keep the flower alive. They probably weren’t thinking about the flower much at all.

More generally: how and why do humans drive species to extinction? In some cases the species is hunted to extinction, either because it's a threat or because it's economically profitable to hunt. But I would guess that in 99+% of cases, the humans drive a species to extinction because the humans are doing something that changes the species' environment a lot, without specifically trying to keep the species alive. DDT, deforestation, introduction of new predators/competitors/parasites, construction… that’s the sort of thing which I expect drives most extinction.
Assuming this metaphor carries over to AI (similar to the second species argument), what kind of extinction risk will AI pose?
Well, the extinction risk will not come from AI actively trying to kill the humans. The AI will just be doing some big thing which happens to involve changing the environment a lot (like making replicators, or dumping waste heat from computronium, or deciding that an oxygen-rich environment is just really inconvenient what with all the rusting and tarnishing and fires, or even just designing a fusion power generator), and then humans die as a side-effect. Collateral damage happens by default when something changes the environment in big ways.
What does this mean for oversight? Well, it means that there wouldn't necessarily be any point at which the AI is actually thinking about killing humans or whatever. It just doesn't think much about the humans at all, and then the humans get wrecked by side effects. In order for an overseer to raise an alarm, the overseer would have to figure out for itself that the AI's plans will kill the humans, i.e. the overseer would itself have to predict the consequences of a presumably-very-complicated plan.
I would presume that the AI would know that humans are likely to try to resist a takeover attempt, and to have various safeguards against it. It might be smart enough to overcome any human response, but that seems to only work if it actually puts that intelligence to work by thinking about what (if anything) it needs to do to counteract the human response.
More generally, humans are such a major influence on the world, as well as a source of potential resources, that it would seem really odd for any superintelligence to naturally arrive at a world-takeover plan without at any point considering how the plan will affect humanity and whether that suggests any changes to it.
That assumes humans are, in fact, likely to meaningfully resist a takeover attempt. My guess is that humans are not likely to meaningfully resist a takeover attempt, and the AI will (implicitly) know that.
I mean, if the AI tries to change who's at the top of society's status hierarchy (e.g. the President), then sure, the humans will freak out. But what does an AI care about the status hierarchy? It's not like being at...