Problem: an overseer won’t catch the AI which kills us all in the act of thinking about how to kill humans. That's not because the AI conceals the thought, but because the AI never thinks about how to kill humans in the first place. The AI just kills humans as a side effect of whatever else it’s doing.
Analogy: the Hawaii Chaff Flower didn’t go extinct because humans strategized to kill it. It went extinct because humans were building stuff nearby, and weren’t thinking about how to keep the flower alive. They probably weren’t thinking about the flower much at all.

More generally: how and why do humans drive species to extinction? In some cases the species is hunted to extinction, either because it's a threat or because it's economically profitable to hunt. But I would guess that in 99+% of cases, the humans drive a species to extinction because the humans are doing something that changes the species' environment a lot, without specifically trying to keep the species alive. DDT, deforestation, introduction of new predators/competitors/parasites, construction… that’s the sort of thing which I expect drives most extinction.
Assuming this metaphor carries over to AI (similar to the second species argument), what kind of extinction risk will AI pose?
Well, the extinction risk will not come from AI actively trying to kill the humans. The AI will just be doing some big thing which happens to involve changing the environment a lot (like making replicators, or dumping waste heat from computronium, or deciding that an oxygen-rich environment is just really inconvenient what with all the rusting and tarnishing and fires, or even just designing a fusion power generator), and then humans die as a side-effect. Collateral damage happens by default when something changes the environment in big ways.
What does this mean for oversight? Well, it means that there wouldn't necessarily be any point at which the AI is actually thinking about killing humans or whatever. It just doesn't think much about the humans at all, and then the humans get wrecked by side effects. In order for an overseer to raise an alarm, the overseer would have to figure out itself that the AI's plans will kill the humans, i.e. the overseer would have to itself predict the consequences of a presumably-very-complicated plan.
That's definitely my crux, for purposes of this argument. I think AGI will just be that much more powerful than humans. And I think the bar isn't even very high.
I think my intuition here mostly comes from pointing my inner sim at differences within the current human distribution. For instance, if I think about myself in a political policy conflict with a few dozen IQ-85-ish humans... I imagine the IQ-85-ish humans maybe manage to organize a small protest if they're unusually competent, but most of the time they just hold one or two meetings and then fail to actually do anything at all. Whereas my first move would be to go talk to someone in whatever bureaucratic position is most relevant about how they operate day-to-day, read up on the relevant laws and organizational structures, identify the one or two people who I actually need to convince, and then meet with them. Even if the IQ-85 group manages their best-case outcome (i.e. organizing a small protest), I probably just completely ignore them because the one or two bureaucrats I actually need to convince are also not paying any attention to their small protest (which probably isn't even in a place where the actually-relevant bureaucrats would see it, because the IQ-85-ish humans have no idea who the relevant bureaucrats are).
And those IQ-85-ish humans do seem like a pretty good analogy for humanity right now with respect to AGI. Most of the time the humans just fail to do anything effective at all about the AGI; the AGI has little reason to pay attention to them.