"Computers don't have any sense of aesthetics or patterns that are standard the way people learn how to play chess. They play what they think is the objectively best move in any position, even if it looks absurd, and they can play any move no matter how ugly it is." - Murray Campbell, on Deep Blue
Vinge's principle states: "We usually think we can't predict exactly what a smarter-than-us agent will do, because if we could predict that, we would be that smart ourselves."
A popular interpretation is that AGI would invent and use new technology, such as nanorobotics, to defeat us (this is the example Yudkowsky usually gives).
However, this doesn't seem to jibe with what happens in other domains where AI becomes superhuman. Usually what the AI does is understandable to humans. It just looks, well, dumb.
For example, in chess, computers used roughly the same piece evaluation that humans discovered in the 18th century, didn't discover any new openings, and generally played moves that looked ugly to humans. But they won anyway.
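For concreteness, that centuries-old piece evaluation is just a material count with the traditional point values. Here is a minimal sketch in Python (a toy illustration with simplified FEN handling, not any actual engine's code):

```python
# Minimal sketch of classical material evaluation. The point values
# below are the traditional human ones, centuries old.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}  # king excluded

def material_eval(placement: str) -> int:
    """Score a position from White's perspective, in pawns.

    `placement` is the piece-placement field of a FEN string, e.g. the
    starting position "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR".
    Uppercase letters are White, lowercase are Black; digits and
    slashes just describe the board layout and carry no material.
    """
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is not None:
            score += value if ch.isupper() else -value
    return score

# The starting position is materially balanced:
assert material_eval("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR") == 0
```

Real engines layer deep search and positional heuristics on top of this, but the material core is the same scale humans have used for centuries.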
If something like nanorobotics could let you take over the world, you'd expect some human group to already be trying to build it for that purpose, because the plan is legible enough for humans to think of it. In practice, any plan that (for example) relies on DNA synthesis as a stepping stone quickly runs into regulatory problems.
Instead, I imagine that the AGI's plan will elicit similar reactions as the following:
You get a phone call advertising free mayonnaise! You just need to follow a couple of simple steps. The next day, you're confused, in some sort of mayonnaise cult, and breaking into a military armory in Mexico.
Is this plan something that humans couldn't try? No, it seems pretty straightforward to attempt. So why haven't we tried it? Because it seems, and likely is, dumb. Why mayonnaise? Why a phone call? Why Mexico?
But if AGI is similar to other superhuman AI, this is the type of thing we should expect to see: a strategy that looks dumb but works. We have no way to predict which dumb strategy will be used, but given the large number of strategies that look dumb to humans, the AGI's strategy is likely to be one of them. And it has enough Yomi to predict which one will succeed.
As I think more about this, the LLM as a collaborator might by itself have a major impact. Just off the top of my head, a kind of Rube Goldberg attack might be <redacted for info hazard>. Thinking about it alone in one's own mind, someone might never consider carrying something like that out. Again, I am trying to model the type of person who carries out a real attack, and I don't estimate that person as having above-average self-confidence. I suspect the default is to doubt themselves enough not to act, the same way most people do with their entrepreneurial ideas.
However, if they presented it to an LLM for refinement, or if the LLM suggested it, that might provide just enough of a psychological boost of validation to push them over the edge into trying it. And after the news reports a few "dumb," "bizarre," or "innovative" attacks succeeding because "AI told these people how to do it," the effect might get even stronger.
To my knowledge, one has been able to buy an AR-15 since the mid-to-late 1970s. My cousin has a Colt from 1981 that he bought when he was 19. Yet people weren't mass shooting each other, even during periods when the overall crime and murder rates were higher than they are now. Some confluence of factors has driven the surge, one of them probably being a strong meme: "Oh, this actually tends to work." Basically, a kind of social proof of efficacy.
And I am willing to bet $100 that the media will report big on the first few cases of "Weird Attacks Designed by AI."
It seems obvious to me that the biggest problems in alignment are going to be the humans, both long before the robots and probably long after.