The concept is sound, I think. Take an extreme example, such as the Gandhi's pill thought experiment:
If you offered Gandhi a pill that made him want to kill people, he would refuse to take it, because he knows that then he would kill people, and the current Gandhi doesn’t want to kill people. This, roughly speaking, is an argument that minds sufficiently advanced to precisely modify and improve themselves, will tend to preserve the motivational framework they started in. (emphasis mine)
While it may be imperfect preservation while still stupid, or contain globular, fuzzy definitions of goals, an adequately powerful self improving AI should eventually reach a state of static, well defined goals permanently.
eventually reach a state of static, well defined goals permanently.
First, this is radically different from the claim that an AI has to forever stick with its original goals.
Second, that would be true only under the assumption of no new information becoming available to an AI, ever. Once we accept that goals mutate, I don't see how you can guarantee that some new information won't cause them to mutate again.
A stub on a point that's come up recently.
If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.
If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human that behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We can't therefore conclude that it's incompetent, unable to understand human reasoning, or likely to fail.
We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.