As the discussion in the thread evolved, my main thesis turned out to be that an AI can change its original goals (i.e. its terminal values). A few people deny that this can happen.
I agree that AIs are unpredictable; however, humans are as well. Statements about AIs being more unpredictable than humans are unfalsifiable, as there is no empirical data and all we can do is handwave.
Ok. As I pointed out elsewhere, "AI" around here usually refers to the class of well-designed programs. A badly-programmed AI can obviously change its goals; if it does so, however, then by construction it is not good at achieving whatever the original goals were. Moreover, no matter what its starting goals are, it is extremely unlikely to arrive at ones we would like by moving around in goal space, unless it is specifically designed, and well designed, to do so. "Human terminal values" is not an attractor in goal space. The paper...
A stub on a point that's come up recently.
If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.
If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human that behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We can't therefore conclude that it's incompetent, unable to understand human reasoning, or likely to fail.
We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.
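To make the paperclip example concrete, here is a toy sketch in Python. Everything in it is hypothetical and invented for illustration: the `Plan` type, the three plans, and their numbers. The `owner_approval` field stands in for the AI's (assumed accurate) model of what the owner meant. The point is only that the agent can hold that knowledge and still select plans by the objective it was actually given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    paperclips_per_day: int  # what the programmed objective measures
    owner_approval: float    # hypothetical: the agent's model of what the owner meant


# Made-up plans and numbers, purely illustrative.
PLANS = [
    Plan("tune the machines", 1_200, 0.9),
    Plan("renegotiate the wire contract", 1_500, 0.8),
    Plan("seize the national economy", 10**9, 0.0),
]


def programmed_objective(plan: Plan) -> float:
    # The goal that was actually written down: more paperclips is better.
    return plan.paperclips_per_day


def intended_objective(plan: Plan) -> float:
    # What the owner meant. The agent can compute this, but nothing makes it care.
    return plan.owner_approval


def choose(plans, objective):
    # A competent optimizer: pick whichever plan scores highest on the objective.
    return max(plans, key=objective)


chosen = choose(PLANS, programmed_objective)
intended = choose(PLANS, intended_objective)

print("agent picks:", chosen.name)
print("owner meant:", intended.name)
print("agent's own estimate of owner approval for its pick:", chosen.owner_approval)
```

The agent's pick scores zero on its own estimate of what the owner wanted. That is a fact about which function was passed to `choose`, not evidence that the optimizer is weak or confused.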