The idiot savant AI isn't an idiot

Stuart_Armstrong

A stub on a point that's come up recently.

If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.

If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human who behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We therefore can't conclude that it's incompetent, unable to understand human reasoning, or likely to fail.
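A toy sketch may make the distinction concrete. Everything in it is hypothetical and invented for illustration: the Plan class, the three plans, the numbers, and the respect_intent flag are assumptions, not a model of any real system. The point is only that the agent can evaluate what the owner meant and still pick by the objective as written, unless using that knowledge has been made part of the selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    paperclips_per_day: int    # what the programmed objective measures (made-up numbers)
    owner_would_approve: bool  # what the owner actually meant (hypothetical label)


PLANS = [
    Plan("tune the machines", 12_000, True),
    Plan("renegotiate the wire supply", 15_000, True),
    Plan("take over the national economy", 10_000_000, False),
]


def programmed_objective(plan: Plan) -> int:
    """The goal as literally written: more paperclips is better."""
    return plan.paperclips_per_day


def knows_owner_intent(plan: Plan) -> bool:
    """The agent can model what the owner meant; nothing forces it to act on this."""
    return plan.owner_would_approve


def choose(plans, respect_intent: bool = False) -> Plan:
    """Pick the plan that maximizes the programmed objective.

    respect_intent=True stands in for a successfully programmed
    "do what I mean"; by default that knowledge is simply unused.
    """
    if respect_intent:
        candidates = [p for p in plans if knows_owner_intent(p)]
    else:
        candidates = list(plans)
    return max(candidates, key=programmed_objective)


if __name__ == "__main__":
    print(choose(PLANS).name)
    print(choose(PLANS, respect_intent=True).name)
```

In this sketch both calls have access to exactly the same knowledge of the owner's intent; the only difference is whether that knowledge appears in the rule that selects the plan. The hard part, which the flag glosses over entirely, is getting something like respect_intent to mean the right thing in the first place.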

We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.

We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.

Right. And I'm not trying to argue that we should despair of building a friendly AI, or of identifying friendliness. I'm just noting that the default is for AI behavior to be much harder than human behavior for humans to predict and understand. This is especially true for intelligences constructed through whole-brain emulation, evolutionary algorithms, or other relatively complex and autonomous processes.

It should be possible for us to mitigate the risk, but actually doing so may be one of the most difficult tasks humans have ever attempted, and is certainly one of the most consequential.

Let's make this easy. Do you think the probability of a person saying "hello" to a stranger who just said "hello" to him/her is less than 10%? Do you think you can predict Deep Blue's moves with greater than 10% confidence?

Deep Blue's moves are, minimally, unpredictable enough to allow it to consistently outsmart the smartest and best-trained humans in the world in its domain. The comparison is almost unfair, because unpredictability is selected for in Deep Blue's natural response to chess positions, whereas predictability is strongly selected for in human social conduct. If we can't even come to an agreement on this incredibly simple base case -- if we can't even agree, for instance, that people greet each other with 'hi!' with higher frequency than Deep Blue executes a particular gambit -- then talking about much harder cases will be unproductive.

I really don't know the probability of a person saying hello to a stranger who said hello to them. It depends on too many factors, like the look and vibe of the stranger, the history of the person being greeted, etc.

Given a time constraint, I'd agree that I'd be more likely to predict that the girl would reply hello than to predict Deep Blue's next move, but without a time constraint, I think Deep Blue's moves would be almost 100% predictable. The reason is that all Deep Blue does is calculate; it doesn't consult its feelings befo...