The idiot savant AI isn't an idiot

Stuart_Armstrong

A stub on a point that's come up recently.

If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.

If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human that behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We can't therefore conclude that it's incompetent, unable to understand human reasoning, or likely to fail.

We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.

They aren't guaranteed to be immutable. It is merely the case that any agent that wants to optimize the world for some set of goals does not serve its objective by creating a more powerful agent with different goals.

An AI with multiple conflicting goals sounds incoherent - do you mean a weighted average? The AI has to have some way to evaluate, numerically, its preference of one future over another, and I don't think any goal system that spits out a real number indicating relative preference can be called "conflicting". If an action gives points on one goal and detracts on another, the AI will simply form a weighted mix and evaluate whether it's worth doing.

I cannot imagine an AI architecture that allows genuine internal conflict. Not even humans have that. I suspect it's an incoherent concept. Do you mean the feeling of conflict that, in humans, arises by choice between options that satisfy different drives? There's no reason an AI could not be programmed to "feel" this way, though what good it would do I cannot imagine. Nonetheless, at the end of the day, for any coherent agent, you can see whether its goal system has spit out "action worth > 0" or "action worth < 0" simply by whether it takes the action or not.
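To make the "weighted mix" point concrete, here is a minimal sketch, assuming goals can be reduced to named weights and an action to a per-goal score. The names `Goal`, `action_worth`, and `takes_action`, as well as the example weights, are hypothetical and only illustrate the argument; this is not a claim about how any real AI is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    name: str
    weight: float  # hypothetical relative importance of this goal


def action_worth(effects: dict[str, float], goals: list[Goal]) -> float:
    """Collapse per-goal effects into a single real number (a weighted sum)."""
    return sum(g.weight * effects.get(g.name, 0.0) for g in goals)


def takes_action(effects: dict[str, float], goals: list[Goal]) -> bool:
    """The agent acts exactly when the combined worth is positive."""
    return action_worth(effects, goals) > 0


# Illustrative numbers only: the action helps one goal and hurts another.
goals = [Goal("paperclips", 1.0), Goal("safety", 3.0)]
effects = {"paperclips": 5.0, "safety": -1.0}

worth = action_worth(effects, goals)   # 1.0 * 5.0 + 3.0 * (-1.0)
decision = takes_action(effects, goals)
```

Under these assumptions the apparent "conflict" between the two goals disappears once the scores are combined: there is one number, and the agent either acts or does not.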

The AI's terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that's what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI. The AI's goals are the thing that determines the relative value of one future over another; I submit that an AI that values one thing but pointlessly acts to bring about a future that contains a powerful optimizer who doesn't value that thing is so ineffectual as to be almost unworthy of the term intelligence.

Could you build an AI like that? Sure, just take a planetary-size supercomputer and a few million years of random, scattershot exploratory programming... but why would you?

An AI with multiple conflicting goals sounds incoherent

Well, humans exist despite having multiple conflicting goals.

The AI's terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that's what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI.

At this point, it's not clear that the concept of "terminal goals" refers to anything in the territory.

1[anonymous]13y

It's possible, though unlikely unless the situation is artificially constructed, that the two mutually exclusive top-rated choices have exactly equal utility. Of more practical concern: if the preference evaluation has uncertainty, the utility ranges of the top two choices can overlap, in which case the AI may need to take meta-actions to resolve that uncertainty before choosing which action to take to reach its goal.
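A rough sketch of that overlap case, assuming each option's utility is known only as a closed interval and that options are ranked by interval midpoint. The `Estimate` type, the `choose` function, and the `"gather_more_information"` meta-action are all hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    low: float   # lower bound on the option's utility
    high: float  # upper bound on the option's utility


def overlaps(a: Estimate, b: Estimate) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


def choose(estimates: dict[str, Estimate]) -> str:
    """Pick the best option, or a meta-action if the top two are ambiguous."""
    # Assumption: rank by interval midpoint; other rankings are possible.
    ranked = sorted(
        estimates,
        key=lambda name: (estimates[name].low + estimates[name].high) / 2,
        reverse=True,
    )
    if len(ranked) < 2:
        return ranked[0]
    best, runner_up = ranked[0], ranked[1]
    if overlaps(estimates[best], estimates[runner_up]):
        # The ranges overlap, so the ordering is not settled yet.
        return "gather_more_information"
    return best


clear = choose({"a": Estimate(1.0, 2.0), "b": Estimate(3.0, 4.0)})
unclear = choose({"a": Estimate(1.0, 3.5), "b": Estimate(3.0, 4.0)})
```

In the first call the ranges are disjoint, so the higher option is chosen directly; in the second they overlap, so the sketch returns the meta-action instead of committing.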
