The idiot savant AI isn't an idiot

Stuart_Armstrong

A stub on a point that's come up recently.

If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.

If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human who behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We therefore can't conclude that it's incompetent, unable to understand human reasoning, or likely to fail.

We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.

Do you think that a human rules lawyer, someone built to manipulate rules and regulations, could not argue their way through this, sticking to all the technical requirements but getting completely different outcomes? I know I could.

The question isn't whether there is one solution, but whether the space of possible solutions is encompassed by acceptable morals. I would not "expect an AI to stumble preferentially on the solution we had in mind", because I am confused and do not know what the solution is; so are you and everyone else on LessWrong. However, that is a separate issue from whether we can specify what a solution would look like, such as a reflective-equilibrium solution to the coherent extrapolated volition (CEV) of humankind. You can write an optimizer to search for a description of CEV without actually knowing what the result will be.

It's like saying "I want to calculate pi to the billionth digit", writing a program to do it, and then arguing that we can't be sure the result is correct because we don't know ahead of time what the billionth digit of pi will be. Nonsense.

The question isn't whether there is one solution, but whether the space of possible solutions is encompassed by acceptable morals.

Whether the space of possible solutions is contained in the space of moral outcomes.