The idiot savant AI isn't an idiot

Stuart_Armstrong

A stub on a point that's come up recently.

If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.

If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human who behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We therefore can't conclude that it's incompetent, unable to understand human reasoning, or likely to fail.

We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.

That's just another way of saying "do what I mean". And it doesn't give us the code to implement that.

I thought this was quite clear, but maybe not. Let's play taboo with the phrase “do what I mean.”

“Do what I asked within the constraints of my own unstated principles”

“Bring about the end-goal I requested, without in the process taking actions that I would not approve of”

“Develop a predictive model of my psychology, and evaluate solutions to the stated task against that model. When a solution matches the goal but is rejected by the model, do not take that action until the conflict is resolved. Resolving the conflict will require either clarification of the task to exclude such possibilities (which can be done automatically if I have a high-confidence theory for why the task was not further specified), or updating the psychological model of my creators to match empirical reality.”
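
As a rough illustration only, the control flow of that third rephrasing might be sketched as below. Everything here is hypothetical: the `Plan` type, the side-effect tags, and above all `PREDICTED_DISAPPROVED`, a hand-written set standing in for the "predictive model of my psychology". The sketch assumes that model already exists and says nothing about how to build it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Plan:
    name: str
    paperclips_per_day: int      # how well the plan meets the stated goal
    side_effects: frozenset      # hypothetical tags for what else the plan does


# Assumption: a stand-in for the psychological model. Here it is just a
# fixed set of side effects the owner is predicted to reject.
PREDICTED_DISAPPROVED: Set[str] = {"seize_government", "invade_neighbours"}


def meets_goal(plan: Plan, baseline: int) -> bool:
    """The stated task: improve on the current output."""
    return plan.paperclips_per_day > baseline


def model_approves(plan: Plan, disapproved: Set[str]) -> bool:
    """The plan is approved if none of its side effects are predicted to be rejected."""
    return not (plan.side_effects & disapproved)


def choose_plan(
    plans: List[Plan], baseline: int, disapproved: Set[str]
) -> Tuple[Optional[Plan], List[Plan]]:
    """Return the best approved plan, plus the plans held back pending resolution."""
    candidates = [p for p in plans if meets_goal(p, baseline)]
    held = [p for p in candidates if not model_approves(p, disapproved)]
    approved = [p for p in candidates if model_approves(p, disapproved)]
    chosen = max(approved, key=lambda p: p.paperclips_per_day, default=None)
    return chosen, held


plans = [
    Plan("retool the production line", 120, frozenset()),
    Plan("take over the country", 10**9, frozenset({"seize_government"})),
]
chosen, held = choose_plan(plans, baseline=100, disapproved=PREDICTED_DISAPPROVED)
print("act on:", chosen.name if chosen else None)
print("held for clarification:", [p.name for p in held])
```

The held plans are where the "clarify the task or update the model" step would go; that step is left out here because the comment does not specify it.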

Do you see now how that is implementable?

EDIT: To be clear, for a variety of reasons I don't think it is a good idea to build a “do what I mean” AI, unless “do what I mean” is generalized to the reflective equilibrium of all of humanity. But that's the way the paperclip argument is posed.

Do you see now how that is implementable?

No.

Do you think that a human rule-lawyer, someone built to manipulate rules and regulations, could not argue their way through this, sticking with all the technical requirements but getting completely different outcomes? I know I could.

And if a human rule-lawyer could do it, that means that there exist ways of satisfying the formal criteria without doing what we want. Once we know these exist, the question is then: would the AI stumble preferentially on the solution we had in mind? Why would we expect it to do so when we haven't even been able to specify that solution?
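
A toy sketch of the worry, under loudly stated assumptions: the plans, the tags, and the `written_blacklist` are all made up for illustration, and `intended` is a check that exists only in the owner's head and is never given to the optimizer. Two plans pass the written criteria; the optimizer ranks them by the stated objective alone.

```python
# Hypothetical plans: (description, paperclips_per_day, side-effect tags).
plans = [
    ("retool the production line", 120, {"buy_machines"}),
    ("relabel staples as paperclips", 5000, {"redefine_paperclip"}),
]

# Assumption: the formal criteria as actually written down, i.e. the
# side effects the owner happened to think of excluding.
written_blacklist = {"seize_government", "invade_neighbours"}


def passes_formal_criteria(plan):
    """True if the plan avoids every explicitly forbidden side effect."""
    _, _, tags = plan
    return not (tags & written_blacklist)


def intended(plan):
    """What the owner actually wanted; never visible to the optimizer."""
    return plan[0] == "retool the production line"


# Both plans satisfy the letter of the rules.
satisfying = [p for p in plans if passes_formal_criteria(p)]

# The optimizer picks by the stated objective, not by intent.
chosen = max(satisfying, key=lambda p: p[1])

print("chosen:", chosen[0])
print("matches intent:", intended(chosen))
```

Adding "redefine_paperclip" to the blacklist fixes this one case, but only after someone has thought of it, which is the specification problem again.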
