LINK: Ben Goertzel; Does Humanity Need an "AI-Nanny"?

David Althaus

Link: Ben Goertzel dismisses Yudkowsky's FAI and proposes his own solution: Nanny-AI

Some relevant quotes:

It’s fun to muse about designing a “Friendly AI” a la Yudkowsky, that is guaranteed (or near-guaranteed) to maintain a friendly ethical system as it self-modifies and self-improves itself to massively superhuman intelligence. Such an AI system, if it existed, could bring about a full-on Singularity in a way that would respect human values – i.e. the best of both worlds, satisfying all but the most extreme of both the Cosmists and the Terrans. But the catch is, nobody has any idea how to do such a thing, and it seems well beyond the scope of current or near-future science and engineering.

Gradually and reluctantly, I’ve been moving toward the opinion that the best solution may be to create a mildly superhuman supertechnology, whose job it is to protect us from ourselves and our technology – not forever, but just for a while, while we work on the hard problem of creating a Friendly Singularity.

In other words, some sort of AI Nanny….

The AI Nanny

Imagine an advanced Artificial General Intelligence (AGI) software program with

General intelligence somewhat above the human level, but not too dramatically so – maybe, qualitatively speaking, as far above humans as humans are above apes

Interconnection to powerful worldwide surveillance systems, online and in the physical world

Control of a massive contingent of robots (e.g. service robots, teacher robots, etc.) and connectivity to the world’s home and building automation systems, robot factories, self-driving cars, and so on and so forth

A cognitive architecture featuring an explicit set of goals, and an action selection system that causes it to choose those actions that it rationally calculates will best help it achieve those goals

A set of preprogrammed goals including the following aspects:

A strong inhibition against modifying its preprogrammed goals

A strong inhibition against rapidly modifying its general intelligence

A mandate to cede control of the world to a more intelligent AI within 200 years

A mandate to help abolish human disease, involuntary human death, and the practical scarcity of common humanly-useful resources like food, water, housing, computers, etc.

A mandate to prevent the development of technologies that would threaten its ability to carry out its other goals

A strong inhibition against carrying out actions with a result that a strong majority of humans would oppose, if they knew about the action in advance

A mandate to be open-minded toward suggestions by intelligent, thoughtful humans about the possibility that it may be misinterpreting its initial, preprogrammed goals

Apparently Goertzel doesn't think that building a Nanny-AI with the above-mentioned qualities is almost as difficult as creating an FAI a la Yudkowsky.

But SIAI believes that once you can create an AI-Nanny you can (probably) create a full-blown FAI as well.

Or am I mistaken?

"AI Nanny" does seem even harder than FAI (the usual arguments apply to it with similar strength, but it is additionally asked for a specific wish), and compared to no-worries-AGI this idea has better immunity to arguments about the danger of its development. It's a sufficiently amorphous proposal to shroud many AGI projects without essentially changing anything about them, including project members' understanding of AI risk. So on the net, this looks to me like a potentially negative development.

It's a sufficiently amorphous proposal to shroud many AGI projects without essentially changing anything about them, including project members' understanding of AI risk. So on the net, this looks to me like a potentially negative development.

Is anyone surprised by this? A few weeks ago I wrote to cousin_it during a chat session:

Wei Dai: FAI seems to have enough momentum now that many future AI projects will at least claim to take Friendliness seriously
Wei Dai: or another word, like machine ethics

Is anyone surprised by this?

It's one of those details that is obviously important for memetic strategies to account for but will still get missed by nine out of ten naive intuitive-implicit models. There are an infinite number of ways for policy-centered thinking to kill a mind, both figuratively and literally, directly and indirectly.

Is anyone surprised by this?

Sadly, I am :-(

A while ago, when I learned of Abram Demski (of all people!) helping someone to build an AGI, I felt the same surprise as now but apparently didn't update strongly enough. Optimism seems to be the mind-killer in these matters. In retrospect it should've been obvious that people like Goertzel would start giving lip service to friendliness while still failing to get the point.

Many IT corporations already take their reputations seriously. Robot makers are sometimes close to the line though.

This idea isn't perfect, but there's some merit to it. It's better than any of Goertzel's previous proposals that I'm aware of; I'm glad to see he's taking the friendliness issue seriously now and looking for ways to deal with it.

I agree that freezing an AI's intelligence level somewhere short of superintelligence, by building in a time-limited deontological prohibition against self-modification and self-improvement, is probably a good safeguard. However, I think this makes sense only for a shorter duration, as a step in development and testing. Capping an AI's intelligence but still giving it full control over the world has most of the same difficulty and safety issues that a full-blown friendly AI would. There are two main problems. First, the goal-stability problem doesn't go away entirely just because the AI is avoiding self-modification; it can still suffer value-drift as the world, and the definitions it uses to parse the world, change. Second, there's a lot of complexity (and chance for disastrous error) hidden in statements like this one:

A strong inhibition against carrying out actions with a result that a strong majority of humans would oppose, if they knew about the action in advance

The problem is that whether humans object depends more on how an action is presented, and subtle factors that the AI could manipulate, than on the action itself. There are obvious loopholes - what about actions which are too complex for humans to understand and object to? What about highly-objectionable actions which can be partitioned into innocent-looking pieces? It's also quite likely that a majority of humans would develop trust in the AI, such that they wouldn't object to anything. And then there's this:

A mandate to be open-minded toward suggestions by intelligent, thoughtful humans about the possibility that it may be misinterpreting its initial, preprogrammed goals

Which sounds like a destabilizing factor and a security hole. It's very hard to separate being open to corrections that move from incorrect interpretations to better ones from being open to corrections in the wrong direction. This might work if it were limited to taking suggestions from a trustworthy set of exceptionally good human thinkers, though, and if those humans were able to retain their sanity and values in spite of extensive aging.

I'm also unclear on how to reconcile handing over control to a better AI in 200 years, with inhibiting the advancement of threatening technologies. The better AI would itself be a threatening technology, and preparing it to take over would require research and prototyping.

I think that an important underlying difference of perspective here is that the Less Wrong memes tend to automatically think of all AGIs as essentially computer programs whereas Goertzel-like memes tend to automatically think of at least some AGIs as non-negligibly essentially person-like. I think this is at least partially because the Less Wrong memes want to write an FAI that is essentially some machine learning algorithms plus a universal prior on top of sound decision theory whereas the Goertzel-like memes want to write an FAI that is essentially roughly half program-like and half person-like. Less Wrong memes think that person AIs won't be sufficiently person-like but they sort of tend to assume that conclusion rather than argue for it, which causes memes that aren't familiar with Less Wrong memes to wonder why Less Wrong memes are so incredibly confident that all AIs will necessarily act like autistic OCD people without any possibility at all of acting like normal reasonable people. From that perspective the Goertzel-like memes look justified in being rather skeptical of Less Wrong memes. After all, it is easy to imagine a gradation between AIXI and whole brain emulations. Goertzel-like memes wish to create an AI somewhere between those two points; Less Wrong memes wish to create an AI that's even more AIXI-like than AIXI is (in the sense of being more formally and theoretically well-founded than AIXI is). It's important that each look at the specific kinds of AI that the other has in mind and start the exchange from there.

That's a hypothesis.

That's a great insight.

This comment covers most of my initial reactions.

with inhibiting the advancement of threatening technologies

Probably "unauthorized/unsupervised advancement."

If you can "specifically preprogram" goals into an AI with greater than human intelligence, then you have presumably cracked the complexity-of-value problem. You can explicitly state all of human morality. Trying to achieve a lesser goal would be insanely dangerous. In which case, you have now written an AI that is smarter than a human, and therefore presumably able to write another AI smarter than itself. As soon as you create a smarter-than-human machine, you have the potential for an intelligence explosion.

not forever, but just for a while

This is a standard anti-pattern of politics: temporary measures to deal with the current emergency, to be repealed when the time is right. There is a strong tendency for the right time to never arrive.

In Britain, the present income tax is a temporary measure, introduced in 1842 by Sir Robert Peel, in the year following his party's election, before which he had opposed the tax. It expires every year. And every year, without fail, it is renewed for another year.

temporary measures to deal with the current emergency, to be repealed when the time is right.

Or even better, to be phased out when the time is right! Renewal is so embarrassing.

Something similar to this idea has been previously called a "sysop scenario".

Control of a massive contingent of robots (e.g. service robots, teacher robots, etc.) and connectivity to the world’s home and building automation systems, robot factories, self-driving cars, and so on and so forth.

If you have all this then you only need mildly intelligent systems to do a lot of good.

Lists of friendliness conditions are known to be stupid. This is an obvious failure mode.

