How much friendliness is enough?

cousin_it

According to Eliezer, making AI safe requires solving two problems:

1) Formalize a utility function whose fulfillment would constitute "good" to us. CEV is intended as a step toward that.

2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations, etc. TDT is intended as a step toward that.

It is obvious to me that (2) must be solved, but I'm not sure about (1). The problem in (1) is that we're asked to formalize a whole lot of things that don't look like they should be necessary. If the AI is tasked with building a faster and more efficient airplane, does it really need to understand that humans don't like to be bored?

To put the question sharply, which of the following looks easier to formalize:

a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.

b) Please do whatever the CEV of humanity wants.

Note that I'm not asking if (a) is easy in absolute terms, only if it's easier than (b). If you disagree that (a) looks easier than (b), why?

As I understand it, EY worked through a chain of reasoning about a decade ago, in his book "Creating Friendly AI". The chain of reasoning is long and I won't attempt to recap it here, but there are two relevant conclusions.

First, that self-improving artificial intelligences are dangerous, and that projects to build self-improving artificial intelligence, or general intelligence that might in principle become self-modifying (such as Goertzel's), are increasing existential risk. Second, that the primary defense against self-improving artificial intelligences is a Friendly self-improving artificial intelligence, and so, in order to reduce existential risk, EY must work on developing (a restricted subset of) self-improving artificial intelligence.

This seems nigh-paradoxical (and unnecessarily dramatic) to me: you should not do X, and yet EY must do X. As I said before, this "cancel infinities against one another" sort of thinking (another example might be MAD doctrine) has enormous appeal to a certain (geeky) kind of person. The phenomenon is named "nerd sniping" in the xkcd comic: http://xkcd.com/356/

Rather than pursuing Friendly AGI vigorously as last/best/only hope for humanity, we should do at least two things:

1) Look hard for errors in the long chain of reasoning that led to these peculiar conclusions, on the grounds that reality rarely calls for that kind of nigh-paradoxical action. It's far more likely that either all AI development is generally a good thing for existential risks, or all AI development is generally a bad thing for existential risks. EY shouldn't get any special AI-development license.

2) Look hard for more choices: for example, building entities that are very capable at defeating rogue Unfriendly AGI takeoffs, and yet which are not themselves a threat to humanity in general, nor prone to hard takeoffs. It may be difficult to imagine such entities, but all the reduce-existential-risk tasks are very difficult.

reality rarely calls for that kind of nigh-paradoxical action

In my experience, reality frequently includes scenarios where the best way to improve my ability to defend myself involves also improving my ability to harm others, should I decide to do that. So it doesn't seem that implausible to me.

Indeed, militaries are pretty much built on this principle, and are fairly common.

But, sure... there are certainly alternatives.

cousin_it (15y ago)

I have similar misgivings; they prompted me to write the post. Fighting fire with fire looks like a dangerous idea. The problem statement should look like "how do we stop unfriendly AIs", not "how do we make friendly AIs". Many people here (e.g. Nesov and SarahC) seem convinced that the latter is the most efficient way of achieving the former. I hope we can find a better way if we think some more.
