Making AI friendly requires solving two problems
The goal is not to "make an AI friendly" (non-lethal); it's to make a Friendly AI. That is, not to make some powerful agent that doesn't kill you (and does something useful), but to make an agent that can be trusted with autonomously building the future. For example, a merely non-lethal AI won't help with preventing UFAI risks.
So it's possible that some kind of Oracle AI can be built, but so what? And the risk of unknown unknowns remains, so it's probably a bad idea even if it looks provably safe.
If we get a working Oracle AI, couldn't we just ask it how to build an FAI? I don't think this helps much, since the Oracle route doesn't seem much easier than the FAI route.
According to Eliezer, making AI safe requires solving two problems:
1) Formalize a utility function whose fulfillment would constitute "good" to us. CEV is intended as a step toward that.
2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations, etc. TDT is intended as a step toward that.
It is obvious to me that (2) must be solved, but I'm not sure about (1). The problem with (1) is that we're asked to formalize a whole lot of things that don't look like they should be necessary. If the AI is tasked with building a faster and more efficient airplane, does it really need to understand that humans don't like to be bored?
To put the question sharply, which of the following looks easier to formalize:
a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.
b) Please do whatever the CEV of humanity wants.
Note that I'm not asking if (a) is easy in absolute terms, only if it's easier than (b). If you disagree that (a) looks easier than (b), why?