How much friendliness is enough?

cousin_it

According to Eliezer, making AI safe requires solving two problems:

1) Formalize a utility function whose fulfillment would constitute "good" to us. CEV is intended as a step toward that.

2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations etc. TDT is intended as a step toward that.

It is obvious to me that (2) must be solved, but I'm not sure about (1). The problem in (1) is that we're asked to formalize a whole lot of things that don't look like they should be necessary. If the AI is tasked with building a faster and more efficient airplane, does it really need to understand that humans don't like to be bored?

To put the question sharply, which of the following looks easier to formalize:

a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.

b) Please do whatever the CEV of humanity wants.

Note that I'm not asking if (a) is easy in absolute terms, only if it's easier than (b). If you disagree that (a) looks easier than (b), why?

The primary task that EY and SIAI have in mind for Friendly AI is "take over the world". (By the way, I think this is utterly foolish, exactly the sort of appealing paradox (like "warring for peace") that can nerd-snipe the best of us.)

To some extent technology itself (lithography, for example) is actually Safe technology (or BelievedSafe technology). As part of the development of the technology, we also develop the safety procedures around it. The questions and problems about "how should you correctly draw up a contract with the devil" come from:

Explicitly pursuing recursive self-improvement, that is, self-modifying code where every potentially limiting component is on the table to be redesigned.
Using a theological-reasoning strategy regarding the fixpoint of the self-modifications.

If you do not pursue no-holds-barred recursive self-improvement so vigorously, then your task of developing a Riemann-Hypothesis-machine doesn't have to involve theological reasoning at all. Indeed, I'm sure there are many mathematicians and computer scientists who have worked on RH machines, and they have not had problems with their creations running amok.

The primary task that EY and SIAI have in mind for Friendly AI is "take over the world". (By the way, I think this is utterly foolish, exactly the sort of appealing paradox (like "warring for peace") that can nerd-snipe the best of us.)

Could you explain this in more detail?
