anotheruser comments on asking an AI to make itself friendly - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (30)
I have read the sequences (well, most of them). I can't find this as a standard proposal.
I think I didn't make clear what I wanted to say, so you defaulted to "he has no idea what he is talking about" (which is reasonable).
What I meant to say is that rather than defining the "optimal goal" of the AI based on what we can come up with ourselves, the problem can be delegated to the AI itself as a psychological problem.
I assume that an AI would possess some knowledge of human psychology, as that would be necessary for pretty much every practical application, like talking to it.
What then prevents us from telling the AI the following:
"We humans would like to become immortal and live in utopia (or however you want to phrase it; if the AI is smart, it will understand what you really mean through psychology). We disagree on the specifics and are afraid that something may go wrong. There are many contingencies to consider. Here is a list of contingencies we have come up with. Do you understand what we are trying to do? As you are much smarter than us, can you find anything that we have overlooked but that you expect us to agree with you on, once you point it out to us? Different humans have different opinions, and this factors into the problem, too. Can you propose a general solution to this problem that remains flexible in the face of an unpredictable future (transhumans may have different ethics)?"
In essence, it all boils down to asking the AI:
"If you were in our position, if you had our human goals and drives, how would you define your (the AI's) goals?"
If you have an agent that is vastly more intelligent than you are and that understands how your human mind works, couldn't you just delegate the task of finding a good goal for it to the AI itself, just like you can give it any other kind of task?
Welcome to Less Wrong!
In a sense, the Friendly AI problem is about delegating the definition of Friendliness to a superintelligence. The main issue is that it's easy to underestimate (on account of the Mind Projection Fallacy) how large a kernel of the correct answer it needs to start off with, in order for that delegation to work properly. There's rather a lot that goes into this, and unfortunately it's scattered over many posts that aren't collected in one sequence, but you can find much of it linked from Fake Fake Utility Functions (sic, and not a typo) and Value is Fragile.
That's extrapolated volition.
And it requires telling the AI "Implement good. Human brains contain evidence for good, but don't define it; don't modify human drives, that won't change good.". It requires telling it "Prove you don't get goal drift when you self-modify.". It requires giving it an explicit goal system for its infancy, telling it that it's allowed to use transistors despite the differences in temperature and gravity and electricity consumption that causes, but not to turn the galaxy into computronium - and writing the general rules for that, not the superficial cases I gave - and telling it how to progressively overwrite these goals with its true ones.
"Oracle AI" is a reasonable idea. Writing object-level goals into the AI would be bloody stupid, so we are going to do some derivation, and Oracle isn't much further than CEV. Bostrom defends it. But seriously, "don't influence reality beyond answering questions"?
You're assuming the friendliness problem has been solved. An evil AI could see the question as a perfect opportunity to hand down a solution that could spell our doom.
Why would the AI be evil?
Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.
While you are asking it to come up with a solution, you have its goal set to what I said in the original post:
"the temporary goal to always answer questions truthfully as far as possible while admitting uncertainty"
Where would the evil intentions come from? At the moment you are asking the question, the only thing on the AI's mind is how it can answer truthfully.
The only loophole I can see is that it might realize it can reduce its own workload by killing everyone who is asking it questions, but that would be countered by the secondary goal "don't influence reality beyond answering questions".
Unless the programmers are unable to give the AI this extremely simple goal to just always speak the truth (as far as it knows), the AI won't have any hidden intentions.
And if the programmers working on the AI really are unable to implement this relatively simple goal, there is no hope that they would ever be able to implement the much more complex "optimal goal" they are trying to find out, anyway.
Bugs, maybe.
Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.
Have you? Are you talking about a human-level AI? Asking or commanding a human to do something doesn't set that as their one and only goal. A human reacts according to their existing goals: they might comply, refuse, or subvert the command.
Why would it be easier to code in "be truthful" than "be friendly"?
That would have to be a really sophisticated bug, to misinterpret "always answer questions truthfully as far as possible while admitting uncertainty" as "kill all humans". I'd imagine that something as drastic as that could be found and corrected long before then. Consider that its goal is set to this: it knows no other motivation but to respond truthfully. It doesn't care about the survival of humanity, or of itself, or about how reality really is. All it cares about is answering the questions to the best of its abilities.
I don't think this goal would be all too hard to define either, as "the truth" is a pretty simple concept. As long as it deals with uncertainty in the right way (by admitting it), how could this be misinterpreted? Friendliness is far harder to define because we don't even know a definition for it ourselves. There are far too many things to consider when defining "friendliness".
Trivial failure case: the AI turns the universe into hardware to support really big computations, so it can be really sure it's got the right answer, and also calibrate itself really well on the uncertainty.