asking an AI to make itself friendly

anotheruser

edit: I think I have phrased this really poorly and that this has been misinterpreted. See my comment below for clarification.

A lot of thought has been put into the discussion of how one would need to define the goals of an AI so that it won't find any "loopholes" and act in an unintended way.

Assuming one already had an AI that is capable of understanding human psychology, which seems necessary to me to define the AI's goals anyway, wouldn't it be reasonable to assume that the AI would have an understanding of what humans want?

If that is the case, would the following approach work to make the AI friendly?

-give it the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty

-also give it the goal to not alter reality in any way besides answering questions.

-ask it what it thinks would be the optimal definition of the goal of a friendly AI, from the point of view of humanity, accounting for things that humans are too stupid to see coming.

-have a discussion between it and a group of ethicists/philosophers wherein both parties are encouraged to point out any flaws in the definition.

-have this go on for a long time until everyone (especially the AI, seeing as it is smarter than anyone else) is certain that there is no flaw in the definition and that it accounts for all kinds of ethical contingencies that might arise after the singularity.

-implement the result as the new goal of the AI.

What do you think of this approach?

edit: I think I have phrased this really poorly and that this has been misinterpreted. See my comment below for clarification.

A lot of thought has been put into the discussion of how one would need to define the goals of an AI so that it won't find any "loopholes" and act in an unintended way.

If that is the case, would the following approach work to make the AI friendly?

-give it the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty

-also give it the goal to not alter reality in any way besides answering questions.

-ask it what it thinks would be the optimal definition of the goal of a friendly AI, from the point of view of humanity, accounting for things that humans are too stupid to see coming.

-have a discussion between it and a group of ethicists/philosophers wherein both parties are encouraged to point out any flaws in the definition.

-implement the result as the new goal of the AI.

What do you think of this approach?

Why would the AI be evil?

Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.

While you are asking it to come up with a solution, you have its goal set to what I said in the original post:

"the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty"

Where would the evil intentions come from? At the moment you are asking the question, the only thing on the AI's mind is how it can answer truthfully.

The only loophole I can see is that it might realize it can reduce its own workload by killing everyone who is asking it questions, but that would be countered by the secondary goal "don't influence reality beyond answering questions".

Unless the programmers are unable to give the AI this extremely simple goal to just always speak the truth (as far as it knows), the AI won't have any hidden intentions.

And if the programmers working on the AI really are unable to implement this relatively simple goal, there is no hope that they would ever be able to implement the much more complex "optimal goal" they are trying to find out, anyway.

Why would the AI be evil?

Bugs, maybe

Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.

While you are asking it to come up with a solution, you have its goal set to what I said in the original post:

Have you? Are you talking about a human level AI. Asking or commanding a human to do something doesn't set that as their one an onyl goal. A human reacts according to their existing goals:they might complyhl, refuse or subvert the command.

"the temporary goal to always answer questi

... (read more)

-5

asking an AI to make itself friendly

-5

-5

-5

asking an AI to make itself friendly

-5

-5