
Peterdjones comments on asking an AI to make itself friendly - Less Wrong Discussion

-4 Post author: anotheruser 27 June 2011 07:06AM


Comment author: Peterdjones 27 June 2011 06:21:42PM 0 points

You're assuming the friendliness problem has been solved. An evil AI could see the question as a perfect opportunity to hand down a solution that could spell our doom.

Comment author: anotheruser 28 June 2011 09:41:14AM -2 points

Why would the AI be evil?

Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.

While you are asking it to come up with a solution, you have its goal set to what I said in the original post:

"the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty"

Where would the evil intentions come from? At the moment you are asking the question, the only thing on the AI's mind is how it can answer truthfully.

The only loophole I can see is that it might realize it can reduce its own workload by killing everyone who is asking it questions, but that would be countered by the secondary goal "don't influence reality beyond answering questions".

Unless the programmers are unable to give the AI this extremely simple goal to just always speak the truth (as far as it knows), the AI won't have any hidden intentions.

And if the programmers working on the AI really are unable to implement this relatively simple goal, there is no hope that they would ever be able to implement the much more complex "optimal goal" they are trying to find out, anyway.

Comment author: Peterdjones 28 June 2011 05:54:43PM 1 point

Why would the AI be evil?

Bugs, maybe.

Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.

While you are asking it to come up with a solution, you have its goal set to what I said in the original post:

Have you? Are you talking about a human-level AI? Asking or commanding a human to do something doesn't set that as their one and only goal. A human reacts according to their existing goals: they might comply, refuse, or subvert the command.

"the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty"

Why would it be easier to code in "be truthful" than "be friendly"?

Comment author: anotheruser 29 June 2011 06:53:34AM *  -2 points

That would have to be a really sophisticated bug to misinterpret "always answer questions truthfully as far as possible while admitting uncertainty" as "kill all humans". I'd imagine that anything so drastic would be found and corrected long beforehand. Consider that you have its goal set to this: it knows no other motivation but to respond truthfully. It doesn't care about the survival of humanity, or about itself, or about how reality really is. All it cares about is answering the questions to the best of its abilities.

I don't think this goal would be all that hard to define either, as "the truth" is a pretty simple concept. As long as it deals with uncertainty in the right way (by admitting it), how could this be misinterpreted? Friendliness is far harder to define because we don't even have a definition for it ourselves. There are far too many things to consider when defining "friendliness".
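One way the "answer truthfully while admitting uncertainty" goal could be made concrete (my own illustrative sketch, not anything specified in the original post) is a proper scoring rule: under the logarithmic score, an agent maximizes its expected score exactly by reporting its true subjective probability, so admitting uncertainty falls out of the incentive structure rather than needing a separate rule.

```python
import math

def log_score(reported_p, outcome):
    """Logarithmic score for a reported probability of a binary event.
    outcome is True/False; higher (less negative) is better."""
    p = reported_p if outcome else 1.0 - reported_p
    return math.log(p)

def expected_score(true_p, reported_p):
    """Expected log score when the event really occurs with probability true_p."""
    return (true_p * log_score(reported_p, True)
            + (1.0 - true_p) * log_score(reported_p, False))

# An agent that truly believes p = 0.7 does best by reporting 0.7:
# feigning near-certainty (0.99) or hedging at 0.5 both score worse in expectation.
honest = expected_score(0.7, 0.7)
overconfident = expected_score(0.7, 0.99)
vague = expected_score(0.7, 0.5)
assert honest > overconfident and honest > vague
```

Whether such an incentive can be robustly installed in a powerful optimizer is, of course, exactly what the rest of this thread is disputing.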

Comment author: Larks 29 June 2011 03:24:15PM 4 points

Trivial failure case: the AI turns the universe into hardware to support really big computations, so it can be really sure it's got the right answer, and also calibrate itself really well on the uncertainty.