anotheruser comments on asking an AI to make itself friendly - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (30)
Why would the AI be evil?
Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.
While you are asking it to come up with a solution, you have its goal set to what I said in the original post:
"the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty"
Where would the evil intentions come from? At the moment you are asking the question, the only thing on the AI's mind is how it can answer truthfully.
The only loophole I can see is that it might realize it can reduce its own workload by killing everyone who is asking it questions, but that would be countered by the secondary goal "don't influence reality beyond answering questions".
Unless the programmers are unable to give the AI this extremely simple goal to just always speak the truth (as far as it knows), the AI won't have any hidden intentions.
And if the programmers working on the AI really are unable to implement this relatively simple goal, there is no hope that they would ever be able to implement the much more complex "optimal goal" they are trying to find out, anyway.
Bugs, maybe
Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.
Have you? Are you talking about a human level AI. Asking or commanding a human to do something doesn't set that as their one an onyl goal. A human reacts according to their existing goals:they might complyhl, refuse or subvert the command.
Why would it be easier to code in "be truthful" than "be friendly"?
that would have to be a really sophisticated bug to misinterpret "always answer questions thruthfully as far as possible while admitting uncertainty" as "kill all humans". I'd imagine that something as drastic as that could be found and corrected long before that. Consider that you have its goal set to this. It knows no other motivation but to respond thruthfully. It doesn't care about the survival of humanity, or itself or about how reality really is. All it cares for is to answer the questions to the best of its abilities.
I don't think that this goal would be all too hard to define either, as "the truth" is a pretty simple concept. As long it deals with uncertainty in the right way (by admitting it), how could this be misinterpreted? Friendliness is far harder to define because we don't even know a definition for it ourselves. There are far too many things to consider when defining "friendliness".
Trivial Failure Case: The AI turns the universe into hardware to support really big computations, so it can be really sure it's got the right answer, and also callibrate itself really well on the uncertainty.