evand comments on How likely the AI that knows it's evil? Or: is a human-level understanding of human wants enough? - Less Wrong

2 Post author: ChrisHallquist 21 May 2012 05:19AM

Comments (29)

Comment author: evand 21 May 2012 01:14:41PM 0 points

following the spirit, not the letter, of our commands

This seems like a trivial variation of "I wish for you to do what I should wish for". Which is to say, I do see it framed exactly that way fairly frequently here. The general problem, I think, is that all of these various problems are at a similar level of difficulty, and the solution to one seems to imply the solution to all of them. The corollary being that something that's nearly a solution to any of them carries all the risks of any AI. This is where terms like "AI-complete" and "FAI-complete" come from.

Comment author: ChrisHallquist 21 May 2012 02:53:25PM *  0 points

On further reflection, this business of "FAI-complete" is very puzzling. What we should make of it depends on what we mean by FAI:

  • If we define FAI broadly, then yes, the problem of getting AI to have a decent understanding of our intentions does seem to be FAI-complete.
  • If we define FAI as a utopia-machine, claims of FAI-completeness look very dubious. I have a human's values, but my understanding of my own values isn't perfect. If I found myself in the position of the titular character in Bruce Almighty, I'd trust myself to try to make some very large improvements in the world, but I wouldn't trust myself to try to create a utopia in one fell swoop. If my self-assessment is right, that means it's possible to have a mind that can be trusted to attempt some good actions but not others, which looks like a problem for claims of FAI-completeness.

Edit: Though in Bruce Almighty, he just wills things to happen and they happen. There are often unintended consequences, but never any need to worry about what means the genie will use to get the desired result. So it's not a perfect analogy for trying to use super-AI.

Comment author: DanArmak 21 May 2012 06:57:12PM 2 points

Besides, even if an AI is Friendliness-complete and knows the "right thing" to be achieved, it doesn't mean it can actually achieve it. Being superhumanly smart doesn't mean being superhumanly powerful. We often make such an assumption because it's the safe one in the Least Convenient World if the AI is not Friendly. But in the Least Convenient World, a proven-Friendly AI is at least as intelligent as a human, but no more powerful than an average big corp.

Comment author: ChrisHallquist 21 May 2012 02:10:09PM 0 points

From the link you provide:

To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish.

This may or may not be true depending on what you mean by "safe."

Imagine a superintelligence that executes the intent of any command given with the right authorization code, and is very good at working out the intent of commands. Such a superintelligence might do horrible things to humanity if Alpha Centaurians or selfish/nepotistic humans got ahold of the code, but could have very good effects if a truly altruistic human (if there ever was such a thing) were commanding it. Okay, so that's not a great bet for humanity as a whole, but it's still going to be a safe fulfiller of wishes for whoever makes the wish. Yet it doesn't have anyone's values, it just does what it's told.

I'm glad you linked to that, because I just now noticed that sentence, and it confirms something I've been suspecting about Eliezer's views on AI safety. He seems to think that on the one hand you have the AI's abilities, and on the other hand you have its values. Safe AI depends entirely on the values; you can build an AI that matches human intellectual abilities in every way without making a bit of progress on making it safe.

This is wrong because, by hypothesis, an AI that matches human intellectual abilities in every way would have considerable ability to understand the intent behind orders (at least when those orders are given by humans). IDK if that would be enough, though, when the AI is playing with superpowers. Also, there's no law that says only AIs that are capable of understanding us are allowed to kill us.

Comment author: TimS 21 May 2012 03:55:03PM *  1 point

"No eating in the classroom." Is the rule's purpose, the text, or the rule-maker's intent most important?

In short, there are a lot of different incentives acting on agents, and miscalibrating the relative strength of different constraints leads fairly quickly to unintended pernicious outcomes.