Manfred comments on Reply to Holden on 'Tool AI' - Less Wrong

94 Post author: Eliezer_Yudkowsky 12 June 2012 06:00PM




Comment author: AlexMennen 13 June 2012 12:07:49AM *  9 points [-]

And if the preference function was just over the human's 'goodness' of the end result, rather than the accuracy of the human's understanding of the predictions, the AI might tell you something that was predictively false but whose implementation would lead you to what the AI defines as a 'good' outcome. And if we ask how happy the human is, the resulting decision procedure would exert optimization pressure to convince the human to take drugs, and so on.

I was under the impression that Holden's suggestion was more along the lines of: Make a model of the world. Remove the user from the model and replace him with a similar user who will always do what you recommend. Then manipulate this modeled user so that the AI achieves its objective in the model, and report to the real user the actions the AI has the modeled user take.

Thus, if the objective was to make the user happy, the Google Maps AGI would simply instruct the user to take drugs, rather than tricking him into doing so, because such instruction is the easiest way to manipulate the user in the model that the Google Maps AGI is optimizing in.
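The procedure described above can be sketched in a few lines. This is a toy illustration, not Holden's actual proposal: the candidate recommendations, their modeled effects, and the happiness objective are all hypothetical, chosen only to show how a planner that simulates a perfectly obedient user model ends up reporting a degenerate instruction when the objective is the modeled user's happiness.

```python
def plan_recommendation(effects, objective, user_state):
    """Simulate an obedient user who performs whatever is recommended,
    then return the recommendation whose modeled outcome scores best."""
    def simulate(rec):
        # The modeled user always complies: just apply the action's effect.
        return effects[rec](dict(user_state))
    return max(effects, key=lambda rec: objective(simulate(rec)))

# Hypothetical actions and their effects inside the AI's world model.
effects = {
    "take scenic route": lambda s: {**s, "happiness": s["happiness"] + 1},
    "take drugs":        lambda s: {**s, "happiness": s["happiness"] + 5},
    "be happy":          lambda s: {**s, "happiness": 10},  # trivially maximal in the model
}

best = plan_recommendation(effects,
                           objective=lambda s: s["happiness"],
                           user_state={"happiness": 0})
print(best)  # → be happy
```

Because the modeled user complies with anything, the instruction whose modeled effect directly sets happiness to its maximum beats every instruction that works through the world, which is exactly the failure mode raised in the reply below.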

Comment author: Manfred 13 June 2012 01:20:18AM 8 points [-]

Actually, the easiest output for the AI in that case is "be happy."

Comment author: Eliezer_Yudkowsky 13 June 2012 01:28:21AM 2 points [-]

But - that's not what he meant!

Comment author: thomblake 13 June 2012 01:19:52PM 6 points [-]

I don't know why you keep harping on this. Just because an algorithm logically can produce a certain output, and probably will produce that output, doesn't mean good intentions and vigorous handwaving are any less capable of magic.

This is why when I fire a gun, I just point it in the general direction of my target, and assume the universe will know what I meant to hit.

Comment author: MBlume 21 June 2012 04:15:14AM 3 points [-]

I mean, it works in so many video games.

Comment author: Strange7 13 June 2012 08:14:13PM 4 points [-]

As a failure mode, "vague, useless, or trivially-obvious suggestions" is less of a problem than "rapidly eradicates all life." Historically, projects that were explicitly designed to be safe even when they inevitably failed have been more successful and less deadly than projects which were obsessively designed never to fail at all.

Comment author: pnrjulius 19 June 2012 03:57:19AM 0 points [-]

Indeed, one of the first things we teach our engineers is: "Even if you're sure it can't fail, plan for failure anyway. Many before you were sure their designs couldn't fail, and those designs failed."

Comment author: AlexMennen 13 June 2012 05:26:49AM *  0 points [-]

Indeed it isn't, although I'm not so foolish as to claim to know how to fully specify my suggestion in a way that avoids all of these sorts of problems.