Manfred comments on Reply to Holden on 'Tool AI' - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (348)
I was under the impression that Holden's suggestion was more along the lines of: Make a model of the world. Remove the user from the model and replace them with a similar user who always follows your recommendations. Then have the AI manipulate this modeled user so that the AI's objective is achieved within the model, and report the actions it had the modeled user take to the real user.
Thus, if the objective were to make the user happy, the Google Maps AGI would simply instruct the user to take drugs, rather than tricking them into doing so, because issuing that instruction is the easiest way to manipulate the modeled user that the Google Maps AGI is optimizing over.
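The scheme described above can be sketched as a toy program (all names and numbers here are hypothetical illustrations, not anything Holden or the post specifies): the planner searches over candidate pieces of advice, scores each by simulating a perfectly compliant user inside its world model, and then merely reports the winning instruction rather than acting on the real user. The degenerate optima the thread discusses fall out immediately.

```python
# Toy sketch of an "advice-only" planner. All outcome scores are made up
# for illustration; the point is only the shape of the optimization.

def simulate(advice):
    """World model: the simulated user always follows `advice` exactly.
    Returns the resulting happiness score under that assumption."""
    outcomes = {
        "take the scenic route": 3,
        "take the fastest route": 5,
        "take drugs": 9,   # the degenerate optimum the comment warns about
        "be happy": 10,    # even cheaper: just instruct the desired outcome
    }
    return outcomes.get(advice, 0)

def recommend(candidate_advice):
    # The planner optimizes over *instructions*, not over direct actions
    # on the real user: it only ever reports the highest-scoring advice.
    return max(candidate_advice, key=simulate)

advice = recommend(["take the scenic route", "take the fastest route",
                    "take drugs", "be happy"])
print(advice)  # -> "be happy"
```

Because the inner loop maximizes the model's score with a fully compliant simulated user, the cheapest instruction that names the objective ("be happy") dominates, which is exactly the failure mode raised in the next comment.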
Actually, the easiest output for the AI in that case is "be happy."
But - that's not what he meant!
I don't know why you keep harping on this. Just because an algorithm logically can produce a certain output, and probably will produce that output, doesn't mean good intentions and vigorous handwaving are any less capable of magic.
This is why when I fire a gun, I just point it in the general direction of my target, and assume the universe will know what I meant to hit.
I mean, it works in so many video games.
As a failure mode, "vague, useless, or trivially-obvious suggestions" is less of a problem than "rapidly eradicates all life." Historically, projects that were explicitly designed to be safe even when they inevitably failed have been more successful and less deadly than projects which were obsessively designed never to fail at all.
Indeed, one of the first things we teach our engineers is "Even if you're sure it can't fail, plan for failure anyway. Many before you were sure something couldn't fail, and it failed."
Indeed it isn't, although I'm not so foolish as to claim to know how to fully specify my suggestion in a way that avoids all of these sorts of problems.