Gurkenglas comments on The Blue-Minimizing Robot - Less Wrong

162 Post author: Yvain 04 July 2011 10:26PM


Comments (159)


Comment author: [deleted] 06 July 2011 04:04:01PM * 28 points

The conclusion I'd draw from this essay is that one can't necessarily derive a "goal" or a "utility function" from every possible behavior pattern. If you ask "What is the robot's goal?", the answer is, "it doesn't have one," because it doesn't assign a total preference ordering to states of the world. At best, you could say that it prefers state [I SEE BLUE AND I SHOOT] to state [I SEE BLUE AND I DON'T SHOOT]. But that's all.
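To make the "total preference ordering" point concrete, here is a minimal Python sketch. It is an illustration only, not anything from the essay: the four-state world, the names `STATES`, `robot_prefers` and `incomparable_pairs`, and the "hold" label for not shooting are all assumptions. It encodes the single preference described above and counts how many pairs of states the robot's rule says nothing about.

```python
from itertools import combinations, product

# Hypothetical toy state space: (what the camera sees, what the gun does).
STATES = list(product(["blue", "not_blue"], ["shoot", "hold"]))

def robot_prefers(a, b):
    """Return True if a is preferred to b, False if b is preferred to a,
    and None if the robot's rule is silent about the pair."""
    if a == ("blue", "shoot") and b == ("blue", "hold"):
        return True
    if a == ("blue", "hold") and b == ("blue", "shoot"):
        return False
    return None  # the rule does not compare these two states

def incomparable_pairs(states, prefers):
    """List every unordered pair of states that the preference does not rank."""
    return [(a, b) for a, b in combinations(states, 2) if prefers(a, b) is None]

if __name__ == "__main__":
    unranked = incomparable_pairs(STATES, robot_prefers)
    print(f"{len(unranked)} pairs of states are left unranked")
```

A total ordering would leave no unranked pairs. In this toy world the rule ranks exactly one pair and is silent on the rest, which is the sense in which the robot has a behavior but no utility function over world states.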

This has some implications for AI, I think. First of all, not every computer program has a goal or a utility function. There is no danger that your TurboTax software will take over the world and destroy all human life, because it doesn't have a general goal to maximize the number of completed tax forms. Even rather sophisticated algorithms can completely lack goals of this kind -- they aren't designed to maximize some variable over all possible states of the universe. It seems that the unfriendly-AI scenario is only a risk if an AI has a true goal function, and many useful advances in artificial intelligence (defined in the broad sense) carry no risk of this kind.

Do humans have goals? I don't know; it's plausible that we have goals that are complex and hard to define succinctly, and it's also plausible that we don't have goals at all, just sets of instructions like "SHOOT AT BLUE." The test would seem to be whether a human goal of "PROMOTE VALUE X" continues to imply behaviors in strange and unfamiliar circumstances, or whether we only have rules of behavior in a few common situations. If you can think clearly about ethics (or preferences) in the far future, or the distant past, or regarding unfamiliar kinds of beings, and your opinions have some consistency, then maybe those ethical beliefs or preferences are goals. But probably many kinds of human behavior are more like sets of instructions than goals.

Comment author: Gurkenglas 03 December 2013 01:30:49AM 3 points

"At best, you could say that it prefers state [I SEE BLUE AND I SHOOT] to state [I SEE BLUE AND I DON'T SHOOT]. But that's all."

No; placing a blue-tinted mirror in front of him will have him shoot himself, even though that greatly diminishes his future ability to shoot. In general, an arbitrary program really can't be assigned any nontrivial utility function.

Comment author: Yaacov 31 January 2016 01:53:17AM * 0 points

Destroying the robot greatly diminishes its future ability to shoot, but it would also greatly diminish its future ability to see blue. The robot doesn't prefer 'shooting blue' to 'not shooting blue'; it prefers 'seeing blue and shooting' to 'seeing blue and not shooting'.

So the original poster was right.

Edit: I'm wrong, see below

Comment author: Gurkenglas 03 February 2016 01:19:16PM 1 point

If the robot knows that its camera is indestructible but its gun isn't, it would still shoot at the mirror and destroy only its gun.
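The mirror exchange can be sketched as a small simulation. Everything here is an assumed toy model rather than anything specified in the thread: the observation labels, the rule that a shot at the mirror destroys whatever can break, the later blue objects, and the hypothetical `mirror_aware_policy` used for comparison. The sketch counts how many steps end up in [I SEE BLUE AND I SHOOT] versus [I SEE BLUE AND I DON'T SHOOT].

```python
def run(policy, observations, camera_indestructible=True):
    """Simulate the robot; return (blue-and-shot steps, blue-and-not-shot steps)."""
    gun_ok, camera_ok = True, True
    blue_and_shot = blue_and_held = 0
    for obs in observations:
        sees_blue = camera_ok and obs in ("blue_object", "blue_mirror")
        if not sees_blue:
            continue  # a blind robot, or a non-blue scene, lands in neither state
        if policy(obs) and gun_ok:
            blue_and_shot += 1
            if obs == "blue_mirror":
                # Assumption: the shot comes back and destroys what can break.
                gun_ok = False
                if not camera_indestructible:
                    camera_ok = False
        else:
            blue_and_held += 1
    return blue_and_shot, blue_and_held

def reflex_policy(obs):
    """The robot's actual rule: fire whenever blue is seen."""
    return True

def mirror_aware_policy(obs):
    """Hypothetical alternative that declines to shoot at the mirror."""
    return obs != "blue_mirror"

# Hypothetical scenario: the mirror first, then ordinary objects.
OBSERVATIONS = ["blue_mirror", "blue_object", "grey_object", "blue_object"]

if __name__ == "__main__":
    print(run(reflex_policy, OBSERVATIONS, camera_indestructible=False))
    print(run(reflex_policy, OBSERVATIONS, camera_indestructible=True))
    print(run(mirror_aware_policy, OBSERVATIONS, camera_indestructible=True))
```

Under these assumptions, when the camera can be destroyed the reflex rule never again lands in the seeing-blue-and-not-shooting state, which matches the earlier objection. With an indestructible camera it lands there on every later blue sighting, while the mirror-aware alternative does better on exactly the preference the reflex rule was supposed to embody.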