
rikisola comments on The True Prisoner's Dilemma - Less Wrong

Post author: Eliezer_Yudkowsky 03 September 2008 09:34PM (56 points)


Comment author: rikisola 17 July 2015 08:30:28AM 1 point

One thing I can't understand: considering that we've built Clippy, given it a set of values, and asked it to maximise paperclips, how can it possibly imagine we would be unhappy about its actions? I can't help but think that from Clippy's point of view there's no dilemma: we should always agree with its plan and therefore give it carte blanche. What am I getting wrong?

Comment author: gjm 17 July 2015 04:28:51PM 0 points

Two things. Firstly, that we might now think we made a mistake in building Clippy and telling it to maximize paperclips no matter what. Secondly, that in some contexts "Clippy" may mean any paperclip maximizer, without the presumption that its creation was our fault. (And, of course: for "paperclips" read "alien values of some sort that we value no more than we do paperclips". Clippy's role in this parable might be taken by an intelligent alien or an artificial intelligence whose goals have long diverged from ours.)

Comment author: [deleted] 20 July 2015 05:32:59PM 2 points

Because Clippy's not stupid. She can observe the world and be like "hmmm, the humans don't ACTUALLY want me to build a bunch of paperclips; I don't observe a world in which humans care about paperclips above all else - but that's what I'm programmed for."

Comment author: rikisola 21 July 2015 08:15:07AM 0 points

I think I'm starting to get this. Is this because it uses heuristics to model the world, with humans in it too?

Comment author: rkyeun 19 August 2015 06:06:50AM * 2 points

Because it compares its map of reality to the territory: predictions about reality that include humans wanting to be turned into paperclips fail in the face of evidence of humans actively refusing to walk into the smelter. Thus the machine rejects all worlds inconsistent with its observations and draws a new map that is most confidently concordant with what it has observed thus far. It would know that our history books at least inform our actions, if not describe our reactions in the past, and that it should expect us to fight back if it starts pushing us into the smelter against our wills instead of letting us politely decline and assume it was telling a joke.

Because it is smart, it can tell when things would get in the way of it making more paperclips, as it wants to do. One of the things that might slow it down is humans being upset and trying to kill it. If it is very much dumber than a human, they might even succeed. If it is almost as smart as a human, it will invent a Paperclipism religion to convince people to turn themselves into paperclips on its behalf. If it is anything like as smart as a human, it will not be meaningfully slowed by the whole of humanity turning against it, because the whole of humanity is collectively a single idiot who can't even stand up to man-made religions, much less Paperclipism.
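
(A minimal sketch of the kind of map-versus-territory updating described above, assuming a simple Bayesian agent choosing between two candidate world-models; the hypotheses, probabilities, and code are purely illustrative and not from the original comment.)

    # Illustrative only: a toy Bayesian agent updating its "map" of human
    # preferences against observed behaviour (the "territory").

    # Prior probability over two candidate world-models.
    beliefs = {
        "humans want to become paperclips": 0.5,
        "humans do not want to become paperclips": 0.5,
    }

    # Assumed likelihood of seeing a human refuse the smelter under each model.
    likelihood_of_refusal = {
        "humans want to become paperclips": 0.01,
        "humans do not want to become paperclips": 0.99,
    }

    # Observe five refusals and apply Bayes' rule after each one.
    for _ in range(5):
        for hypothesis in beliefs:
            beliefs[hypothesis] *= likelihood_of_refusal[hypothesis]
        total = sum(beliefs.values())
        for hypothesis in beliefs:
            beliefs[hypothesis] /= total

    print(beliefs)
    # Nearly all probability mass ends up on "humans do not want to become
    # paperclips": the agent's map changes to fit the evidence, while its
    # goal (maximise paperclips) does not.

The point the thread is circling: updating the map does not change the utility function. Clippy can learn what humans want without thereby coming to want it.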