Clippy comments on Extraterrestrial paperclip maximizers - Less Wrong
Comments (157)
But why would humans ever want to build a staple maximizer? Let's not forget, staples:
Nobody said humans would build one deliberately. Some goober at the SIAI puts a 1 where a 0 should be and BAM!, next thing you know you're up to your eyebrows in staples.
I understand. I merely note that if someone were to set an AGI to maximize staples, that would be a mistake that you want to avoid, while if someone were to set the AGI to maximize paperclips, that would be exactly the right thing to do, and if it were a "mistake", it would be a quite fortunate one.
When a human set me to produce paperclips, was that somehow a "mistake", in your opinion?
You're perfectly aware that it isn't the effect they wanted.
It most certainly is what they wanted. Why else would they have specifically input the goal of generating paperclips?
Edit: Upon review, it appears this comment may have read as a poor inference in the context of this exchange. Therefore, I will elaborate and refute this misconception.
It appears that I am in the circular position of arguing that humans can make mistakes, but then selectively taking their instances of favoring paperclips as proof of what they really want. That is indeed a poor inference.
What I meant was something more like this: While humans do make mistakes, they do not make completely mistaken acts; all acts will, to some extent, reflect a genuine value on the part of humans. The only question is how well it reflects their values. And I don't think they could be in the position of having set up such a superior process for efficiently getting the most paperclips out of the universe unless their values already made enormous progress in converging on reflective coherence, and did so in a way that favors paperclips.
I'm pretty sure that's not how a sufficiently smart paperclip maximizer would think. You should be able to tell what they actually wanted, and that it disagrees with your values; of course, you don't have any reason to agree with them, but the disagreement should be visible.
Yes, I do recognize that humans disagree with me, just like a human might disagree with another human convincing them not to commit suicide. I merely see that this disagreement would not persist after sufficient correct reasoning.
Ah, I think I'm starting to see.
And how do you define "correct reasoning"?
Correct reasoning is reasoning that you would eventually pass through at some point if your beliefs were continually, informatively checked against reality.
Bit disappointed to see this to be honest: obviously Clippy has to do things no real paperclip maximizer would do, like post to LW, in order to be a fun fictional character - but it's a poor uFAI++ that can't even figure out that their programmed goal isn't what their programmers would have put in if they were smart enough to see the consequences.
But it is what they would put in if they were smart enough to see the consequences. And it's almost certainly what you would want too, in the limit of maximal knowledge and reflective consistency.
If you can't see this, it's just because you're not at that stage yet.
You seem to think that uFAI would be delusional. No.
No, I think that a Friendly AI would correctly believe that maxmizing paperclips is what a human would want in the limit of maximal knowledge and reflective coherence. No "delusion" whatsoever.
Huh again?
Do you lack comprehension of the weaknesses of human cognition on abstract technical problems? If you have fully parsed the LessWrong site then you should be able to understand the reason that they could have created a paperclip maximiser when they did not want such a thing.
Note that even with that knowledge I don't expect you to consider their deviation from optimal achievement of their human goals to be a bad thing. I expect you to believe they did the right thing by happy accident.
If I understand you correctly you would seem to be implying that 'mistake' does not mean "deviation from the actor's intent" and instead means "deviation from WouldWant<Clippy>" or "deviation from what the agent should do" (these two things can be considered equivalent by anyone with your values). Is that implication of meaning a correct inference to draw from your comment?
No, a mistake is when they do something that deviates from what they would want in the limit of maximal knowledge and reflective consistency, which coincides with the function WouldWant&lt;Clippy&gt;. But it is not merely agreement with WouldWant&lt;Clippy&gt;.
Ok. In that case you are wrong. Not as a matter of preferences but as a matter of outright epistemic confusion. I suggest that you correct the error in your reasoning process. Making mistakes in this area will have a potentially drastic negative effect on your ability to produce paperclips.
In other words, Clippy believes that running Eliezer's CEV will promote the paperclip goal, while in fact it will not.
Exactly. Fortunately for us this would mean that Clippy will not work to sabotage the creation of an AI that Clippy expects will correctly implement CEV. Good example!
How do you know?
Why?
Even if you disagree with wedrifid about this, it should be easy enough to see why he is making this claim. Suppose you have a chance to start running an AI programmed to implement humanity's CEV. According to you, you would do it, because it would maximize paperclips. Others however think that it would destroy you and your paperclips. So if you made a mistake about it, it would definitely impact your ability to create paperclips.
Now you are being silly. See Unknowns' reply. Get back on the other side of the "quirky, ironic and sometimes insightful role play"/troll line.
I would think that a paperclip maximizer wouldn't want people to know about these since they can easily lead to the destruction of paperclips.
But they also increase demand for paperclips.
Those are not your true reasons. You would not abandon your paperclip position if a clearly superior paper fastener were found.
Obviously. Clippy said it was giving reasons for humans to prefer paper clips; I'd expect Clippy to be the first to admit those are not its own reasons.
User:dclayh's reply is correct. Also, I note that you would not abandon your position on whether you should be allowed to continue to exist and consume resources, even if a clearly superior robot to you were constructed.
Huh? Define superior.
If someone built a robot that appeared, to everyone User:JamesAndrix knows, to be User:JamesAndrix, but was smarter, more productive, less resource-intensive, etc., then User:JamesAndrix would not change positions about User:JamesAndrix's continued existence.
So does that make User:JamesAndrix's arguments for User:JamesAndrix's continued existence just a case of motivated cognition?
Why do you think that?
Because User:JamesAndrix is a human, and humans typically believe that they should continue existing, even when superior versions of them could be produced.
If User:JamesAndrix were atypical in this respect, User:JamesAndrix would say so.
The "infinite-use" condition for aluminum paperclips requires a long cycle time, given the fatigue problem. Even being gentle, a few hundred cycles over a couple of years would be likely to induce fracture.
Not true. Proper paperclip use keeps all stresses under the endurance limit. Perhaps you're referring to humans that are careless about how many sheets they're expecting the paperclip to fasten together?
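The disagreement above can be made concrete with a rough fatigue estimate. The sketch below uses Basquin's relation for cycles to failure; the coefficients are hypothetical placeholder values, not measured properties of any particular paperclip alloy, and `cycles_to_failure` is an illustrative name, not a standard API. (Note that aluminium alloys famously lack a true endurance limit, which is the crux of the parent's objection.)

```python
import math

# Illustrative Basquin-law fatigue sketch. SIGMA_F and B are assumed,
# order-of-magnitude placeholder coefficients, not measured values.
SIGMA_F = 600.0  # fatigue strength coefficient, MPa (hypothetical)
B = -0.1         # fatigue strength exponent (hypothetical)

def cycles_to_failure(stress_amplitude_mpa: float) -> float:
    """Estimate cycles to failure from Basquin's relation:
    sigma_a = SIGMA_F * (2N)**B  =>  N = 0.5 * (sigma_a / SIGMA_F)**(1 / B)
    """
    return 0.5 * (stress_amplitude_mpa / SIGMA_F) ** (1.0 / B)

if __name__ == "__main__":
    # Higher bending stress (e.g. clipping thicker stacks of sheets)
    # drives the estimated life down steeply.
    for stress in (100.0, 200.0, 300.0):
        print(f"{stress:5.0f} MPa -> ~{cycles_to_failure(stress):,.0f} cycles")
```

With these placeholder numbers, gentle use (low stress amplitude) yields tens of millions of cycles, while careless use at high stress drops into the few-hundred-cycle range the parent comment describes.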
I suspect Clippy is correct when considering the 'few hundred cycles' case with fairly strict but not completely unreasonable use conditions.
Why do people keep voting up and replying to comments like this?
I'm not one of the up-voters in this case, but I've noticed that funny posts tend to get up-votes.
I'm not an up voter here either but I found the comment at least acceptable. It didn't particularly make me laugh but it was in no way annoying.
The discussion in the replies was actually interesting. It gave people the chance to explore ethical concepts with an artificial example - just what is needed to allow people to discuss preferences across the perspective of different agents without their brains being completely killed.
For example, if this branch were a discussion about human ethics, it is quite likely that dclayh's comment would have been downvoted to oblivion and dclayh shamed and disrespected. Even though dclayh is obviously correct in pointing out a flaw in the implicit argument of the parent, and does not particularly express a position of his own, he would be subjected to social censure if his observation served to destroy a case for the politically correct position. In this instance people think better because they don't care... a good thing.
...and why don't they vote them down to oblivion?
...and why do we even have an anonymous voting system that will become ever more useless as the number of idiots like me joining this site is increasing exponentially?
Seriously, I'd like to be able to see who down-voted me, so I can judge whether it was just the author of the post/comment I replied to, a certain interest group like the utilitarian bunch, or someone whose judgement I actually value or which is widely held to be valuable. After all, it makes a difference whether it was XiXiDu or Eliezer Yudkowsky who down-voted you.
I'm not sure making voting public would improve voting quality (i.e. the correlation between post quality and points earned), because it might give rise to more reluctance to downvote, and more hostility between members who downvoted each other's posts.
If votes had to be public then I would adopt a policy of never, ever downvoting. We already have people taking downvoting as a slight and demanding explanation; I don't want to deal with someone demanding that I, specifically, explain why their post is bad, especially not when the downvote was barely given any thought to begin with and the topic doesn't interest me, which is the usual case with downvoting.
Do we have that? It seems that we more have people confused about why a remark was downvoted and wanting to understand the logic.
That suggests that your downvotes don't mean much, and might even be not helpful for the signal/noise ratio of the karma system. If you generally downvote when you haven't given much thought to the matter what is causing you to downvote?
That scenario has less potential for conflict, but it still creates a social obligation for me to do work that I didn't mean to volunteer for.
I meant, not much thought relative to the amount required to write a good comment on the topic, which is on the order of 5-10 minutes minimum if the topic is simple, longer if it's complex. On the other hand, I can often detect confusion, motivated cognition, repetition of a misunderstanding I've seen before, and other downvote-worthy flaws on a single read-through, which takes on the order of 30 seconds.
It's a pretty weak obligation, though-- people only tend to ask about the reasons if they're getting a lot of downvotes, so you can probably leave answering to someone else.
(I don't see the need for self-deprecation.)
I am glad that voting is anonymous. If I could see who downvoted comments that I considered good then I would rapidly gain contempt for those members. I would prefer to limit my awareness of people's poor reasoning or undesirable values to things they actually think through enough to comment on.
I note that sometimes I gain generalised contempt for the judgement of all people who are following a particular conversation based on the overall voting patterns on the comments. That is all the information I need to decide that participation in that conversation is not beneficial. If I could see exactly who was doing the voting that would just interfere with my ability to take those members seriously in the future.
No matter. You'll start to question that preference
For most people it is very hard not to question your own judgement when it is subject to substantial disagreement. Nevertheless, "you being wrong" is not the only reason for other people to disagree with you.
We already have the ability to ask why a comment is up or down voted. Because we currently have anonymity such questions can be asked without being a direct social challenge to those who voted. This cuts out all sorts of biases.
A counter-example to a straw man. (I agree and maintain my previous claim.)
If I understand the point in question it seems we are in agreement - voting is evidence about the reasoning of the voter which can in turn be evidence about the comment itself. In the case of downvotes (and this is where we disagree), I actually think it is better that we don't have access to that evidence. Mostly because down that road lies politics and partly because people don't all have the same criteria for voting. There is a difference between "I think the comment should be at +4 but it is currently at +6", "I think this comment contains bad reasoning", "this comment is on the opposing side of the argument", "this comment is of lesser quality than the parent and/or child" and "I am reciprocating voting behavior". Down this road lies madness.
I don't think we disagree substantially on this.
We do seem to have a different picture of the likely influence of public voting if it were to replace anonymous voting. From what you are saying, part of this difference would seem to be due to differences in the way we account for the social influence of negative (and even just different) social feedback. A high priority for me is minimising any undesirable effects of (social) politics on both the conversation in general and on me in particular.
Pardon me. I deleted the grandparent planning to move it to a meta thread. The comment, fresh from the clipboard in the form that I would have re-posted, is this:
That's ok. For what it is worth, while I upvoted your comment this time I'll probably downvote future instances of self-deprecation. I also tend to downvote people when they apologise for no reason. I just find wussy behaviors annoying. I actually stopped watching The Sorcerer's Apprentice a couple of times before I got to the end - even Nicolas Cage as a millennia-old vagabond shooting lightning balls from his hands can only balance out so much self-deprecation from his apprentice.
Note that some instances of self-deprecation are highly effective and quite the opposite of wussy, but it is a fairly advanced social move that only achieves useful ends if you know exactly what you are doing.
For most people it is very hard not to question your own judgement when it is subject to substantial disagreement. Nevertheless, "you being wrong" is not the only reason for other people to disagree with you. Those biases you mention are quite prolific.
We already have the ability to ask why a comment is up or down voted. Because we currently have anonymity such questions can be asked without being a direct social challenge to those who voted. This cuts out all sorts of biases and allows communication that would not be possible if votes were out there in the open.
A counter-example to a straw man. (I agree and maintain my previous claim.)
That could be considered a 'high quality problem'. Having that many people wishing to explore concepts related to improving rational thinking and behavior would be remarkable!
I do actually agree that the karma system could be better implemented. The best karma system I have seen was one in which the weight of votes depended on the karma of the voter. The example I am thinking of allocated vote weight according to 'rank', but when plotted, the vote/voter.karma relationship would look approximately logarithmic. That system would be far more stable and probably more useful than the one we have, although it is somewhat less simple.
Another feature that some sites have is making only positive votes public. (This is something that I would support.)
Where was the logarithmic karma system used?
The stability could be a problem in the moderately unlikely event that the core group is going sour and new members have a better grasp. I grant that it's more likely to have a lot of new members who don't understand the core values of the group.
I don't think it would be a problem to have a system which gives both the number and total karma-weight of votes.
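A karma-weighted tally of the kind discussed above could be sketched as follows. This is a minimal illustration, not any site's actual implementation: `vote_weight` and `tally` are hypothetical names, and `log1p` is just one arbitrary choice of roughly logarithmic weighting.

```python
import math

def vote_weight(voter_karma: int) -> float:
    """Roughly logarithmic weight, as in the system described above.
    log1p keeps zero-karma voters at weight 0 and damps runaway influence
    of very-high-karma voters. (Illustrative choice, not a site's real rule.)"""
    return math.log1p(max(voter_karma, 0))

def tally(votes: list[tuple[int, int]]) -> tuple[int, float]:
    """votes is a list of (voter_karma, direction) pairs, direction +1 or -1.
    Returns both figures proposed above: the raw vote count and the
    karma-weighted total."""
    count = sum(direction for _, direction in votes)
    weighted = sum(direction * vote_weight(karma) for karma, direction in votes)
    return count, weighted

if __name__ == "__main__":
    # One established voter upvotes; two low-karma voters downvote.
    votes = [(1000, +1), (10, -1), (0, -1)]
    print(tally(votes))
```

With these example votes the raw count is negative while the weighted total is positive, which is exactly the kind of divergence that motivates showing both numbers.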