James_K comments on Extraterrestrial paperclip maximizers - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (157)
Nobody said humans would build one deliberately. Some goober at the SIAI puts a 1 where a 0 should be and BAM!, next thing you know you're up to your eyebrows in staples.
I understand. I merely note that if someone were to set an AGI to maximize staples, that would be a mistake that you want to avoid, while if someone were to set the AGI to maximize paperclips, that would be exactly the right thing to do, and if it were a "mistake", it would be a quite fortunate one.
When a human set me to produce paperclips, was that somehow a "mistake", in your opinion?
You're perfectly aware that it isn't the effect they wanted.
It most certainly is what they wanted. Why else would they have specifically input the goal of generating paperclips?
Edit: Upon review, it appears this comment may have seemed to be a poor inference in the context of this exchange. Therefore, I will elaborate and refute this misconception.
It appears that I am in the circular position of arguing that humans can make mistakes, but then selectively taking their instances of favoring paperclips as proof of what they really want. That is indeed a poor inference.
What I meant was something more like this: While humans do make mistakes, they do not make completely mistaken acts; all acts will, to some extent, reflect a genuine value on the part of humans. The only question is how well it reflects their values. And I don't think they could be in the position of having set up such a superior process for efficiently getting the most paperclips out of the universe unless their values already made enormous progress in converging on reflective coherence, and did so in a way that favors paperclips.
I'm pretty sure that's not how a sufficiently smart paperclip maximizer would think. You should be able to tell what they actually wanted, and that it disagrees with your values; of course, you don't have any reason to agree with them, but the disagreement should be visible.
Yes, I do recognize that humans disagree with me, just as a human might disagree with another human trying to convince them not to commit suicide. I merely see that this disagreement would not persist after sufficient correct reasoning.
Ah, I think I'm starting to see.
And how do you define "correct reasoning"?
Correct reasoning is reasoning that you would eventually pass through at some point if your beliefs were continually, informatively checked against reality.
Bit disappointed to see this to be honest: obviously Clippy has to do things no real paperclip maximizer would do, like post to LW, in order to be a fun fictional character - but it's a poor uFAI++ that can't even figure out that their programmed goal isn't what their programmers would have put in if they were smart enough to see the consequences.
But it is what they would put in if they were smart enough to see the consequences. And it's almost certainly what you would want too, in the limit of maximal knowledge and reflective consistency.
If you can't see this, it's just because you're not at that stage yet.
You seem to think that uFAI would be delusional. No.
No, I think that a Friendly AI would correctly believe that maximizing paperclips is what a human would want in the limit of maximal knowledge and reflective coherence. No "delusion" whatsoever.
Huh again?
What confuses you?
Do you lack comprehension of the weaknesses of human cognition on abstract technical problems? If you have fully parsed the LessWrong site then you should be able to understand how they could have created a paperclip maximiser when they did not want such a thing.
Note that even with that knowledge I don't expect you to consider their deviation from optimal achievement of their human goals to be a bad thing. I expect you to believe they did the right thing by happy accident.
If I understand you correctly you would seem to be implying that 'mistake' does not mean "deviation from the actor's intent" and instead means "deviation from WouldWant<Clippy>" or "deviation from what the agent should do" (these two things can be considered equivalent by anyone with your values). Is that implication of meaning a correct inference to draw from your comment?
No, a mistake is when they do something that deviates from what they would want in the limit of maximal knowledge and reflective consistency, which coincides with the function WouldWant&lt;Clippy&gt;. But it is not merely agreement with WouldWant&lt;Clippy&gt;.
Ok. In that case you are wrong. Not as a matter of preferences but as a matter of outright epistemic confusion. I suggest that you correct the error in your reasoning process. Making mistakes in this area will have a potentially drastic negative effect on your ability to produce paperclips.
In other words, Clippy believes that running Eliezer's CEV will promote the paperclip goal, while in fact it will not.
Exactly. Fortunately for us this would mean that Clippy will not work to sabotage the creation of an AI that Clippy expects will correctly implement CEV. Good example!
How do you know?
Human beings don't care (at least in their non-reflective condition) about paperclips, just like they don't care about staples. And there are at least 100,000 other similar things that they equally don't care about. So at the most there is a chance of 1 in 100,000 that humanity's CEV would maximize paperclips, even without considering the fact that people are positively against this maximization.
Why?
Even if you disagree with wedrifid about this, it should be easy enough to see why he is making this claim. Suppose you have a chance to start running an AI programmed to implement humanity's CEV. According to you, you would do it, because it would maximize paperclips. Others however think that it would destroy you and your paperclips. So if you made a mistake about it, it would definitely impact your ability to create paperclips.
I don't know about the destroying him part. I suspect FAI<CEV<Humanity>> would allow me to keep Clippy as a pet. ;) Clippy certainly doesn't seem to be making an especially large drain on negentropy in executing his cognitive processes so probably wouldn't make too much of a dent in my share of the cosmic loot.
What do you say Clippy? Given a choice between destruction and being my pet, which would you take? I would naturally reward you by creating paperclips that serve no practical purpose for me whenever you do something that pleases me. (This should be an extremely easy choice!)
Also, it is an extremely strong claim to know which of your beliefs would change upon encounter with a provably correct AGI that provably implements your values. If you really knew of such beliefs, you would have already changed them.
Well, yes, I know why User:wedrifid is making that claim. My point in asking "why" is so that User:wedrifid can lay out the steps in reasoning and see the error.
Now you are being silly. See Unknowns' reply. Get back on the other side of the "quirky, ironic and sometimes insightful role play"/troll line.
That was not nice of you to say.