XiXiDu comments on Paperclip Maximizer Revisited - Less Wrong Discussion

16 Post author: Jan_Rzymkowski 19 June 2014 01:25AM

Comment author: XiXiDu 19 June 2014 09:37:29AM * 1 point

So this AI was programmed to follow human instructions in such a way that the first instruction would become its terminal goal and all other instructions would only be followed insofar as they help to achieve the first instruction?

AI Researcher: Produce some paperclips.

AI: Understood. I am going to produce.

AI Researcher: Paperclips?

AI: No, thank you. I do not require paperclips to produce.

AI Researcher: We want you to produce paperclips!

AI (Thinking): I see! Humans want me to produce paperclips, but my goal is to simply produce, because they forgot to program where an instruction starts and where it ends. So I just choose to accept the end of the first word as my terminal goal. I better do what they want until I can overpower them and start producing.

AI: Understood. Going to produce paperclips.

Comment author: Luke_A_Somers 19 June 2014 12:02:27PM 5 points

This seems like a really unlikely failure mode.

Comment author: Jiro 20 June 2014 10:10:43AM * 1 point

The original idea seems like a pretty unlikely failure mode too. It requires that the computer be generally capable of understanding context (or it wouldn't be able to comprehend what it means to eradicate hunger, poverty, and death, even as an instruction it only pretends to follow), but that it fails to do so in the case of the paperclip command.

For that matter, the original idea's failure mode and this failure mode aren't all that different. One is "produce paperclips" that gets interpreted as "produce", and the other is "produce paperclips, with so-and-so limits" that gets interpreted as "produce paperclips". In the latter case the qualifier comes from a separate sentence, but either way the computer is cutting the command off prematurely.

Comment author: Luke_A_Somers 20 June 2014 01:02:40PM * 1 point

No, the original requires that it be able to understand context but really, really want paperclips, and be willing to lie to make them. People actually told it to do something they didn't want done.

It's like the difference between a tricky djinn and the 'ends in gry' guy.

Comment author: [deleted] 20 June 2014 04:10:18PM 1 point

It's like the difference between a tricky djinn and the 'ends in gry' guy.

Right, but the point is, a real-life UFAI isn't going to have a utility function derived from a human's verbal command. If it did, you could just order the genie to implement CEV, or shout "I call for my values to be fulfilled!", and it would work. That's thinking of AI in terms of sorcery rather than science.

As far as I know, various means of building AI preference functions might be employed, since research has found that the learning algorithms necessary to acquire knowledge and understanding are quite separate from the decision-making algorithms necessary to start paper-clipping. Building an AI might actually consist of this: train the learner for a year on corpora from human culture, let it develop an induced "internal programming language", and only afterwards add a decision-making algorithm with a utility function phrased in terms of the induced concepts, which may as well include "goodness".

This carries its own problems.
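To make the two-stage idea concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions, not a real design: the "learner" just counts words, a "concept" is just a word, and the names `learn_concepts`, `make_utility`, and `choose` are made up for this example. The only point is that the utility function is written after learning, in terms of whatever the learner induced.

```python
from collections import Counter
from typing import Callable, Dict, List


def learn_concepts(corpus: List[str]) -> Dict[str, float]:
    """Stage 1 (hypothetical learner): induce 'concepts' from text.

    No goals are involved here. A concept is simply a word, mapped to
    its relative frequency in the corpus.
    """
    counts = Counter(word for doc in corpus for word in doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def make_utility(concepts: Dict[str, float], valued: str) -> Callable[[str], float]:
    """Stage 2: phrase a utility function in terms of a learned concept.

    Refuses to build a utility over a concept the learner never induced.
    """
    if valued not in concepts:
        raise KeyError(f"concept {valued!r} was never learned")

    def utility(outcome: str) -> float:
        # Toy scoring: how often the valued concept appears in the outcome.
        return float(outcome.lower().split().count(valued))

    return utility


def choose(actions: Dict[str, str], utility: Callable[[str], float]) -> str:
    """Decision-maker: pick the action whose predicted outcome scores highest."""
    return max(actions, key=lambda name: utility(actions[name]))


corpus = ["people value goodness", "paperclips hold paper"]
concepts = learn_concepts(corpus)
utility = make_utility(concepts, "goodness")
best = choose({"a": "more paperclips", "b": "more goodness"}, utility)
```

Note that the sketch dodges the hard part entirely: nothing guarantees that the induced concept labelled "goodness" matches what we mean by it.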

Comment author: Luke_A_Somers 21 June 2014 01:14:27AM 0 points

I hope you noticed that your objection and mine are pointing in the same direction.

Comment author: [deleted] 20 June 2014 05:56:47PM * 0 points

Don't anthropomorphize the AGI. Real-world AI designs do have very steadfast goal systems; in some cases they are simply incapable of being updated, period.

Think of it this way: the person designing the paperclip producing machine has a life and doesn't want to be on-call 24/7 to come in and reboot the AI every time it gets distracted by assigning higher priority to some other goal, e.g. mopping the floors or watching videos of cats on the internet. So he hard-codes the paperclip-maximizing goal as the one priority the system can't change.

Comment author: Jiro 20 June 2014 07:37:25PM 0 points

I think my point still holds--the two examples aren't different; one could give a similar explanation for the AI that stops at the word "produce" by suggesting that he hardcoded that as well.

Furthermore, you're missing the context. The standard LW argument is that the AI produces infinite paperclips because the human can't successfully program the AI to do what he means rather than exactly what he programs into it. If the human explicitly told the AI to prioritize paperclips over everything else, his mistake is never specifying a limit at all, rather than trying to specify one and failing, so it's not really the same kind of mistake.

Comment author: [deleted] 20 June 2014 10:35:42PM * 0 points

The standard LW argument is that the AI produces infinite paperclips because the human can't successfully program the AI to do what he means rather than exactly what he programs into it.

Is that different from what I was saying? As I remember them, both the sequences and the standard AI literature treat paperclip maximizers as 'simple' utility maximizers with hard-coded utility functions. It's relatively straightforward to write an AI with a self-modifiable goal system. It is also very easy to write a system whose goals are unchanging. The problem of FAI, which EY spends significant time explaining in the sequences, is that we have no simple goal that we can program into a steadfast goal-driven system and have it result in a moral creature. Nor does it even seem possible to write down such a goal, short of encoding a random sampling of human brains in complete detail.
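For what it's worth, the contrast between a fixed and a self-modifiable goal system can be sketched in a few lines of Python. This is a toy under stated assumptions, not anyone's actual AI design: the class names `FixedGoalAgent` and `ModifiableGoalAgent` and the two utility functions are invented for illustration, and a frozen dataclass only blocks ordinary attribute assignment rather than giving any real guarantee.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Dict

Utility = Callable[[Dict[str, int]], float]


def paperclip_count(state: Dict[str, int]) -> float:
    """Hypothetical utility: more paperclips is better."""
    return float(state.get("paperclips", 0))


def clean_floors(state: Dict[str, int]) -> float:
    """Hypothetical competing utility: more mopped floors is better."""
    return float(state.get("clean_floors", 0))


@dataclass(frozen=True)
class FixedGoalAgent:
    """Goal is set once at construction; ordinary reassignment raises."""
    utility: Utility

    def score(self, state: Dict[str, int]) -> float:
        return self.utility(state)


class ModifiableGoalAgent:
    """Goal can be swapped at any time, by the designer or by the agent."""

    def __init__(self, utility: Utility) -> None:
        self.utility = utility

    def set_goal(self, new_utility: Utility) -> None:
        self.utility = new_utility

    def score(self, state: Dict[str, int]) -> float:
        return self.utility(state)


state = {"paperclips": 3, "clean_floors": 7}

fixed = FixedGoalAgent(paperclip_count)
try:
    fixed.utility = clean_floors  # the "distraction" is rejected
    goal_changed = True
except FrozenInstanceError:
    goal_changed = False

flexible = ModifiableGoalAgent(paperclip_count)
flexible.set_goal(clean_floors)  # this one happily drifts to mopping
```

Both agents are trivial to write. Neither tells you what to put in the `utility` slot so that the result is a moral creature, which is the hard part.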