TheOtherDave comments on Devil's Offers - Less Wrong

21 Post author: Eliezer_Yudkowsky 25 December 2008 05:00PM


Comment author: TheOtherDave 15 March 2012 04:42:50PM 0 points

Just to make sure I understand the system you're proposing: suppose there's a Do No Harm rule like the one you propose, and I tell the AI to give me the option of "subtler methods of self-destruction" and the AI predicts that giving me that option is likely to lead to consequences that horrify someone it affects (or some more formal version of that condition).

In that case, the AI refuses to give me that option. Right?

If so, can you clarify how that is different from the OP's proposed behavior in this case?

Comment author: [deleted] 15 March 2012 07:50:33PM 0 points

I should have clarified: I meant horrifying in a pretty extreme sense. Like, telling the machine to torture you forever, or destroy you completely, or remove your sense of boredom.

Just doing something that, say, alienates all your friends wouldn't qualify. Or loses all your money, if money is still a thing that makes sense. I was also including all the things that you CAN do with your own strength but probably shouldn't. Building a machine to torture your upload forever wouldn't be disallowed, but you might want to prohibit the system from letting you.

I meant the 'Do No Harm' rule to be a bare-minimum safeguard against producing a system with net negative utility because a small minority manage to put themselves into infinitely negative utility situations. I did not mean it to be a general-class 'the system knows what is best' measure, which is what it sounded to me like EY was proposing. Now, in his defense, this is probably, in the context of strong AI, a discussion of what the CEV of humanity might wisely end up choosing, but I don't like it.

Comment author: TheOtherDave 15 March 2012 09:35:02PM 0 points

I don't know that I agree with the OP's proposed basis for distinction, but I at least have a reasonable feel for what it would preclude. (I would even agree that, given clients substantially like modern-day humans, precluding that stuff is reasonably ethical. That said, the notion that a system on the scale the OP is discussing would have clients substantially like modern-day humans and relate to them in a fashion substantially like the fictional example given strikes me as incomprehensibly absurd.)

I don't quite understand the basis for distinction you're suggesting instead. I mean, I understand the specific examples you're listing for exclusion, of course (eternal torture, lack of boredom, complete destruction), but not what they have in common or how I might determine whether, for example, choosing to be eternally alienated from friendship should be allowed or disallowed. Is that sufficiently horrifying? How could one tell?

I do understand that you don't mean the system to prevent, say, my complete self-destruction as long as I can build the tools to destroy myself without the system's assistance. The OP might agree with you about that; I'm not exactly sure. I suspect I disagree, personally, though I admit it's a tricky enough question that a lot depends on how I frame it.