
RowanE comments on xkcd on the AI box experiment - Less Wrong Discussion

15 Post author: FiftyTwo 21 November 2014 08:26AM




Comment author: Jiro 25 November 2014 02:58:49AM  0 points

I think I can save the Basilisk from this objection.

As most people on LW know, there are scenarios where doing X under condition Y is useless or actively harmful to you, yet precommitting to do X can still be beneficial because the average outcome over all possible worlds is better. The precommitment trades off the worlds where you are better off because others know you are an X-doing kind of guy against the worlds where you are worse off because the precommitment actually forces you to do X to your detriment.

The future unfriendly AI, then, could precommit to hurting people who refuse to be blackmailed. The AI would gain no benefit in those worlds where you actually do refuse to be blackmailed; in fact, you would be a lot worse off (because its precommitment forces it to simulate and torture you) while the AI would be mildly worse off (since it uses up resources to torture you, to no benefit). However, being the kind of AI who has made such a precommitment would lead hapless humans to submit to blackmail, thus benefiting the AI averaged over all possible worlds.

And of course, since I can predict that the AI would be better off making this precommitment, I would have to assume that the AI would make it. Therefore, "I should not give in to blackmail, since the AI would have no reason to torture me if I refuse" does not apply: the AI has precommitted to torturing me even if I refuse, and that precommitment prevents it from stopping just because the torture does it no good.
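The expected-value logic above can be sketched numerically. This is only an illustration: the probabilities and payoffs below are made-up assumptions, as is the premise that the human submits more often when facing a precommitted blackmailer.

```python
# Toy model of the precommitment argument, with hypothetical payoffs.
# Two kinds of world: one where the human submits to blackmail,
# one where the human refuses.

# Assumed probability the human submits, given the AI is known
# to have precommitted to carrying out its threat.
p_submit_if_precommitted = 0.8
# Assumed probability the human submits if the AI has made no
# precommitment (the human reasons "it gains nothing by torturing
# me once I've refused", so rarely submits).
p_submit_if_not = 0.1

GAIN_FROM_SUBMISSION = 10   # AI's payoff when the human gives in
TORTURE_COST = -1           # AI's cost of actually carrying out the threat

def expected_payoff(p_submit: float, precommitted: bool) -> float:
    """Average the AI's payoff over the submit/refuse worlds."""
    submit_world = p_submit * GAIN_FROM_SUBMISSION
    # A precommitted AI must pay the torture cost whenever the human
    # refuses; an unprecommitted AI just walks away (payoff 0).
    refuse_world = (1 - p_submit) * (TORTURE_COST if precommitted else 0)
    return submit_world + refuse_world

with_precommit = expected_payoff(p_submit_if_precommitted, precommitted=True)
without = expected_payoff(p_submit_if_not, precommitted=False)

# 0.8*10 + 0.2*(-1) = 7.8  versus  0.1*10 = 1.0
assert with_precommit > without
```

Under these assumed numbers the precommitment is worth making even though it strictly loses value in every refuse-world, which is the whole point of the argument.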

(In theory the human could precommit as well in response/anticipation of this, but such precommitment is probably beyond the capability of most humans.)

Incidentally, real-life terrorists can do this too, by having an ideology or a mental defect that leads them to do "irrational" things such as torture; the ideology acts like a precommitment. In the scenarios where the ideology makes them do irrational things it harms them, but knowledge that they hold it makes them more likely to be listened to in other scenarios.

Comment author: RowanE 25 November 2014 01:39:11PM 0 points

Besides any direct arguments I might make against your point: if you think you can "save the Basilisk", recall why it's called that and think long and hard about whether you actually should, because that seems like a really bad idea, even if this thread is probably going to get nuked soon anyway.

Comment author: Jiro 25 November 2014 03:25:17PM 1 point

even if this thread is probably going to get nuked soon anyway.

From Eliezer elsewhere in this thread:

But in this special case I won't ban any RB discussion such that /r/xkcd would allow it to occur there.