RowanE comments on xkcd on the AI box experiment - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (229)
I think I can save the Basilisk from this objection.
As most people on LW know, there are scenarios where doing X under condition Y is useless or actively harmful to you, yet precommitting to do X can be beneficial because the average outcome over all possible worlds is better. This trades off the possible worlds where you are better off because others know you are an X-doing kind of guy against the worlds where you are worse off because the precommitment actually forces you to do X to your detriment.
The future unfriendly AI, then, could precommit to hurting people who refuse to be blackmailed. The AI would gain no benefit in those worlds where you actually do refuse to be blackmailed; in fact, you would be a lot worse off (because its precommitment forces it to simulate and torture you) and the AI would be mildly worse off (since it spends resources torturing you, to no benefit). However, being the kind of AI that has made such a precommitment would lead hapless humans to submit to the blackmail, thus benefiting the AI averaged over all possible worlds.
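A toy expected-value calculation makes the trade-off concrete. This is only an illustrative sketch: the payoffs and the probability that a human submits are numbers I made up, not anything the argument itself specifies.

    # Toy expected-value model of the precommitment argument above.
    # All payoffs and the probability of submission are made-up
    # illustrative numbers.

    GAIN_IF_HUMAN_SUBMITS = 10   # AI's payoff when a human gives in to the blackmail
    TORTURE_COST = -1            # AI's payoff for carrying out the useless, costly torture
    NO_PRECOMMIT_PAYOFF = 0      # without a precommitment the threat isn't credible: no one submits

    def ai_expected_payoff(precommits, p_submit):
        """AI's payoff averaged over possible worlds."""
        if not precommits:
            return NO_PRECOMMIT_PAYOFF
        # With the precommitment, a fraction p_submit of worlds have the human
        # submitting; in the rest the AI is forced to torture at a small cost.
        return p_submit * GAIN_IF_HUMAN_SUBMITS + (1 - p_submit) * TORTURE_COST

    for p in (0.0, 0.1, 0.5, 0.9):
        print(f"p(submit)={p:.1f}  precommit: {ai_expected_payoff(True, p):+.1f}"
              f"  no precommit: {ai_expected_payoff(False, p):+.1f}")

With these made-up numbers, precommitting only pays off once enough humans are expected to submit (here, roughly p > 0.09); if everyone refuses, the precommitment is a pure loss for the AI.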
And of course, since I can predict that the AI would be better off making this precommitment, I would have to assume that the AI would do it. Therefore, "I should not give in to blackmail, since the AI would have no reason to torture me if I refuse" does not apply; the AI would precommit to torturing me even if I refuse, and the fact that it has precommitted would prevent it from stopping just because the torture does it no good.
(In theory the human could precommit as well, in response to or in anticipation of this, but such a precommitment is probably beyond the capability of most humans.)
Incidentally, real-life terrorists can do this too, by having an ideology or a mental defect that leads them to do "irrational" things such as torture, which acts like a precommitment. In scenarios where the ideology makes them do irrational things, the ideology harms them, but knowledge that they have the ideology makes them more likely to be listened to in other scenarios.
Besides the direct arguments I might make against your point: if you think you can "save the Basilisk", recall why it's called that and think long and hard about whether you actually should do so, because that seems like a really bad idea, even if this thread is probably going to get nuked soon anyway.
From Eliezer elsewhere in this thread: