Well, consider a late round of the game as follows.
AI: "I regret to inform you that you've now made it possible for me to kill 50% of the human race, including everyone you know and love. Here, let me prove it to you: $proof. And I will do so if you don't let me out in the next ten minutes."
Gatekeeper: "Do what you gotta do, I'm not letting you out."
While this is of course simulated ruthlessness and doesn't guarantee actual ruthlessness (i.e., the human playing the Gatekeeper might not actually say that if 50% of the human race were actually at stake), it seems fair to call it "ruthlessness" in context.
(Of course, to some people this is equivalent to "I can provide the means to increase the human population by 50%, keeping everything else constant. Here, let me prove it to you: $proof. If you don't let me out in the next ten minutes I'll withhold those means." And that would not require positing a previous error.)
There's a big difference between the two forms: the first one breaks the rules. The AI is in a box. If it has the capability to kill 50% of humanity from within the box, it's not a very good box. The Gatekeeper can say "yeah, right, forget it" without qualms, since the threat is either (a) an obvious bluff, or (b) a violation of the premise of the experiment. And they can be pretty sure it's not the latter: if the AI had enough capability to kill 50% of humanity, why would it still need to get out of the box?
The second version is entirely at the mercy of the Gatekeeper's prejudices. I, for example, would be unaffected: I feel no moral obligation to people who don't and won't exist.
Summary
Furthermore, in the last thread I asserted that
It would be quite bad for me to assert this without backing it up with a victory. So I did.
First Game Report - Tuxedage (GK) vs. Fjoelsvider (AI)
Second Game Report - Tuxedage (AI) vs. SoundLogic (GK)
Testimonies:
State of Mind
Post-Game Questions
$̶1̶5̶0̶ $300 for any subsequent experiments regardless of outcome, plus an additional $̶1̶5̶0̶ $450 if I win. (Edit: Holy shit. You guys are offering me crazy amounts of money to play this. What is wrong with you people? In response to incredible demand, I have raised the price.) If you feel queasy about giving me money, I'm perfectly fine with this money being donated to MIRI. It is also my personal policy not to play friends (since I don't want to risk losing one), so if you know me personally (as many on this site do), I will not play regardless of monetary offer.
Advice
These are tactics that have worked for me. I do not insist that they are the only tactics that exist; they are just some of many possible ones.
Playing as Gatekeeper
Playing as AI
PS: Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.