There's a big difference between the two forms: the first one breaks the rules. The AI is in a box. If it has the capability to kill 50% of humanity from within the box, it's not a very good box. The gatekeeper can "yeah, right, forget it" without qualms as it is either (a) an obvious bluff, or (b) violates the premise of the experiment. And they can be pretty sure it's not the latter, as if the AI had enough capability to kill 50% of humanity, then why does it still need to get out of the box?
The second version is entirely at the prejudices of the gatekeeper. I, for example, would be unaffected - I feel no moral obligation to people which don't and won't exist.
The first one doesn't break the rules, it just assumes an earlier gatekeeper error that allows the AI to kill people... e.g., the gatekeeper accepts a cancer vaccine that turns out to kill half the population unless treated in a particular way by the AI, or whatever. That's why I said a late round. And, yes, it's just as possible (if not more so) that no such error will ever be made. I wasn't trying to make a general prediction about how AI boxes fail or succeed, I was answering the question about under what circumstances a gatekeeper's ruthlessness might ...
Summary
Furthermore, in the last thread I have asserted that
It would be quite bad for me to assert this without backing it up with a victory. So I did.
First Game Report - Tuxedage (GK) vs. Fjoelsvider (AI)
Second Game Report - Tuxedage (AI) vs. SoundLogic (GK)
Testimonies:
State of Mind
Post-Game Questions
$̶1̶5̶0̶$300 for any subsequent experiments regardless of outcome, plus an additional$̶1̶5̶0̶$450 if I win. (Edit: Holy shit. You guys are offering me crazy amounts of money to play this. What is wrong with you people? In response to incredible demand, I have raised the price.) If you feel queasy about giving me money, I'm perfectly fine with this money being donating to MIRI. It is also personal policy that I do not play friends (since I don't want to risk losing one), so if you know me personally (as many on this site do), I will not play regardless of monetary offer.Advice
These are tactics that have worked for me. I do not insist that they are the only tactics that exists, just one of many possible.
Playing as Gatekeeper
Playing as AI
Ps: Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.