The first one doesn't break the rules; it just assumes an earlier gatekeeper error that allows the AI to kill people, e.g., the gatekeeper accepts a cancer vaccine that turns out to kill half the population unless treated in a particular way by the AI, or whatever. That's why I said a late round. And, yes, it's just as possible (if not more so) that no such error will ever be made. I wasn't trying to make a general prediction about how AI boxes fail or succeed; I was answering the question of under what circumstances a gatekeeper's ruthlessness might be relevant.
And, sure, if you don't care about hypothetical future people, then offers to create them won't motivate you. For that matter, if you don't care about currently living people, threats to kill them won't motivate you either. (Some people might call that ruthlessness; others would not.)
If you're going to let the AI do that, then you have to give the same advantage to the gatekeeper.
The premise is that we tasked the oracle AI with coming up with a cancer vaccine, and it gave us one which secretly was a ticking time-bomb. Well, the gatekeeper should be allowed to set up operational security as well. All proposals coming out of the AI interface are evaluated by double-blind teams of domain experts who don't know the source or stated purpose of the proposal. No aspect of any AI-generated proposal is implemented until multiple independent evaluation tea...
Summary
Furthermore, in the last thread I have asserted that
It would be quite bad for me to assert this without backing it up with a victory. So I did.
First Game Report - Tuxedage (GK) vs. Fjoelsvider (AI)
Second Game Report - Tuxedage (AI) vs. SoundLogic (GK)
Testimonies:
State of Mind
Post-Game Questions
$̶1̶5̶0̶$300 for any subsequent experiments regardless of outcome, plus an additional $̶1̶5̶0̶$450 if I win. (Edit: Holy shit. You guys are offering me crazy amounts of money to play this. What is wrong with you people? In response to incredible demand, I have raised the price.) If you feel queasy about giving me money, I'm perfectly fine with this money being donated to MIRI. It is also personal policy that I do not play friends (since I don't want to risk losing one), so if you know me personally (as many on this site do), I will not play regardless of monetary offer.
Advice
These are tactics that have worked for me. I do not insist that they are the only tactics that exist, just some of many possible ones.
Playing as Gatekeeper
Playing as AI
P.S. Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.