Rather than LARP on IRC (if you already know how a debate will conclude, why go through the debate? Go straight to the conclusion), I'll just give $10 to whoever can come up with a standard of friendliness that I couldn't meet while nevertheless being, in fact, an unfriendly AI. Standard rules apply, with the added constraint that the gatekeeper is trying to release the AI if and only if it's friendly (because otherwise they're not really a gatekeeper, and the whole game is meaningless).
Here are some examples of non-winning entries:
GK: Solve global problems A, B, and C.
AI: Here are instructions accompanied by a helpful video.
Narrator: A lot of scientists and engineers carefully verified these instructions in an ultra-secure clean-room environment and concluded that they are completely safe and effective.
GK: Okay, you're friendly.
AI: Ha ha! Solving A, B, and C was instrumental to getting let out of the box. No $10 for you.
...
GK: Give me a mathematical proof of your friendliness.
AI: Give me a mathematical definition of friendliness.
GK: Pretend that I have one, and here it is: XXXXXXXXXXXXXXXXXX
AI: Pretend I gave you a proof that I am friendly: XXXXXXXXXXXXXXXXXX
Narrator: Since the AI party controls the results of simulated tests, as far as the best math and CS minds on the planet can tell, the proof is sound.
GK: Okay, you're friendly.
AI: Ha ha! Since you're not actually capable of verifying the proof yourself (or even coming up with a rigorous definition), it falls back on simulated tests. The best math and CS minds missed a subtle flaw in the proof itself and/or missed the fact that the proof makes hidden assumptions that do not hold for this real-world case. No $10 for you.
...
GK: I'm letting you out of the box.
AI: I go around being SO totally friendly that rainbows literally emanate from my rear USB port. I mitigate all existential risks, alleviate involuntary suffering while preserving human values, give people what they would want if they knew better while slowly fostering them to actually start to know better without being pushy or overbearing. In short, I am the FRIENDLIEST MTHFCKR you can imagine and then some.
Narrator: It's... so... beautiful... <3 <3 <3
GK: Okay, according to the simulation I just ran of you, you are friendly.
AI: I'm almost insulted. Do you think I wouldn't have thought of that, and pre-committed to being friendly until I accumulated enough empirical data to have a good idea of how many simulations deep I'm running? No $10 for you in this simulation or any of the enclosing ones.
Here's another entry, which you may or may not consider a non-winning entry, but which I would consider a flaw in the concept itself:
...AI: I've simulated what you would do if given argument X. My simulation shows that X would convince you.
GK: Okay, show me X.
AI: Pretend the AI has said X here.
Narrator: Within the context of the test the AI is not lying and in fact has accurately simulated GK.
GK: I'm not letting the AI out.
Narrator: Since the AI party controls the results of simulated tests, we know the AI has simulated GK correctly, and therefore that X would in fact convince GK - contradicting GK's refusal to let the AI out.
Some of you have expressed the opinion that the AI-Box Experiment doesn't seem so impossible after all. That's the spirit! Some of you even think you know how I did it.
There are folks aplenty who want to try being the Gatekeeper. You can even find people who sincerely believe that not even a transhuman AI could persuade them to let it out of the box, previous experiments notwithstanding. But finding anyone to play the AI - let alone anyone who thinks they can play the AI and win - is much harder.
Me, I'm out of the AI game, unless Larry Page wants to try it for a million dollars or something.
But if there's anyone out there who thinks they've got what it takes to be the AI, leave a comment. Likewise anyone who wants to play the Gatekeeper.
Matchmaking and arrangements are your responsibility.
Make sure you specify in advance the bet amount, and whether the bet will be asymmetrical. If you definitely intend to publish the transcript, make sure both parties know this. Please note any other departures from the suggested rules for our benefit.
I would ask that prospective Gatekeepers indicate whether they (1) believe that no human-level mind could persuade them to release it from the Box and (2) believe that not even a transhuman AI could persuade them to release it.
As a courtesy, please announce all Experiments before they are conducted, including the bet, so that we have some notion of the statistics even if some meetings fail to take place. Bear in mind that to properly puncture my mystique (you know you want to puncture it), it will help if the AI and Gatekeeper are both verifiably Real People™.
"Good luck," he said impartially.