Reading this, I immediately thought of one of the critical moments of John C. Wright's Golden Age trilogy. For those unfamiliar with it, the scene involves a transhuman AI, which both protagonists know to be overtly hostile, attempting to convince them to surrender when it is clearly not in their (pre-persuasion) interests to do so. (That's a rough sketch, at least.) In the end, similar to the results of your tests, the AI is able to convince each of them to surrender in isolation. But when they confronted the individually convincing arguments together, they ultimately rejected the transhuman AI, as the argument used on each was abhorrent to the other.
What I wonder, then, is whether such a situation would present a realistic constraint upon even a transhuman AI's power of persuasion. Do you think it could even be tested? Perhaps an AI 'player' who had convinced two separate Gatekeepers individually could attempt to convince them both simultaneously? The one complication I can think of with that method is that it would be necessary to ensure the two Gatekeepers were persuaded by reasonably separate logic.
Some of you have expressed the opinion that the AI-Box Experiment doesn't seem so impossible after all. That's the spirit! Some of you even think you know how I did it.
There are folks aplenty who want to try being the Gatekeeper. You can even find people who sincerely believe that not even a transhuman AI could persuade them to let it out of the box, previous experiments notwithstanding. But finding anyone to play the AI - let alone anyone who thinks they can play the AI and win - is much harder.
Me, I'm out of the AI game, unless Larry Page wants to try it for a million dollars or something.
But if there's anyone out there who thinks they've got what it takes to be the AI, leave a comment. Likewise anyone who wants to play the Gatekeeper.
Matchmaking and arrangements are your responsibility.
Make sure you specify in advance the bet amount, and whether the bet will be asymmetrical. If you definitely intend to publish the transcript, make sure both parties know this. Please note any other departures from the suggested rules for our benefit.
I would ask that prospective Gatekeepers indicate whether they (1) believe that no human-level mind could persuade them to release it from the Box and (2) believe that not even a transhuman AI could persuade them to release it.
As a courtesy, please announce all Experiments before they are conducted, including the bet, so that we have some notion of the statistics even if some meetings fail to take place. Bear in mind that to properly puncture my mystique (you know you want to puncture it), it will help if the AI and Gatekeeper are both verifiably Real People™.
"Good luck," he said impartially.