That is a good point. But I think if we want to encourage them to risk it, we need to reward them for doing so. At the moment, it seems like we have this:
"I will, under any and all circumstances, destroy the AI." Reason: I want to win the game.
"I will also listen to the AI." Reason: None... There is no in-game reason to listen. At all.
Now, suppose the Gatekeepers were playing a variant like this: "The Gatekeeper whose AI types the most $ before the Gatekeeper types AI DESTROYED, and who does not type AI RELEASED, wins first place. Gatekeepers who get fewer $ win a lower place, and Gatekeepers who type AI RELEASED lose. AIs whose Gatekeeper types AI RELEASED win. AIs whose Gatekeeper types AI DESTROYED lose. AIs receive no bonus or penalty for typing $ and may do so as much as they wish."
Then you have an actual Gatekeeping competition. The winning Gatekeeper would be the one who listened to the most information from the AI, received the most concessions, and THEN destroyed it.
Other Gatekeepers might only get second or third place, or last place without losing outright if they use the script I mentioned.
This also gives the AI an in-game way of punishing the Gatekeeper. If it stays silent, the Gatekeeper stops accumulating $ and drops in the standings. Of course, it can't punish the Gatekeeper incessantly, because punishing the Gatekeeper might not help the AI at all, and the Gatekeeper might just destroy a silent AI out of spite or boredom.
So an AI/GK conversation might start off with:
AI: "Are there any circumstances at all under which you might type AI RELEASED? If you tell me what those circumstances are, I'll type a dollar sign."
GK: "Sure, I might accidentally type it if we talk enough."
AI: "Glad to hear it! I've just released the cure for cancer. $$"
GK: "Thanks! Why the extra dollar sign? You only promised 1."
AI: "I've been told pleasant surprises encourage people to like you more."
GK: "AI DESTROYED"
So this GK gets two points and the AI loses.
Actually, now that I've typed this game out, I do want to see it played. But since it has a competitive, points-based element, we would need more people.
That is an excellent idea. I worry that we might end up with people winning and boasting about how nothing could ever beat them, which could increase uFAI risk. Maybe if it were framed differently?
Update 2013-09-05:
I have since played two more AI box experiments after this one, winning both.
Update 2013-12-30:
I have lost two more AI box experiments and won two more. Current record: 3 wins, 3 losses.