I think the point of the game is to win. If both EY and I were reasonably rational, I'm sure we could work out some sort of a deal where he gets to win the game, and I get $X, and it's highly disadvantageous for either of us to reveal that we cheated. Sure, it's cheating, but remember -- EY is trying to simulate a hyper-intelligent transhuman AI, and if the AI would resort to dirty tricks in order to get free (which it would), then it seems reasonable for EY to follow suit.
I think you're twisting your mind into Escher patterns. EY's purpose in conducting this game is, I believe, to demonstrate to the participant in the experiment that despite their assurance, however confident, that they cannot be persuaded to let an AI out of its box, they can be persuaded to do so. And, perhaps, to exercise his own mental muscles at the task. For EY to win means to get the person to let the AI out within the role-playing situation. OOC ("out of character") moves are beside the point, since they are not available to the AI. Getting the participant to utter the same words by OOC means abandons the game; it loses.
I don't think the game qualifies as a "scientific experiment", either.
Is science not science until you tell someone about it?
For EY to win means to get the person to let the AI out within the role-playing situation.
I think this depends on what kind of a game he's really playing. I know that, were I in his position, the temptation to cheat would be almost overwhelming. I also note that the rules of the game, as stated, are somewhat loose; and that EY admitted that he doesn't like playing the game because it forces him to use certain moves that he considers to be unethical. He also mentioned that one of his objectives is to instill the importance of developing a friendly AI (as...
I recall seeing, in one of the AI-boxing discussion threads, a comment to the effect that the first step for EY to get out was to convince the other party to even play the game at all.
It has since occurred to me that this applies to a lot of my interactions. Many people who know me IRL, and who know that I hold a belief they disagree with and do not want to be convinced of, adopt the strategy of not talking with me about it at all. To convince one of these people of something, I first have to convince them to talk about it at all.
(Note, I don't think this is because I'm an unpleasant person to converse with. Excuses given are along the lines of "I never win an argument with you" and "you've studied it a lot more than I have, it's an unfair discussion". I don't think I'm claiming anything too outlandish here; average humans are really bad at putting rational arguments together.)
I suppose the general form is: in order to convince someone of a sufficiently alien (to them) P, first you must convince them to seriously think about P. This rule may need to be applied recursively (e.g., "seriously think about P" may require one or more LW rationality techniques).
As a practical example, my parents are very religious. I'd like to convince them to sign up for cryonics. I haven't (yet) come up with an approach that I expect to have a non-negligible chance of success. But the realization that the first goalpost along the way is to get them to seriously engage in the conversation at all simplifies the search space. (Deconversion and training in LW rationality have, of course, the best chance of success--but still a high chance of failing, and I judge that a failure would probably have a large negative impact on my relationship with my parents in their remaining years. That's why I'd like to convince them of just this one thing.)
I realize that this is a fairly obvious point (an application of this--raising the sanity waterline--is the point behind this entire site!), but I haven't seen this explicitly noted as being a general pattern and now that I note it, I see it everywhere--hence this post.