Well, the classical game theorist would reply that they're studying one-off games, in which the game you're currently playing doesn't affect any payoff you get outside that game (otherwise that should be made part of the game). So either you can't be punishing because you want to be known as a punisher, or the game that Robin specified doesn't model the situation you're in. The classical game theorist also assumes you can't look into people's heads, so whatever you say or do before the cheating, you're always free to not punish during the punishment round (as you're undoubtedly aware, mutual checking of source code is prohibited by antitrust laws in over 185 countries).
The classical game theorist would further point out that if you do want to model punishment helping you become known as a punisher, then you should use their theory of repeated games, where they have some folk theorems for you saying that lots and lots of things can be Nash equilibria, e.g. in a game where after each round there is a fixed probability of another round. Cooperation in the prisoner's dilemma is one example, but so are all sorts of suboptimal outcomes (which become Nash equilibria because any deviator gets punished as badly as the other players can punish them).
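To make the repeated-games point concrete, here is a minimal sketch of the standard check for whether mutual cooperation can be sustained in a prisoner's dilemma with a fixed continuation probability. The payoff numbers (T=5, R=3, P=1), the use of the grim trigger strategy, and the function name are all assumptions chosen for illustration; none of them come from the discussion above.

```python
from fractions import Fraction

def cooperation_is_equilibrium(delta, T=5, R=3, P=1):
    """Check whether both players using grim trigger is a Nash equilibrium.

    delta: probability (strictly below 1) that another round follows.
    T, R, P: hypothetical temptation, reward and punishment payoffs.
    Grim trigger: cooperate until the other player defects, then defect forever.
    """
    # Expected total payoff from cooperating in every round.
    cooperate_forever = R / (1 - delta)
    # Expected total payoff from defecting now and being punished ever after.
    defect_now = T + delta * P / (1 - delta)
    return cooperate_forever >= defect_now

print(cooperation_is_equilibrium(Fraction(1, 4)))  # False
print(cooperation_is_equilibrium(Fraction(1, 2)))  # True
print(cooperation_is_equilibrium(Fraction(3, 4)))  # True
```

With these assumed payoffs the threshold is (T - R) / (T - P) = 1/2: below it, the one-time gain from defecting outweighs the punishment that follows.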
I should point out that not all classical game theorists think that SPE (subgame-perfect equilibrium) makes particularly good predictions, though; I've read someone say, I think Binmore, that you expect to virtually always see a NE in the laboratory after a learning period, but not an SPE, and that the original inventor of SPE actually came up with it as an example of what you would not expect to see in the lab, or something to that tune. (Sorry, I should really chase down that reference, but I don't have time right now. I'll try to remember to do that later. ETA: Ok, Binmore and Shaked (2010), "Experimental Economics: Where Next?", Journal of Economic Behavior & Organization, 73: 87-100. See the stuff about backward induction, starting at the bottom of p.88. The inventor of SPE is Reinhard Selten, and the claim is that he didn't believe it would predict what you see in the lab and "[i]t was to demonstrate this fact that he encouraged Werner Güth (...) to carry out the very first experiment on the Ultimatum game", not that he invented SPE for this purpose.)
"so whatever you say or do before the cheating, you're always free to not punish during the punishment round"
Interesting. This idea, used as an argument for SPE, seems to be the free will debate intruding into decision theory. "Only some of these algorithms have freedom, and others don't, and humans are free, so they should behave like the free algorithms." This either ignores, or accepts, the fact that the "free" algorithms are just as deterministic as the "unfree" algorithms. (And it depends on other stuff, but that's ...
Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.
If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.
After going to sleep at the start of the experiment, you wake up in a green room.
With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?
There are exactly two tenable answers that I can see, "50%" and "90%".
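For reference, the "90%" answer is what you get from an ordinary Bayesian update, if you treat yourself as a randomly chosen copy out of the 20 created. The sketch below assumes exactly that sampling model; the function and parameter names are illustrative. The "50%" answer corresponds to not updating at all and keeping the prior.

```python
from fractions import Fraction

def posterior_heads(prior_heads, green_if_heads, green_if_tails, total_copies):
    """Posterior probability of heads after waking up in a green room.

    Assumes 'I' am a uniformly random copy among total_copies, so the
    likelihood of a green room is the fraction of copies that are in one.
    """
    likelihood_heads = Fraction(green_if_heads, total_copies)
    likelihood_tails = Fraction(green_if_tails, total_copies)
    joint_heads = prior_heads * likelihood_heads
    joint_tails = (1 - prior_heads) * likelihood_tails
    return joint_heads / (joint_heads + joint_tails)

print(posterior_heads(Fraction(1, 2), 18, 2, 20))  # 9/10
```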
Suppose you reply 90%.
And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself. (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)
Then I attempt to force a reflective inconsistency in your decision system, as follows:
I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room. If they all reply "Yes", I will do so.
(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen. Very little rescaling should be necessary.)
(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)
Suppose that you wake up in a green room. You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms. Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60." You reply yes.
However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply 'No' under these conditions.
This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.
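The two calculations can be checked side by side. This is only a sketch of the arithmetic in the text: the same payoff table evaluated once with the updated 90% credence and once with the 50% prior. The function names are illustrative.

```python
from fractions import Fraction

GREEN_PAYOFF = 1   # dollars (utilons) paid to each green-room copy
RED_PAYOFF = -3    # dollars (utilons) taken from each red-room copy

def total_payoff(n_green, n_red):
    """Summed payoff over all copies if every green-room copy says 'Yes'."""
    return n_green * GREEN_PAYOFF + n_red * RED_PAYOFF

def expected_utility_of_yes(p_heads):
    """Expected summed payoff of 'Yes' given a credence p_heads in heads."""
    heads_total = total_payoff(18, 2)   # 18 green, 2 red
    tails_total = total_payoff(2, 18)   # 2 green, 18 red
    return p_heads * heads_total + (1 - p_heads) * tails_total

print(float(expected_utility_of_yes(Fraction(9, 10))))  # 5.6
print(float(expected_utility_of_yes(Fraction(1, 2))))   # -20.0
```

The only input that changes between the two lines is the credence in heads, which is what makes the disagreement between the earlier and later self an inconsistency and not a change in values.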
I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.
However, the problem also works if we care about paperclips. No selfishness, no altruism, just paperclips.
Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
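The paperclip version reduces to a single bet evaluated at two different credences. A minimal sketch, with illustrative names, assuming the green-room maximizers all agree (so the 5-paperclip disagreement penalty never triggers):

```python
from fractions import Fraction

def bet_value(p_heads, gain_if_heads=1, loss_if_tails=-3):
    """Expected paperclips from taking the bet at credence p_heads."""
    return p_heads * gain_if_heads + (1 - p_heads) * loss_if_tails

before_experiment = bet_value(Fraction(1, 2))    # prior credence
in_green_room = bet_value(Fraction(9, 10))       # after the anthropic update

print(before_experiment)  # -1
print(in_green_room)      # 3/5
```

The sign flips between the two vantage points: -1 paperclip before the experiment, +0.6 paperclips after updating in a green room.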
This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".
Well, that's not too disturbing, is it? I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms. Just throw out anthropic reasoning, and you won't have to bother.
When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply. And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either."
So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains. QED.
In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling. But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.
In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
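Here is one way to read the division-of-responsibility suggestion as arithmetic. This is my own sketch of the idea as described above, not a formula Bostrom is quoted as giving: each green-room maximizer keeps the 90% credence but only claims a share of the outcome equal to one over the number of green-room deciders in that world.

```python
from fractions import Fraction

def responsibility_weighted_value(p_heads, green_if_heads=18, green_if_tails=2,
                                  gain_if_heads=1, loss_if_tails=-3):
    """Expected paperclips attributed to one green-room decider.

    Assumption: responsibility is split equally among the green-room
    copies that exist in each world (1/18 under heads, 1/2 under tails).
    """
    share_heads = Fraction(1, green_if_heads)
    share_tails = Fraction(1, green_if_tails)
    return (p_heads * share_heads * gain_if_heads
            + (1 - p_heads) * share_tails * loss_if_tails)

print(responsibility_weighted_value(Fraction(9, 10)))  # -1/10
```

Under this reading the bet comes out negative even at 90% credence, because the 18-to-2 ratio in the responsibility shares exactly offsets the 9-to-1 ratio in the posterior. The result is the pre-experiment value of -1 paperclip scaled by 1/10, so the maximizer refuses the bet both before and after waking up.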
EDIT: On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain! Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?
EDIT2: Psy-Kosh has converted this into a non-anthropic problem!