Consider the Prisoner's Dilemma, modified so that one person moves first and the other person gets to observe their move before choosing.
Obviously the classically correct first move is to defect. Thus the second player will never have to deal with a move of Cooperate.
Therefore, if a move of Cooperate is made, the second player's response is classically undefined (if one accepts the logic of this post). And yet (C,C) is better for both players than (D,D), so the first player's best move depends on what the second player would do if Cooperate were played first. Therefore, this Prisoner's Dilemma has no solution.
(I consider this to be a reductio).
Viewed this way, there's an obvious relationship with the formal-agent problem of "I can prove which option is best, and I know I'll take the best option; therefore, if I do anything else, all logical statements are conditioning on a falsehood, and so it's true that I can get the best results by doing nothing." The solution there is not to use logical conditioning inside the decision-making process like that, and instead to use causal intervention.
Similarly, we might never expect the second player in an ordered Prisoner's Dil...
Classical game theory says that player 1 should choose A for expected utility 3, as this is better than the subgame of choosing between B and C, where the best player 1 can do against a classically rational player 2 is to play B with probability 1/3 and C with probability 2/3 (while player 2 plays X with probability 2/3 and Y with probability 1/3), for an expected value of 2.
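The arithmetic behind that mixed equilibrium can be checked with exact fractions. A minimal sketch (nothing here beyond the payoffs already stated):

```python
from fractions import Fraction as F

# Payoffs (player 1, player 2) in the B/C vs X/Y subgame
p1 = {('B','X'): 2, ('B','Y'): 2, ('C','X'): 0, ('C','Y'): 6}
p2 = {('B','X'): 0, ('B','Y'): 2, ('C','X'): 1, ('C','Y'): 0}

# The mixed strategies claimed above
s1 = {'B': F(1,3), 'C': F(2,3)}   # player 1
s2 = {'X': F(2,3), 'Y': F(1,3)}   # player 2

ev1 = sum(s1[r]*s2[c]*p1[(r,c)] for r in s1 for c in s2)
ev2 = sum(s1[r]*s2[c]*p2[(r,c)] for r in s1 for c in s2)
print(ev1, ev2)  # 2 2/3 (player 1 gets 2, player 2 gets 2/3)

# Equilibrium check: each player is indifferent between their pure strategies
for r in ('B', 'C'):
    print(r, sum(s2[c]*p1[(r,c)] for c in s2))  # both 2
for c in ('X', 'Y'):
    print(c, sum(s1[r]*p2[(r,c)] for r in s1))  # both 2/3
```

Both of player 1's pure strategies earn 2 against player 2's mix, and both of player 2's earn 2/3 against player 1's mix, which is what makes the pair an equilibrium of the subgame.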
But there are Pareto improvements available. Player 1's classically optimal strategy gives player 1 expected utility 3 and player 2 expected utility 0. But suppose instead player 1 plays C, and player 2 plays X with probability 1/3 and Y with probability 2/3. Then the expected utility for player 1 is 4 and for player 2 it is 1/3. Of course, a classically rational player 2 would want to play X with greater probability, to increase its own expected utility at the expense of player 1. It would want to increase the probability beyond 1/2, which is the break-even point for player 1; but then player 1 would rather just play A.
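A quick sketch of those two numbers, treating player 2's probability of X as the only free variable:

```python
from fractions import Fraction as F

# Player 1 plays C; player 2 plays X with probability x.
# Relevant payoffs: (C,X) -> (0,1), (C,Y) -> (6,0)
def ev(x):
    ev1 = 0*x + 6*(1 - x)   # player 1's expected payoff
    ev2 = 1*x + 0*(1 - x)   # player 2's expected payoff
    return ev1, ev2

print(ev(F(1,3)))   # (4, 1/3): Pareto-improves on the classical (3, 0)
print(ev(F(1,2)))   # (3, 1/2): at x = 1/2 player 1 is back to the value of A
```

Past x = 1/2, player 1 does worse than the 3 he can guarantee by playing A, which is the break-even point mentioned above.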
So, what would two TDT/UDT players do in this game? Would they manage to find a point on the Pareto frontier, and if so, which point?
Let's say player 1 submits a computer program that will receive no input and print either A, B or C. Player 2 submits a computer program that will receive a single bit as input (telling it whether P1's program printed A), and print either X or Y. Both programs also have access to a fair random number generator. That's a simultaneous move game where every Nash equilibrium leads to payoffs (3,0). Hopefully it's not too much of a stretch to say that we should play the game in the same way that the best program would play it.
If additionally each program receives the other's source code as input, many better Nash equilibria become achievable, like the outcome (4,1) proposed by Eliezer. In this case I think it's a bargaining problem. The Nash bargaining solution proposed by Squark might be relevant, though I don't know how to handle such problems in general.
"Hopefully it's not too much of a stretch to say that we should play the game in the same way that the best program would play it."
Should we (humans) play like the best programs that don't have access to each other's source code, or like the best programs that do have access to each other's source code? I mean, figuratively, we do have some information about the other player's source code...
P1: .5C .5B
P2: Y
It's not a Nash equilibrium, but it could be a timeless one. Possibly more trustworthy than usual for one-shots, since if P2 gets to move at all, P2 knows that P1 was not a Nash agent assuming the other player was a Nash agent (i.e., a classical game theorist).
I think the following is the unique proper equilibrium of this game:
Player One plays A with probability 1−ϵ, B with probability (1/3)ϵ, and C with probability (2/3)ϵ. Player Two plays X with probability 2/3 and Y with probability 1/3.
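The indifference conditions behind this claimed equilibrium are easy to check for any particular tremble size. A minimal sketch (the concrete ϵ = 1/100 is just for illustration; the claim concerns the limit ϵ → 0):

```python
from fractions import Fraction as F

eps = F(1, 100)                    # an illustrative small tremble
pB, pC = F(1,3)*eps, F(2,3)*eps    # One's probabilities on B and C

# Two's beliefs conditional on getting to move (One played B or C)
qB = pB / (pB + pC)   # = 1/3
qC = pC / (pB + pC)   # = 2/3

# X earns 1 against C, Y earns 2 against B: both 2/3, so Two is indifferent
print(qC * 1, qB * 2)

# One's payoffs against Two's mix (X with probability 2/3, Y with 1/3):
# B and C are both worth 2, so A (worth 3) is strictly best
x = F(2,3)
print(2 * 1, 6 * (1 - x))
```

The B:C ratio of 1:2 is exactly what makes Two willing to mix, and Two's 2/3:1/3 mix is exactly what makes One willing to put his trembles on B and C in that ratio while playing A otherwise.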
JGWeissman had essentially the right idea, but used the wrong terminology.
ETA: I've changed my mind and no longer think the proper equilibrium solution makes sense for this game. See later in this thread as well as this comment for the explanation.
Assume that each player's hand may tremble with a small non-zero probability p, then take the limit as p approaches zero from above.
... Let's do that!
Simple model: One plays A, B, and C with probabilities a, b, and c, with the constraint that each must be at least the trembling probability t (= p/3, using the p above). (Two doesn't tremble, for simplicity's sake.)
Two picks X with probability x and Y with probability (1-x).
So their expected utilities are:
One: 3a + 2b + 6c(1-x)
Two: 2b(1-x) + cx = 2b + (c - 2b)x
It seems pretty clear that One wants b to be as low as possible (either a or c will always be better), so we can set b=t.
Substituting a = 1 - b - c and b = t, One's utility is (constant) - 3c + 6c - 6cx = (constant) + 3c(1 - 2x).
So One wants c to maximize (1 - 2x)c, and Two wants x to maximize (c - 2t)x.
The Nash equilibrium is at 1 - 2x = 0 and c - 2t = 0, so c = 2t and x = 0.5.
So in other words, if One's hand can tremble, then he should also sometimes deliberately pick C, making it twice as likely as B, and Two should flip a coin.
(and as t converges towards 0, we do indeed get One always picking A)
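The first-order conditions of this little model can be verified directly; here is a minimal sketch, checking that both slopes vanish at the claimed equilibrium for several tremble sizes:

```python
from fractions import Fraction as F

def one_slope(x):
    # With b = t fixed, One's utility is (constant) + 3c(1 - 2x); slope in c
    return 3 * (1 - 2*x)

def two_slope(c, t):
    # Two's utility is 2t + (c - 2t)x; slope in x
    return c - 2*t

# At the claimed equilibrium (c = 2t, x = 1/2) both slopes vanish, for every t
for t in (F(1,10), F(1,100), F(1,1000)):
    c, x = 2*t, F(1,2)
    assert one_slope(x) == 0 and two_slope(c, t) == 0
    print(t, c)   # c = 2t -> 0 as t -> 0, so in the limit One simply plays A
```

Since c = 2t shrinks to zero with the tremble, the limit recovers "One always picks A", but with the B:C ratio pinned at 1:2 along the way.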
Hmm, is this not the correct solution for two super-rational players:
Player One: pick C with probability 2/3 - e; pick B with probability 1/3 + e, e being some very small but not negligible number. Player Two: pick Y.
Expected payoff for Player One is 4 2/3 - 4e, way better than playing A. For Player Two it is 2/3 + 2e, a tiny bit better than playing X; so Player Two will play Y, since he knows that Player One is totally rational and would have picked this very strategy.
Purely the information that Player One behaves irrationally doesn't give Player Two any more information about One's behaviour than the fact that it is not rational. So, other than knowing that Player One didn't use the strategy "play A with 100% probability", Player Two doesn't know anything about Player One's behaviour. What can Player Two do on that basis? They can assume that any deviations from rational choice are small, which brings us to the trembling-hand solution. Or they can use a different model.
Which model of player One's behaviour is the "...
[The following argument is made with tongue somewhat in cheek.]
Rationality (with a capital R) is supposed to be an idealized algorithm that is universally valid for all agents. Therefore, this algorithm, as such, doesn't know whether it will be instantiated in Player One (P1) or Player Two (P2). Yes, each player knows which one they are. But this knowledge is input that they feed to their Rationality subroutine. The subroutine itself doesn't come with this knowledge built in.
Since Rationality doesn't know where it will end up, it doesn't know which out...
I suspect that UDT players always reach the Nash bargaining solution although I have no proof.
For this game I proved that there is a local maximum of the Nash product when player 1 plays B with probability 3/8 and C with probability 5/8, and player 2 plays Y. I'm not sure whether it's global. (Can the Nash product have non-global local maxima?)
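That maximum is easy to reproduce numerically. One assumption made explicit here: the disagreement point is taken to be (3, 0), i.e., what the players get if player 1 just plays A. With player 2 fixed on Y, the Nash product reduces to a one-variable maximization over player 1's weight on B:

```python
from fractions import Fraction as F

def nash_product(b):
    # Player 1 plays B with probability b, C with 1 - b; player 2 plays Y.
    u1 = 2*b + 6*(1 - b)   # player 1's expected payoff
    u2 = 2*b               # player 2's expected payoff
    # Assumed disagreement point (3, 0): player 1 can always guarantee 3 via A
    return (u1 - 3) * (u2 - 0)

# Grid search over b; the product is 6b - 8b^2, concave, so the maximum is unique
best = max((F(k, 1000) for k in range(1001)), key=nash_product)
print(best, nash_product(best))   # 3/8 9/8
```

Analytically, (6 - 4b - 3)(2b) = 6b - 8b², whose derivative 6 - 16b vanishes at b = 3/8; being concave in b, this local maximum is also global along this line.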
"assume both players are rational. What should player 2 do when player 1 acts irrationally?"
Player 2 should realize that his model is incorrect and come up with a new theory of player 1's motivation. If player 1 is human-like, then player 2 should guess that player 1 was tempted by the shot at the 6 payoff and chose C. Play X.
I think you can just compute the Nash Equilibria. For example, use this site: http://banach.lse.ac.uk/
The answer appears to be "always pick A". Player 2 will never get to move.
I thought the answer was Player One picks B with 1/3+ϵ probability and C with 2/3-ϵ probability. Player Two picks Y.
This gives Player One an expected value of 2(1/3+ϵ) + 6(2/3-ϵ) = 14/3-4ϵ and Player Two an expected value of 2(1/3+ϵ) = 2/3+2ϵ.
If Player Two picked X, he'd have an expected value of 2/3-ϵ, and miss out on 3ϵ.
If Player One picked B with a higher probability, Player Two would still pick Y, and Player One wouldn't gain anything. If Player One picked C with a higher probability, Player Two would pick X, and Player One would get nothing. If Player One picked A, he'd only get 3, and miss out on 5/3-4ϵ.
Did I mess up somewhere?
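The numbers above can be re-derived with exact fractions; a quick sketch (the concrete ϵ = 1/100 is just for illustration):

```python
from fractions import Fraction as F

eps = F(1, 100)                      # an illustrative small positive epsilon
b, c = F(1,3) + eps, F(2,3) - eps    # Player One's mix over B and C

# Relevant payoffs: (B,Y) -> (2,2), (C,Y) -> (6,0), (C,X) -> (0,1)
ev_one   = 2*b + 6*c   # Player One's expected payoff when Two plays Y
ev_two_Y = 2*b         # Two's payoff from Y (earns 2 only against B)
ev_two_X = 1*c         # Two's payoff from X (earns 1 only against C)

print(ev_one)                 # equals 14/3 - 4*eps = 347/75 here
print(ev_two_Y, ev_two_X)     # Y beats X by exactly 3*eps
```

So the stated values (14/3 - 4ϵ for One, 2/3 + 2ϵ for Two, with Y ahead of X by 3ϵ and A costing One 5/3 - 4ϵ) all check out.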
So what happens if Player Two does get to move?
This is equivalent to the implicit "what if Omega can't predict what I would do?" reasoning done by two-boxers in Newcomb's problem. Neither possibility is in the solution domain, provided that one does not fight the hypothetical ("the players are rational" in your case, "Omega is a perfect predictor" in Newcomb's). Player Two does not get to move, so there is no point considering that. Omega knows exactly what you'd do, so there is no point considering what to do if he is wrong.
Hmm. The results appear quite different if you allow communication and repeated plays. And they also introduce something which seems slightly different from Trembling Hand (perhaps Trembling Memory?).
With communication and repeated plays:
Assume all potential Player 1s credibly precommit to flipping a fair coin and, based on the toss, picking B half the time and C half the time.
All potential Player 2's would know this, and assuming they expect Player 1 to almost always follow the precommitment, would pick Y, because they would maximize their expected payout. (50% ...
I'm not overly familiar with game theory, so forgive me if I'm making some elementary mistake, but surely the only possible outcome is Player 1 always picking A. Either of the other options essentially has Player 1 choosing a smaller payoff or no payoff at all, which would violate the stated condition that both players are rational. A nonsensical game doesn't have to make sense.
Semi-plausible interpretation of the game:
Player One and Two are in a war. Player One can send a message to his team, and Player Two can intercept it. There is a price for sending the message, and a price for intercepting it.
A is "don't send any message". B is "send a useless (blank, or random) message". (Uses up 1 utility. Also gives the other player 2 free utility points.) C is "send a useful message". (Uses up 1 utility, but gains 4 if not intercepted, and loses extra 1 if intercepted.) X is "intercept", which co...
This can be analyzed as a regular 2-player game with payoff matrix

--    X     Y
A    3,0   3,0
B    2,0   2,2
C    0,1   6,0
Player 2's indifference between X and Y when player 1 plays A means that player 2 only considers whether player 1 plays B or C.
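A brute-force check of the pure-strategy equilibria of this simultaneous-move form (a minimal sketch over the matrix above):

```python
from itertools import product

# Simultaneous-move payoff matrix: (player 1, player 2)
payoffs = {
    ('A','X'): (3,0), ('A','Y'): (3,0),
    ('B','X'): (2,0), ('B','Y'): (2,2),
    ('C','X'): (0,1), ('C','Y'): (6,0),
}
rows, cols = 'ABC', 'XY'

def is_pure_nash(r, c):
    u1, u2 = payoffs[(r, c)]
    no_dev1 = all(payoffs[(r2, c)][0] <= u1 for r2 in rows)  # 1 can't gain
    no_dev2 = all(payoffs[(r, c2)][1] <= u2 for c2 in cols)  # 2 can't gain
    return no_dev1 and no_dev2

print([rc for rc in product(rows, cols) if is_pure_nash(*rc)])  # [('A', 'X')]
```

(A, X) is the only pure Nash equilibrium: against X, player 1's best reply is A, and against A, player 2 is indifferent, so X is a (weak) best reply. (A, Y) fails because against Y, player 1 would deviate to C.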
What is this game supposed to analogize to in reality? I usually don't like to fight the hypothetical but in this sort of situation I feel like as player 2 I would assume they are including considerations from outside the game's rationality like a sense of fairness or generosity.
This game is exactly equivalent to the standard one where player one chooses from (A,B,C) and player two chooses from (X,Y), with the payoff for (A,X) and for (A,Y) equal to (3,0). When choosing what choice to make, player two can ignore the case where player one chooses A, since the payoffs are the same in that case.
And as others have said, the pure strategy (A,X) is a Nash equilibrium.
There's no mathematical solution for one-shot, non-zero-sum games of any sort. All these constructs lead to is arguments about "what is rational". If you had a full math model of a "rational entity", then you could get a mathematically defined solution.
This is why I prefer evolutionary game theory to classical game theory. Evolutionary game theory generally has models of its actors and thus guarantees a solution to the problems it posits. One can argue with the models and I would say that's where such arguments most fruitfully should be.
The following simple game has one solution that seems correct, but isn’t. Can you figure out why?
The Game
Player One moves first. He must pick A, B, or C. If Player One picks A, the game ends and Player Two does nothing. If Player One picks B or C, Player Two will be told that Player One picked B or C, but will not be told which of these two strategies Player One picked. Player Two must then pick X or Y, and then the game ends. The following shows the players' payoffs for each possible outcome. Player One's payoff is listed first.
A 3,0 [And Player Two never got to move.]
B,X 2,0
B,Y 2,2
C,X 0,1
C,Y 6,0
The players are rational, each player cares only about maximizing his own payoff, the players can’t communicate, they play the game only once, this game is all that will ever matter to them, and all of this plus the payoffs and the game structure is common knowledge.
Guess what will happen. Imagine you are really playing the game and decide what you would do as either Player One, or as Player Two if you have been told that you will get to move. To figure out what you would do you must formulate a belief about what the other player has/will do, and this will in part be based on your belief about his belief of what you have/will do.
An Incorrect Argument for A
If Player One picks A he gets 3, whereas if he picks B he gets 2 regardless of what Player Two does. Consequently, Player One should never pick B. If Player One picks C he might get 0 or 6 so we can’t rule out Player One picking C, at least without first figuring out what Player Two will do.
Player Two should assume that Player One will never pick B. Consequently, if Player Two gets to move he should assume that C was played and therefore Player Two should respond with X. If Player One believes that Player Two will, if given the chance to move, pick X, then Player One is best off picking A. In conclusion, Player One will pick A and Player Two will never get to move.
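Each arithmetic step in this argument can be checked mechanically; here is a minimal sketch using only the payoffs listed above:

```python
# Player One's payoffs, indexed by (One's move, Two's move); A ends the game
# either way, so its payoff ignores Two's move.
p1 = {('A','X'): 3, ('A','Y'): 3, ('B','X'): 2, ('B','Y'): 2,
      ('C','X'): 0, ('C','Y'): 6}
# Player Two's payoffs for the moves that matter here
p2 = {('C','X'): 1, ('C','Y'): 0}

# Step 1: A strictly dominates B for One (3 > 2 against both X and Y)
assert all(p1[('A', m)] > p1[('B', m)] for m in 'XY')
# Step 2: if Two is certain he faces C, X beats Y (1 > 0)
assert p2[('C','X')] > p2[('C','Y')]
# Step 3: if Two will answer with X, One prefers A to C (3 > 0)
assert p1[('A','X')] > p1[('C','X')]
print("each step of the argument checks out")
```

Each individual step is sound; the trouble, as argued below, lies in what the chain of steps implies about Player Two's beliefs should he actually get to move.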
Why the Game Has No Solution
I believe that the above logic is wrong, and indeed the game has no solution. My reasoning is given in rot13. (Copy what is below and paste at this link to convert to English.)
http://rot13.com/
Vs gur nobir nanylfvf jrer pbeerpg Cynlre Gjb jbhyq oryvrir ur jvyy arire zbir. Fb jung unccraf vs Cynlre Gjb qbrf trg gb zbir? Vs Cynlre Gjb trgf gb zbir jung fubhyq uvf oryvrs or nobhg jung Cynlre Bar qvq tvira gung Cynlre Gjb xabjf Cynlre Bar qvq abg cvpx N? Cynlre Gjb pna’g nffhzr gung P jnf cynlrq. Vs vg jrer gehr gung vg’f pbzzba xabjyrqtr gung Cynlre Bar jbhyq arire cynl O, gura vg fubhyq or pbzzba xabjyrqtr gung Cynlre Gjb jbhyq arire cynl L, juvpu jbhyq zrna gung Cynlre Bar jbhyq arire cynl P, ohg pyrneyl Cynlre Bar unf cvpxrq O be P fb fbzrguvat vf jebat.
Zber nofgenpgyl, vs V qrirybc n gurbel gung lbh jba’g gnxr npgvba Y, naq guvf arprffnevyl erfhygf va gur vzcyvpngvba gung lbh jba’g qb npgvba Z, gura vs lbh unir pyrneyl qbar rvgure Y be Z zl bevtvany gurbel vf vainyvq. V’z abg nyybjrq gb nffhzr gung lbh zhfg unir qbar Z whfg orpnhfr zl vavgvny cebbs ubyqvat gung lbh jba’g qb Y gbbx srjre fgrcf guna zl cebbs sbe jul lbh jba’g qb Z qvq.
Abar vs guvf jbhyq or n ceboyrz vs vg jrer veengvbany sbe Cynlre Bar gb abg cvpx N. Nsgre nyy, V unir nffhzrq engvbanyvgl fb V’z abg nyybjrq gb cbfghyngr gung Cynlre Bar jvyy qb fbzrguvat veengvbany. Ohg vg’f veengvbany sbe Cynlre Bar gb Cvpx P bayl vs ur rfgvzngrf gung gur cebonovyvgl bs Cynlre Gjb erfcbaqvat jvgu L vf fhssvpvragyl ybj. Cynlre Gjb’f zbir jvyy qrcraq ba uvf oryvrsf bs jung Cynlre Bar unf qbar vs Cynlre Bar unf abg cvpxrq N. Pbafrdhragyl, jr pna bayl fnl vg vf veengvbany sbe Cynlre Bar gb abg cvpx N nsgre jr unir svtherq bhg jung oryvrs Cynlre Gjb jbhyq unir vs Cynlre Gjb trgf gb cynl. Naq guvf oryvrs bs Cynlre Gjb pna’g or onfrq ba gur nffhzcgvba gung Cynlre Bar jvyy arire cvpx O orpnhfr guvf erfhygf va Cynlre Gjb oryvrivat gung Cynlre Bar jvyy arire cvpx P rvgure, ohg pyrneyl vs Cynlre Gjb trgf gb zbir rvgure O be P unf orra cvpxrq.
Va fhz, gb svaq n fbyhgvba sbe gur tnzr jr arrq gb xabj jung Cynlre Gjb jbhyq qb vs ur trgf gb zbir, ohg gur bayl ernfbanoyr pnaqvqngr fbyhgvba unf Cynlre Gjb arire zbivat fb jr unir n pbagenqvpgvba naq V unir ab vqrn jung gur evtug nafjre vf. Guvf vf n trareny ceboyrz va tnzr gurbel jurer n fbyhgvba erdhverf svthevat bhg jung n cynlre jbhyq qb vs ur trgf gb zbir, ohg nyy gur ernfbanoyr fbyhgvbaf unir guvf cynlre arire zbivat.
Update: Emile has a great answer if you assume a "trembling hand."