
itaibn0 comments on Robust Cooperation in the Prisoner's Dilemma - Less Wrong

Post author: orthonormal · 07 June 2013 08:30AM · 69 points


Comment author: itaibn0 09 June 2013 02:53:12PM 9 points

I found a situation where PrudentBot behaves suboptimally and a more intelligent agent can do better. I'm considering an opponent I call pseudo-tit-for-tat at level N, or PsTitTatBot[N]. PsTitTatBot[0] is simply CooperateBot. For N>0, the strategy for PsTitTatBot[N] is to simulate its opponent running against PsTitTatBot[N-1], and do the same thing its opponent did in the simulation. PrudentBot's outcome against PsTitTatBot[N] is (D,D) for any N≥1 (and (D,C) against PsTitTatBot[0], i.e. CooperateBot). However, it's easy for an agent to ensure that against a sufficiently high-level tit-for-tat agent the outcome will be (C,C). To do this, there must be a specific N at which this agent plays suboptimally against PsTitTatBot[N]; however, the benefit of mutual cooperation at the higher levels should compensate for that.
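Since the recursion bottoms out at CooperateBot, this family of bots is easy to sketch. Here is a minimal toy model in Python, with bots as functions from an opponent to a move; this is only an illustration of the recursion (the agents in the post use proof search over source code, not literal simulation), and `defect_bot` merely stands in for PrudentBot's behaviour here, since PrudentBot defects against CooperateBot and hence, by induction, at every level:

```python
def cooperate_bot(opponent):
    return "C"              # PsTitTatBot[0]

def defect_bot(opponent):
    return "D"              # stand-in for PrudentBot: it defects against
                            # CooperateBot, hence against every level

def ps_tit_tat(n):
    """PsTitTatBot[n]: play whatever the opponent plays against PsTitTatBot[n-1]."""
    if n == 0:
        return cooperate_bot
    def bot(opponent):
        return opponent(ps_tit_tat(n - 1))
    return bot

def unconditional_c(opponent):
    return "C"              # cooperates at every level (suboptimally at n = 0)

# defect_bot gets (D, D) against every PsTitTatBot[n] with n >= 1,
# while unconditional_c gets (C, C) at every level.
```

Note that `ps_tit_tat(n)(defect_bot)` unwinds in a single step: the level-n bot asks `defect_bot` for its move against the level-(n-1) bot, and `defect_bot` answers "D" without simulating anything.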

Comment author: orthonormal 09 June 2013 03:18:16PM 3 points

Indeed! This is a nice example of the "agents punishing you for playing correctly against other agents" obstacle to optimality.

But unless I expected PsTitTatBot[N] for N>0 to be more common than CooperateBots (which I think of as naturally occurring in reality), I'm OK with defecting against all of them.

Comment author: itaibn0 09 June 2013 05:11:14PM 2 points

I agree that it is best to defect against CooperateBot. However, the case of PsTitTatBot[630] is far from clear; when your opponent is an agent like that, you should seriously consider the possibility that you are being simulated. Thinking about this some more, there are other strategies as well: for example, cooperating for 5&lt;N&lt;1000 but defecting at N=1000, based on the belief that the environment is likely to contain PsTitTatBot[1000] (although if you think PsTitTatBot[1000] is unusually likely, all you really need to do is cooperate against PsTitTatBot[999]). So I'm not at all certain what the best response to a pseudo-tit-for-tat agent is. However, this situation provides a testing ground for agents that could be stronger than PrudentBot.
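The parenthetical point can be made concrete in the toy model. Assume (purely for illustration) that each bot's level can be read off directly, standing in for inspecting its source code. An agent that cooperates against PsTitTatBot[999] but defects everywhere else gets PsTitTatBot[1000] to cooperate, and can even defect against it for the temptation payoff:

```python
def cooperate_bot(opponent):
    return "C"
cooperate_bot.level = 0     # PsTitTatBot[0]

def ps_tit_tat(n):
    """PsTitTatBot[n], tagged with its level for easy inspection (a toy assumption)."""
    if n == 0:
        return cooperate_bot
    def bot(opponent):
        return opponent(ps_tit_tat(n - 1))
    bot.level = n
    return bot

def exploiter(opponent):
    # Toy stand-in for source inspection: read the opponent's level tag.
    lvl = getattr(opponent, "level", None)
    if lvl == 999:          # cooperate one level below the target...
        return "C"
    return "D"              # ...and defect everywhere else, level 1000 included

# PsTitTatBot[1000] copies exploiter's move against PsTitTatBot[999],
# which is "C" -- while exploiter itself defects against PsTitTatBot[1000].
```

There is no deep recursion here: `ps_tit_tat(1000)(exploiter)` just asks `exploiter` for its move against the level-999 bot, and `exploiter` answers by reading the tag.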

Comment author: CCC 09 June 2013 05:58:23PM 0 points

One possibility is to recognise such self-simulating bots (perhaps by noticing that they simulate their opponent against a slightly modified version of themselves), find the opponent's N-value, and cooperate if N is odd, acting as PrudentBot in all other cases.

Such a bot will likely defect against 50% of PsTitTatBot[N] bots when the PsTitTatBot[N] cooperates (i.e. where N is even). It will defect against CooperateBot. Unfortunately, against the other 50% of PsTitTatBots (i.e. where N is odd) it will cooperate while the PsTitTatBot defects.

Whether this does better than PrudentBot or not depends on the exact payoff matrix; is the cost of cooperating against a defection half the time worth the benefit of defecting against a cooperation the other half the time, as compared to mutual defection all the time?
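Under the standard PD payoffs (a hypothetical choice here; the thread never fixes a matrix), the arithmetic favours the parity bot. With temptation T=5, reward R=3, punishment P=1, sucker S=0, and a 50/50 mix of even- and odd-level PsTitTatBots:

```python
# Hypothetical standard Prisoner's Dilemma payoffs (T > R > P > S):
T, R, P, S = 5, 3, 1, 0

# Parity bot: defects on a cooperator half the time (T), cooperates
# against a defector the other half (S).
parity_score = 0.5 * T + 0.5 * S

# PrudentBot: mutual defection at every level (P).
prudent_score = 1.0 * P

print(parity_score, prudent_score)   # 2.5 vs 1.0 under these payoffs
```

So with this particular matrix the answer is yes, though a matrix with a much smaller gap between T and S (say T=3, S=-2) would reverse the comparison.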

Another possible strategy is simply to defect against CooperateBot, but cooperate with all PsTitTatBot[N] where N>0. This will lose against PsTitTatBot[1] only, and achieve mutual cooperation with all higher levels of PsTitTatBot. Again, whether this is better than the above or not depends on the exact payoff matrix. (This one is guaranteed to do better than PrudentBot, assuming few examples of PsTitTatBot[1] and many examples of PsTitTatBot[N] for N>1)
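This second strategy is also easy to model. The one liberty taken below is the recognition step: the toy agent identifies CooperateBot (= PsTitTatBot[0]) by function identity, a stand-in for inspecting the opponent's source, which works here because PsTitTatBot[1] hands its opponent the literal level-0 bot:

```python
def cooperate_bot(opponent):
    return "C"              # PsTitTatBot[0]

def ps_tit_tat(n):
    """PsTitTatBot[n]: play whatever the opponent plays against PsTitTatBot[n-1]."""
    if n == 0:
        return cooperate_bot
    def bot(opponent):
        return opponent(ps_tit_tat(n - 1))
    return bot

def careful_bot(opponent):
    # Toy stand-in for source inspection: recognise CooperateBot by identity,
    # defect against it, cooperate with everything else.
    return "D" if opponent is cooperate_bot else "C"

# careful_bot defects against CooperateBot, gets exploited by PsTitTatBot[1]
# (which sees it defect against the simulated CooperateBot), and reaches
# mutual cooperation with every PsTitTatBot[n] for n >= 2.
```

The single loss at N=1 is exactly the "specific N" that the obstacle above demands; everything from N=2 upward is (C,C).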

Comment author: Will_Sawin 11 June 2013 02:06:08AM 4 points

To recognize what?

What you need to recognize is bots that are simulated by other bots. Consider the pair of BabyBear, which may do anything at all, and MamaBear, which cooperates with you if and only if you cooperate with BabyBear. Estimating the ratio of MamaBears to any particular kind of BabyBear is an empirical question.
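The MamaBear/BabyBear pair fits the same toy framework. Taking BabyBear to be a defector (one arbitrary choice; it could be anything), whether indulging BabyBear pays off depends only on the population mix, as a comment at the end works out under hypothetical standard payoffs:

```python
def baby_bear(opponent):
    return "D"      # BabyBear may do anything; here, an unconditional defector

def mama_bear(opponent):
    # Cooperate if and only if the opponent cooperates with BabyBear.
    return "C" if opponent(baby_bear) == "C" else "D"

def indulgent_bot(opponent):
    return "C"      # cooperates with BabyBear, so MamaBear rewards it

def hardnosed_bot(opponent):
    return "D"      # defects against BabyBear, so MamaBear punishes it

# With hypothetical payoffs T=5, R=3, P=1, S=0 and a fraction p of MamaBears
# (the rest BabyBears): indulgent_bot scores 3p + 0(1-p), hardnosed_bot
# scores 1 everywhere, so indulgence pays exactly when p > 1/3.
```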

What seems problematic about these strategies is that they are not evolutionarily stable. In a world filled with those bots, PsTitTat bots would proliferate and drive them out. That hardly seems optimal!

Comment author: Will_Sawin 11 June 2013 02:12:36AM 1 point

One simple way to study this is the game where one player chooses PsTitTatBot[N] for some N, and the other player chooses any bot. Since this game has infinitely many strategies, its Nash equilibria are poorly behaved. Still, complete mutual cooperation occurs in some Nash equilibria. I have not fully worked out which other possibilities can occur.