Indeed! This is a nice example of the "agents punishing you for playing correctly against other agents" obstacle to optimality.

But unless I expected PsTitTatBot[N] for N>0 to be more common than CooperateBots (which I think of as naturally occurring in reality), I'm OK with defecting against all of them.

Comment author:itaibn0
09 June 2013 05:11:14PM
2 points
[-]

I agree that it is best to defect against CooperateBot. However, the case for PsTitTatBot[630] is far from clear; when your opponent is that you should seriously consider the possibility that you are being simulated. Thinking about this some more, there are other strategies, for example, cooperating for 5<N<1000 but defecting for N=1000 based on the belief that the environment is likely to contain PsTitTatBot[1000] (although if you think PsTitTatBot[1000] is unusually likely, all you really need to do is cooperate against PsTitTatBot[999]). So I'm not certain at all what the best response is to a pseudo tit-for-tat agent. However, this situation provides a testing ground for agents that could be stronger than PrudentBot.

Comment author:CCC
09 June 2013 05:58:23PM
*
0 points
[-]

One possibility is to recognise such self-simulating bots (possibly by recognising that they simulate their opponent against a slightly modified version of themselves), finding their opponent's N-value, and cooperating if N is odd, acting as Prudentbot in all other cases.

Such a bot will likely defect against 50% of PsTitTatBot[N] bots when the PsTitTatBot[N] cooperates (i.e. where N is even). It will defect against CooperateBot. Unfortunately, against the other 50% of PsTitTatBots (i.e. where N is odd) it will cooperate while the PsTitTatBot defects.

Whether this does better than PrudentBot or not depends on the exact payoff matrix; is the cost of cooperating against a defection half the time worth the benefit of defecting against a cooperation the other half the time, as compared to mutual defection all the time?

Another possible strategy is simply to defect against CooperateBot, but cooperate with all PsTitTatBot[N] where N>0. This will lose against PsTitTatBot[1] only, and achieve mutual cooperation with all higher levels of PsTitTatBot. Again, whether this is better than the above or not depends on the exact payoff matrix. (This one is guaranteed to do better than PrudentBot, assuming few examples of PsTitTatBot[1] and many examples of PsTitTatBot[N] for N>1)

Comment author:Will_Sawin
11 June 2013 02:06:08AM
4 points
[-]

To recognize what?

What you need to recognize is bots that are simulated by other bots. Consider the pair of BabyBear, which does whatever, and MamaBear, which cooperates with you if and only if you cooperate with BabyBear. Estimating the ratio of MamaBears to any particular BabyBear is an empirical question.

What seems problematic about these strategies is that they are not evolutionarily stable. In a world filled with those bots, PsTitTat bots would proliferate and drive them out. That hardly seems optimal!

## Comments (145)

BestIndeed! This is a nice example of the "agents punishing you for playing correctly against other agents" obstacle to optimality.

But unless I expected PsTitTatBot[N] for N>0 to be more common than CooperateBots (which I think of as naturally occurring in reality), I'm OK with defecting against all of them.

I agree that it is best to defect against CooperateBot. However, the case for PsTitTatBot[630] is far from clear; when your opponent is that you should seriously consider the possibility that you are being simulated. Thinking about this some more, there are other strategies, for example, cooperating for 5<N<1000 but defecting for N=1000 based on the belief that the environment is likely to contain PsTitTatBot[1000] (although if you think PsTitTatBot[1000] is unusually likely, all you really need to do is cooperate against PsTitTatBot[999]). So I'm not certain at all what the best response is to a pseudo tit-for-tat agent. However, this situation provides a testing ground for agents that could be stronger than PrudentBot.

*0 points [-]One possibility is to recognise such self-simulating bots (possibly by recognising that they simulate their opponent against a slightly modified version of themselves), finding their opponent's N-value, and cooperating if N is odd, acting as Prudentbot in all other cases.

Such a bot will likely defect against 50% of PsTitTatBot[N] bots when the PsTitTatBot[N] cooperates (i.e. where N is even). It will defect against CooperateBot. Unfortunately, against the other 50% of PsTitTatBots (i.e. where N is odd) it will cooperate while the PsTitTatBot defects.

Whether this does better than PrudentBot or not depends on the exact payoff matrix; is the cost of cooperating against a defection half the time worth the benefit of defecting against a cooperation the other half the time, as compared to mutual defection all the time?

Another possible strategy is simply to defect against CooperateBot, but cooperate with all PsTitTatBot[N] where N>0. This will lose against PsTitTatBot[1]

only, and achieve mutual cooperation with all higher levels of PsTitTatBot. Again, whether this is better than the above or not depends on the exact payoff matrix. (This one is guaranteed to do better than PrudentBot, assuming few examples of PsTitTatBot[1] and many examples of PsTitTatBot[N] for N>1)To recognize what?

What you need to recognize is bots that are simulated by other bots. Consider the pair of BabyBear, which does whatever, and MamaBear, which cooperates with you if and only if you cooperate with BabyBear. Estimating the ratio of MamaBears to any particular BabyBear is an empirical question.

What seems problematic about these strategies is that they are not evolutionarily stable. In a world filled with those bots, PsTitTat bots would proliferate and drive them out. That hardly seems optimal!