Agent-Simulates-Predictor Variant of the Prisoner's Dilemma

Gram_Stone

I don't know enough math and I don't know if this is important, but in the hopes that it helps someone figure something out that they otherwise might not, I'm posting it.

In Soares & Fallenstein (2015), the authors describe the following problem:

Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:

"I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear."

The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.

More precisely: two agents A and B must choose integers m and n with 0 ≤ m, n ≤ 10, and if m + n ≤ 10, then A receives a payoff of m dollars and B receives a payoff of n dollars, and if m + n > 10, then each agent receives a payoff of zero dollars. B has perfect predictive accuracy and A knows that B has perfect predictive accuracy.

Consider a variant of the aforementioned decision problem in which the same two agents A and B must choose integers m and n with 0 ≤ m, n ≤ 3; if m + n ≤ 3, then {A, B} receives a payoff of {m, n} dollars; if m + n > 3, then {A, B} receives a payoff of zero dollars. This variant is similar to a variant of the Prisoner's Dilemma with a slightly modified payoff matrix:

Likewise, A reasons as follows:

If I cooperate, then B will predict that I will cooperate, and B will defect. If I defect, then B will predict that I will defect, and B will cooperate. Therefore, I defect.

And B:

I predict that A will defect. Therefore, I cooperate.

I figure it's good to have multiple takes on a problem if possible, and that this particular take might be especially valuable, what with all of the attention that seems to get put on the Prisoner's Dilemma and its variants.

I don't know enough math and I don't know if this is important, but in the hopes that it helps someone figure something out that they otherwise might not, I'm posting it.

In Soares & Fallenstein (2015), the authors describe the following problem:

Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:

"I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear."

The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.

Likewise, A reasons as follows:

If I cooperate, then B will predict that I will cooperate, and B will defect. If I defect, then B will predict that I will defect, and B will cooperate. Therefore, I defect.

And B:

I predict that A will defect. Therefore, I cooperate.

What some comments are missing is that not only UDT is a perfect predictor, but apparently the human is, too. The human fully and correctly predicts the UDT response.

A doesn't have perfect predictive accuracy. A merely knows that B has perfect predictive accuracy. If A is pitted against a different agent with sufficiently small predictive accuracy, then A cannot predict that agent's actions well enough to cause outcomes like the one in this problem.

Consider a variant in which A is replaced by DefectBot. It seems rational for UDT to cooperate. The parameters of the decision problem are not conditional on UDT's own decision algorithm (all agents in this scenario must choose between payoffs of $0 and $1), and cooperating maximizes expected utility. But what we have just described is a very different game. DefectBot always defects, but AFAICT, it can be shown that A behaves precisely as an agent would if it cooperates with an arbitrary agent C if P(C predicts A defects | A defects) is less than 0.5, is indifferent if that probability equals 0.5, and defects if that probability is greater than 0.5.

Suppose that C's predictive accuracy is greater than 50 percent. Then the expected utility of A defecting is 2p + 0(1 - p), and the expected utility of A cooperating is 1p + 1(1 - p), and the expected utility of defection is greater than that of cooperation. Plug in numbers if you need to. There are similar proofs that If B's predictions are random, then A is indifferent, and if B's predictive ability is less than 50 percent, then A cooperates.

If we played an iterated variant of this game, then the expected value of the sequence of payoffs to A will almost surely exceed the expected value of the sequence of payoffs to B. The important thing is that in our game, UDT seems to be penalized for its predictive accuracy when it plays against agents like A despite dominating other decision theories that 'win' on this problem in other problem classes.

When described in this way, I am reminded that I would be very interested to see this sort of problem examined in the modal agents framework. I have to flag that I lack a technical understanding of this sort of thing, but it seems like we can imagine the agents as formal systems, with B stronger than A, and A forcing B to prove that A defects by making it provable in A that A defects, and since B is stronger than A, it is also provable in B that A defects.

Take away the ability to predict from the human (e.g. by adding randomness to UDT decisions) and see if it's still optimal to put 9 down.

I'm not sure what precisely you mean by "add randomness", but if you mean "give UDT less than perfect predictive accuracy," then as I have shown above, and as in Newcomb's problem, there are variants of this game in which UDT has predictive accuracy greater than 50% but less than 100% and in which the same outcome obtains. Any other interpretation of "add randomness" that I can think of simply results in an agent that we call a UDT agent but that is not one.

You know, let's back up. I'm confused.

What is the framework, the context in which we are examining these problems? What is the actual question we're trying to answer?

0Lumifer11y

In your setup it does. It is making accurate predictions, doesn't it? Always?