Example decision theory problem: "Agent simulates predictor"

cousin_it

Some people on LW have expressed interest in what's happening on the decision-theory-workshop mailing list. Here's an example of the kind of work we're trying to do there.

In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I'm posting it here with his permission:

There's a version of Newcomb's Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.

Suppose:

The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent's decision theory and computational power.

The agent runs UDT, and has the ability to fully simulate the predictor.

Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent's computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.

Meanwhile, the predictor, knowing that the the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent's detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).

The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.

About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:

a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.

b) The predictor generates all proofs of length up to N which is much less than M. If it finds a provable prediction about the agent's action, it fills the boxes accordingly. Also the predictor has an "epistemic advantage" over the agent: its proof system has an axiom saying the agent's proof system is consistent.

Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent's proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.

Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)

Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?

Some people on LW have expressed interest in what's happening on the decision-theory-workshop mailing list. Here's an example of the kind of work we're trying to do there.

There's a version of Newcomb's Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.

Suppose:

The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent's decision theory and computational power.

The agent runs UDT, and has the ability to fully simulate the predictor.

Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent's computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.

Meanwhile, the predictor, knowing that the the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent's detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).

The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.

About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:

a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.

b) The predictor generates all proofs of length up to N which is much less than M. If it finds a provable prediction about the agent's action, it fills the boxes accordingly. Also the predictor has an "epistemic advantage" over the agent: its proof system has an axiom saying the agent's proof system is consistent.

Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent's proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.

Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)

Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?

Yeah, you can't control arbitrary world programs that way, but remember that our world really is like that: a human mind runs within the same universe that it's trying to control. One idea would be to define a class of "fair" world programs (perhaps those that can be factorized in appropriate ways) and care only about solving those. But I guess I'd better wait for your writeup, because I don't understand what kind of formalism you prefer.

47

Example decision theory problem: "Agent simulates predictor"

47

47

47

Example decision theory problem: "Agent simulates predictor"

47

47