Some people on LW have expressed interest in what's happening on the decision-theory-workshop mailing list. Here's an example of the kind of work we're trying to do there.
In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I'm posting it here with his permission:
There's a version of Newcomb's Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.
Suppose:
- The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent's decision theory and computational power.
- The agent runs UDT, and has the ability to fully simulate the predictor.
Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent's computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.
Meanwhile, the predictor, knowing that the the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent's detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).
The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.
About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:
a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.
b) The predictor generates all proofs of length up to N which is much less than M. If it finds a provable prediction about the agent's action, it fills the boxes accordingly. Also the predictor has an "epistemic advantage" over the agent: its proof system has an axiom saying the agent's proof system is consistent.
Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent's proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.
Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)
Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?
You don't get to do that, because when you're writing World, the Strategy hasn't been determined yet. Think of it as a challenge-response protocol; World is a challenge, and Strategy is a response. You can still do agent-copying, but you have to enlarge the scope of World to include the rules by which that copying was done, or else you get unrelated agents instead of copies.
To copy agent's code, you don't need to know strategy. World naturally changes if you change it, and the strategy might change as well if you run the agent on a changed world, but agent's code is still the same, and you know this code. The new world will only depend on the new strategy, not the old one, but now we have a world that depends on its agent's strategy, and you won't be able to find how it does, if you don't already know.
In any case, all this copying is irrelevant, because the point is that there can exist very convoluted worlds that depend on agent's action, but it's not feasible to know that they do or how they do. And we don't get to choose the real world.