Some people on LW have expressed interest in what's happening on the decision-theory-workshop mailing list. Here's an example of the kind of work we're trying to do there.
In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I'm posting it here with his permission:
There's a version of Newcomb's Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.
Suppose:
- The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent's decision theory and computational power.
- The agent runs UDT, and has the ability to fully simulate the predictor.
Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent's computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.
Meanwhile, the predictor, knowing that the the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent's detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).
The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.
About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:
a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.
b) The predictor generates all proofs of length up to N which is much less than M. If it finds a provable prediction about the agent's action, it fills the boxes accordingly. Also the predictor has an "epistemic advantage" over the agent: its proof system has an axiom saying the agent's proof system is consistent.
Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent's proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.
Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)
Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?
What's World? What do you mean by "World: Strategy->Outcome"? The problem is that if World is a function, it's given by some syntactic specification, and the agent, being part of the world, can control which function this syntactic specification refers to, or which value this function has for a particular argument, in a way unrelated to its type, to passing of this Strategy thing as its argument. This is a typical example of explicit dependence bias: you declare that World (as a mathematical structure) depends on nothing, but it really does.
See example "world3" in my post for an example with Newcomb's problem. There, World is given by a program "world3" that takes agent's action as parameter, but also depends on agent's action implicitly in a way that isn't captured by the interface of the function. The function in fact shows that two-boxing dominates one-boxing, and one can only see that it's really worse by recognizing that function's value when given agent's actual action as parameter, is controlled by agent's action in a way that makes one-boxing preferable. And so only seeing program "world3" as specifying a function is unhelpful.
It means "World is the type of function that takes a Strategy as input and returns an Outcomes."
In the linked article, you've defined world, world2, and world3 as things that're not actually functions; they have unbound references to agent, which are parameters in disguise. You then show that if you mix parameters-as-unbound-references with real parameters, you can get confused into thinking they're independent. Which jus means you shouldn't use unbound references.