(This post doesn't require much math. It's very speculative and probably confused.)
Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:
- the coinflip is "logical", e.g. the parity of the millionth digit of pi;
- after you receive the offer, you will have enough resources to calculate the coinflip's outcome yourself;
- but you need to figure out the correct decision algorithm ahead of time, when you don't have these resources and are still uncertain about the coinflip's outcome.
If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a proof that the millionth digit of pi is even. But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?
One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You're supposed to always do what you would have precommitted to doing, even if it doesn't seem like a very good idea after you've done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being "updateless", so you keep caring about possible worlds in accordance with their a priori probabilities regardless of which world you end up in.
If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about logically impossible worlds in accordance with their a priori logical probabilities. It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that's a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?
That conclusion is pretty startling, but consider what happens if you reject it:
- Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
- Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn't have enough resources to calculate the coin's outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.
- Writing P is equivalent to supplying one bit: should P pay up if asked?
- Supplying that bit is equivalent to accepting or refusing the bet "win $10000 if the millionth digit of pi is odd, lose $100 if it's even".
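The last two steps can be made concrete with a little arithmetic. The following is only a sketch under stated assumptions: the function names `expected_utility_of_paying` and `choose_bit` are hypothetical, the AI's logical uncertainty is treated as an ordinary probability `p_odd`, and refusing is assumed to be worth exactly 0.

```python
from fractions import Fraction


def expected_utility_of_paying(p_odd, reward=10000, cost=100):
    """Expected utility of writing a P that pays up when asked.

    p_odd is the AI's credence that the millionth digit of pi is odd.
    If the digit is odd, the policy of paying earns the reward;
    if it is even, P is asked to pay and loses the cost.
    """
    return p_odd * reward - (1 - p_odd) * cost


def choose_bit(p_odd, reward=10000, cost=100):
    """Return True if P should pay up, assuming refusing is worth 0."""
    return expected_utility_of_paying(p_odd, reward, cost) > 0


# The resource-limited AI, still 50/50 on the digit's parity.
prior = Fraction(1, 2)
print(expected_utility_of_paying(prior))  # 4950
print(choose_bit(prior))                  # True

# An agent that already knows the digit is even sees only the loss.
print(expected_utility_of_paying(Fraction(0)))  # -100
print(choose_bit(Fraction(0)))                  # False
```

At 50/50 the bet is worth accepting, so the AI writes a P that pays. Evaluated with a credence of 0 for "odd", the same formula says to refuse, which is the tension the post is about.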
So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging "logical prior", and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can't parse it yet.) And what could a formalization of "logical priors" possibly look like?
"Normal" priors are about the comparative value of worlds, with observations only resolving indexical uncertainty about your location among these worlds. In UDT, there is typically an assumption that an agent has ample computational resources, and so the only purpose of observations is in resolving this indexical uncertainty. A UDT agent is working with a fixed collection of possible worlds, and it doesn't learn anything about these worlds from observation. It devises a general strategy that is evaluated by looking at how it fares at all locations that use it, across the fixed collection of possible worlds.
In contrast, logical uncertainty is not about location within the collection of possible worlds; it's about the state of those worlds, or even about the presence of specific worlds in the collection. The value of any given strategy that responds to observations would then depend on the state of logical uncertainty, and so evaluating a strategy is not as simple as taking the current epistemic state's point of view.
A new possibility opens: some observations can communicate not just indexical information, but also logical information (alternatively, information about the state of the collection of possible worlds, not just location in the worlds of the collection). This possibility calls for something analogous to anthropic reasoning: the fact that an agent observes something tells it something about the big world, not just about which small world it's located in. Another analogy is value uncertainty: resolving logical uncertainty essentially resolves uncertainty about the agent's utility definition (and this is another way of generating thought experiments about this issue).
So when an agent is on a branch of a strategy that indicates something new about the collection of possible worlds, the agent would evaluate the whole strategy differently from when it started out. But when it started out, it could also predict how the expected value of the strategy would look given that hypothetical observation, and also given the alternative hypothetical observations. How does it balance these possible points of view? I don't know, but this is a new problem that breaks UDT's assumptions, and at least to this puzzle the answer seems to be "don't pay up".
Our set of possible worlds comes from somewhere, generated by some sort of criteria. Whatever generates that list passes it to our choice algorithm, which begins branching. Let's say we receive an observation that contains both logical and indexical updates. Could we not just take our current set of possible worlds, with our current set of data on them, update the list against our logical update, and pass that list on to a new copy of the function? The collection remains fixed as far as each copy of the function is concerned, but retains the ability to update on new information. When finished, the path returned will be the most likely given all new observations.
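One way to read this proposal is as a filter-and-renormalize step followed by a fresh call to the choice function. The sketch below is only an illustration under assumptions: the names `logical_update`, `best_action`, and `utility`, the two-world collection, and the 50/50 weights are all hypothetical, and the payoffs are borrowed from the bet in the post.

```python
def logical_update(worlds, is_consistent):
    """Drop worlds ruled out by a logical observation and renormalize."""
    kept = {w: p for w, p in worlds.items() if is_consistent(w)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the update ruled out every world")
    return {w: p / total for w, p in kept.items()}


def best_action(worlds, actions, utility):
    """The choice function: it treats the collection it is handed as fixed."""
    def expected(action):
        return sum(p * utility(w, action) for w, p in worlds.items())
    return max(actions, key=expected)


def utility(world, action):
    """Payoff of the policy 'pay' or 'refuse' in each world (from the post's bet)."""
    if action == "refuse":
        return 0
    return 10000 if world == "digit_odd" else -100


actions = ["pay", "refuse"]
worlds = {"digit_odd": 0.5, "digit_even": 0.5}

# The original copy of the function, before any logical update.
before = best_action(worlds, actions, utility)

# A new copy, handed the list after learning that the digit is even.
updated = logical_update(worlds, lambda w: w == "digit_even")
after = best_action(updated, actions, utility)

print(before, after)  # pay refuse
```

On this reading the original copy chooses to pay while the updated copy refuses, so the two copies disagree. The sketch does not settle whether that disagreement is acceptable; it only makes explicit where it comes from.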