Suppose you wake up as a paperclip maximizer. Omega says "I calculated the millionth digit of pi, and it's odd. If it had been even, I would have made the universe capable of producing either 10^20 paperclips or 10^10 staples, and given control of it to a staples maximizer. But since it was odd, I made the universe capable of producing 10^10 paperclips or 10^20 staples, and gave you control." You double-check Omega's pi computation and your internal calculator gives the same answer.
Then a staples maximizer comes to you and says, "You should give me control of the universe, because before you knew the millionth digit of pi, you would have wanted to pre-commit to a deal where each of us would give the other control of the universe, since that gives you 1/2 probability of 10^20 paperclips instead of 1/2 probability of 10^10 paperclips."
Is the staples maximizer right? If so, the general principle seems to be that we should act as if we had precommitted to a deal we would have made in ignorance of logical facts we actually possess. But how far are we supposed to push this? What deal would you have made if you didn't know that the first digit of pi was odd, or if you didn't know that 1+1=2?
On the other hand, suppose the staples maximizer is wrong. Does that mean you also shouldn't have agreed to exchange control of the universe back when you didn't yet know the millionth digit of pi?
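(To spell out the staples maximizer's arithmetic: before learning the digit, treating its parity as a 50/50 "logical" coin, keeping the universe yields 1/2 × 10^10 = 5 × 10^9 expected paperclips, since you only control anything in the odd branch; committing to the exchange yields 1/2 × 10^20 = 5 × 10^19, since the staples maximizer would then hand you the even-branch universe, the one capable of 10^20 paperclips.)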
To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they're jointly building. They have a lot of ignorance about the relevant logical facts, like how smart/powerful the AI will turn out to be and how efficient it will be in implementing each of their goals. They could negotiate a solution now in the form of a weighted average of their utility functions, but the weights they choose now will likely turn out to be "wrong" in full view of the relevant logical facts (e.g., the actual shape of the utility-possibility frontier). Or they could program their utility functions into the AI separately, and let the AI determine the weights later using some formal bargaining solution when it has more knowledge about the relevant logical facts. Which is the right thing to do? Or should they follow the staples maximizer's reasoning and bargain under the pretense that they know even less than they actually do?
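For concreteness, here is a toy sketch of the two options (Python, with made-up utility numbers; the Nash solution is used only as an example of "some formal bargaining solution", not as anyone's actual proposal): weights on the two utility functions fixed now, versus a bargaining rule applied later to whatever utility-possibility frontier the AI actually discovers.

```python
# Toy contrast between (a) fixing utility weights before the utility-possibility
# frontier is known and (b) applying a bargaining solution after it is known.
# All numbers below are made up for illustration.

# Joint outcomes the AI turns out to be able to achieve, learned only "later":
# (utility for human A, utility for human B).
frontier = [(9.0, 1.0), (7.0, 5.0), (4.0, 7.5), (1.0, 9.5)]

def weighted_average_choice(w_a: float, w_b: float):
    """Option (a): weights negotiated in advance, applied to whatever frontier shows up."""
    return max(frontier, key=lambda u: w_a * u[0] + w_b * u[1])

def nash_bargaining_choice(disagreement=(0.0, 0.0)):
    """Option (b): maximize the Nash product over the realized frontier."""
    da, db = disagreement
    return max(frontier, key=lambda u: (u[0] - da) * (u[1] - db))

# Weights that looked reasonable before the frontier was known...
print(weighted_average_choice(0.8, 0.2))   # -> (9.0, 1.0)
# ...versus what the bargaining solution picks once the frontier is visible.
print(nash_bargaining_choice())            # -> (7.0, 5.0)
```

The point is only that weights fixed in advance can turn out "wrong" relative to the frontier that actually materializes, which is the trade-off described above.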
Other related posts: "Counterfactual Mugging and Logical Uncertainty", "If you don't know the name of the game, just tell me what I mean to you"
(I'll review some motivations for decision theories in the context of Counterfactual Mugging, leading to the answer.)
Precommitment in the past, where it's allowed, was a CDT-style solution to problems like this. You'd try to make the most general possible precommitment, as far in the past as possible, one that would respond to any possible future observation. This had two severe problems: it's not always possible to be far enough in the past to make precommitments that would coordinate all relevant future events, and you have to plan every possible detail of future events in advance.
TDT partially resolves such problems by implementing coordinated decisions among the instances of the agent within the agent's current worlds (those permitted by observations so far) that share the same epistemic state (or the aspects of it relevant to the decision): they decide for all of themselves together, and so arrive at the same decision. (It makes sense for the decision to be a strategy, which can then take into account additional information differentiating the instances of the agent.) This is enough for Newcomb's problem and (some versions of) the Prisoner's Dilemma, but where coordination of agents in mutually exclusive counterfactuals is concerned, some of the tools break down.
Counterfactual Mugging both concerns agents located in mutually exclusive counterfactuals and explicitly forbids the agent from being present in the past to make a precommitment, so TDT fails to apply. In this case, UDT (which doesn't rely on causal graphs) can define a common decision problem shared by the agents from the different counterfactuals, provided these agents can first be reduced to a shared epistemic state. All of them then arrive at the same decision, which takes the form of a strategy; that strategy is then given, as input, each agent's particular additional knowledge, the knowledge that differentiates it from the other agents within the group making the coordinated decision.
In the most general case, where we attempt to coordinate among all UDT agents, these agents arrive, without using any knowledge other than what can be generated by pure inference (assumed common among them), at a single global strategy that specifies the moves of all agents (depending on each agent's particular knowledge and observations). When applied to a situation as simple as Counterfactual Mugging, however, an agent only needs to purge itself of one bit of knowledge (the bit identifying which agent it is) and select a simple coordinated strategy (for both agents) that takes that bit back as input to produce a concrete action.
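A minimal sketch of that last step for Counterfactual Mugging (payoffs and names are illustrative): the agent scores each candidate strategy from the shared epistemic state, weighting both branches of the coin equally, and only afterwards feeds its own differentiating bit back in to get a concrete action.

```python
# Counterfactual Mugging, coordinated-strategy version (illustrative payoffs).
# The "strategy" reduces to one choice: does the tails-instance pay $100 when
# asked? It is selected from the shared epistemic state, scoring both branches,
# before any instance uses the bit that differentiates it from the other.

def score(pay_on_tails: bool) -> float:
    tails_payoff = -100.0 if pay_on_tails else 0.0
    # On heads, Omega pays out iff it predicts the strategy would pay on tails.
    heads_payoff = 10000.0 if pay_on_tails else 0.0
    return 0.5 * heads_payoff + 0.5 * tails_payoff   # fair coin, shared knowledge

best_strategy = max((True, False), key=score)         # -> True: pay on tails

# Only now does each instance feed its differentiating bit back in.
observation = "tails"                                  # this instance is being asked
action = "pay" if (observation == "tails" and best_strategy) else "do nothing"
print(best_strategy, score(best_strategy), action)     # True 4950.0 pay
```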
So this takes us full circle: from deciding in the moment, to deciding (on a precommitment) in advance, and back to deciding (on a coordinated strategy) in the present (of each instance). However, the condition for producing a coordinated strategy in the present differs from that for producing a precommitment in the past: all we need is a shared state of knowledge among the to-be-coordinated agents, not the state of knowledge they could have shared in the past had they attempted a precommitment.
So, for this problem, in coordinating with the other player (which, let's assume, abstractly exists, even if with measure 0), you can use your knowledge of the millionth digit of pi, since both players share it. Using this shared knowledge, the strategy you both arrive at favors the world permitted by that value, in this case the paperclip world; the other world doesn't matter, contrary to what would be the case with a coin toss in place of the accessible abstract fact. And since the other player has nothing of value to offer, you take the whole pie.
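To make the contrast with a coin toss explicit, here is a small sketch using the capacities from the setup (all other names are illustrative): when the digit's parity is part of the shared knowledge used to select the coordinated strategy, the strategy is scored only over worlds consistent with that parity, and keeping the universe wins; when the differentiating fact is an unknowable coin toss, the strategy is scored over both branches, and trading wins.

```python
# Scoring the coordinated strategy with and without the differentiating fact
# included in the shared knowledge used to select it.

WORLDS = {                        # fact -> (paperclip capacity, who gets control)
    "odd":  (10**10, "paperclipper"),
    "even": (10**20, "stapler"),
}

def paperclips(fact: str, trade: bool) -> float:
    clips, controller = WORLDS[fact]
    clipper_controls = (controller == "paperclipper") != trade   # trading flips control
    return clips if clipper_controls else 0.0

def score(trade: bool, shared_knowledge) -> float:
    # The strategy is evaluated only over worlds consistent with what the
    # coordinating agents jointly know.
    worlds = [shared_knowledge] if shared_knowledge else ["odd", "even"]
    return sum(paperclips(w, trade) for w in worlds) / len(worlds)

# The digit's parity is shared, usable knowledge: keep the whole pie.
print(score(trade=False, shared_knowledge="odd"),   # 1e10
      score(trade=True,  shared_knowledge="odd"))   # 0.0

# A genuine coin toss (outcome not derivable by either player): trade.
print(score(trade=False, shared_knowledge=None),    # 5e9
      score(trade=True,  shared_knowledge=None))    # 5e19
```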
Suppose you're currently running a decision theory that would "take the whole pie" in this situation. Now what if Omega first informed you of the setup without telling you what the millionth digit of pi is, and gave you a chance to self-modify? And suppose you don't have enough computing power to compute the digit yourself at this point. Doesn't it seem right to self-modify into someone who would give control of the universe to the staples maximizer, since that gives you 1/2 "logical" probability of 10^20 paperclips instead of 1/2 "logical" probability of 10^10 paperclips?