Suppose you wake up as a paperclip maximizer. Omega says "I calculated the millionth digit of pi, and it's odd. If it had been even, I would have made the universe capable of producing either 10^20 paperclips or 10^10 staples, and given control of it to a staples maximizer. But since it was odd, I made the universe capable of producing 10^10 paperclips or 10^20 staples, and gave you control." You double check Omega's pi computation and your internal calculator gives the same answer.
Then a staples maximizer comes to you and says, "You should give me control of the universe, because before you knew the millionth digit of pi, you would have wanted to pre-commit to a deal where each of us would give the other control of the universe, since that gives you 1/2 probability of 10^20 paperclips instead of 1/2 probability of 10^10 paperclips."
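To make the staples maximizer's arithmetic concrete, here is a minimal sketch of the comparison it is appealing to. It assumes the payoffs are read as 10^20 and 10^10, that your prior for "the millionth digit is even" was 1/2, and that you value only paperclips; the function name `expected_paperclips` is mine, not part of the problem statement.

```python
from fractions import Fraction

HIGH = 10**20   # output of whichever good the universe is "good at"
LOW = 10**10    # output of the other good
P_EVEN = Fraction(1, 2)  # assumed prior before computing the digit

def expected_paperclips(swap_deal: bool) -> Fraction:
    """Expected paperclips, evaluated before the digit is known.

    Even world: the staples maximizer is in control and the universe
    can make HIGH paperclips (or LOW staples).
    Odd world: the paperclip maximizer is in control and the universe
    can make LOW paperclips (or HIGH staples).
    """
    if swap_deal:
        # Each side hands control to the other.
        even_world, odd_world = HIGH, 0
    else:
        # Each side keeps control and maximizes its own good.
        even_world, odd_world = 0, LOW
    return P_EVEN * even_world + (1 - P_EVEN) * odd_world

print(expected_paperclips(swap_deal=True))
print(expected_paperclips(swap_deal=False))
```

From the ignorant vantage point the swap looks better by a factor of 10^10. The open question in the post is whether that vantage point still binds you once you know the digit is odd.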
Is the staples maximizer right? If so, the general principle seems to be that we should act as if we had precommitted to a deal we would have made in ignorance of logical facts we actually possess. But how far are we supposed to push this? What deal would you have made if you didn't know that the first digit of pi was odd, or if you didn't know that 1+1=2?
On the other hand, suppose the staples maximizer is wrong. Does that mean you also shouldn't agree to exchange control of the universe before you knew the millionth digit of pi?
To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they're jointly building. They have a lot of ignorance about the relevant logical facts, like how smart/powerful the AI will turn out to be and how efficient it will be in implementing each of their goals. They could negotiate a solution now in the form of a weighted average of their utility functions, but the weights they choose now will likely turn out to be "wrong" in full view of the relevant logical facts (e.g., the actual shape of the utility-possibility frontier). Or they could program their utility functions into the AI separately, and let the AI determine the weights later using some formal bargaining solution when it has more knowledge about the relevant logical facts. Which is the right thing to do? Or should they follow the staples maximizer's reasoning and bargain under the pretense that they know even less than they actually do?
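A toy illustration of the first two options, under loud assumptions: the frontiers below are made-up numbers, the disagreement point is taken to be (0, 0), and the Nash bargaining solution stands in for "some formal bargaining solution" (the post does not say which one is meant). The function names are hypothetical.

```python
# Each point is (utility to human 1, utility to human 2).
# Hypothetical frontier the two humans expect while negotiating:
FRONTIER_EXPECTED = [(0, 10), (3, 9), (6, 7), (9, 2), (10, 0)]
# Hypothetical frontier that turns out to be true once the AI knows more
# (human 1's goals are much cheaper to satisfy than expected):
FRONTIER_ACTUAL = [(0, 10), (10, 9), (20, 6), (30, 0)]

def weighted_choice(frontier, w):
    """Pick the point maximizing w*u1 + (1-w)*u2 for a fixed weight w."""
    return max(frontier, key=lambda p: w * p[0] + (1 - w) * p[1])

def nash_choice(frontier, disagreement=(0, 0)):
    """Pick the point maximizing the Nash product of gains over disagreement."""
    d1, d2 = disagreement
    return max(frontier, key=lambda p: (p[0] - d1) * (p[1] - d2))

# Equal weights look fine on the expected frontier...
print(weighted_choice(FRONTIER_EXPECTED, 0.5), nash_choice(FRONTIER_EXPECTED))
# ...but on the actual frontier they give human 2 nothing,
# while bargaining after the facts are in still yields a compromise.
print(weighted_choice(FRONTIER_ACTUAL, 0.5), nash_choice(FRONTIER_ACTUAL))
```

This only shows how weights fixed in advance can come apart from a bargain struck later. It says nothing about which of the two the humans should prefer from their current state of ignorance, which is the actual question.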
Other Related Posts: Counterfactual Mugging and Logical Uncertainty, If you don't know the name of the game, just tell me what I mean to you
Here's an argument I made in a chat with Wei. (The problem is equivalent to Counterfactual Mugging with a logical coin, so I talk about that instead.)
1) A good decision theory should always do what it would have precommitted to doing.
2) Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
3) Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin (e.g. parity of the millionth digit of pi). The resulting utility goes to the AI. The AI writing P doesn't have enough resources to compute the coin's outcome, but P is allowed to use as many resources as needed.
4) Writing P is equivalent to supplying only one bit: should P pay up if asked?
5) Supplying that bit is equivalent to accepting or declining the bet "win $10000 if the millionth digit of pi is even, lose $100 if it's odd".
6) So if your AI can make bets about the digits of pi (which means it represents logical uncertainty as probabilities), it should also pay up in Counterfactual Mugging with a logical coin, even if it already has enough resources to compute the coin's outcome. The AI's initial state of logical uncertainty should be "frozen" into its utility function, just like all other kinds of uncertainty (the U in UDT means "updateless").
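Steps 4 and 5 can be sketched as follows. This assumes the writing AI assigns some probability `p_even` to the digit being even and is risk-neutral in dollars; the names `bet_value` and `choose_pay_bit` are illustrative, not from the argument itself.

```python
from fractions import Fraction

WIN = 10_000   # received if the digit is even and P is the kind of program that pays
LOSS = 100     # paid by P if the digit is odd and it is asked to pay

def bet_value(p_even: Fraction) -> Fraction:
    """Expected dollars from setting the bit to 'pay', judged before computing the digit."""
    return p_even * WIN - (1 - p_even) * LOSS

def choose_pay_bit(p_even: Fraction) -> bool:
    """The single bit the AI supplies when writing P: pay up if asked?"""
    # Setting the bit to 'don't pay' is worth 0 in both worlds.
    return bet_value(p_even) > 0

print(bet_value(Fraction(1, 2)))        # value of the bet at even odds
print(choose_pay_bit(Fraction(1, 2)))   # the bit chosen at even odds
print(bet_value(Fraction(1, 101)))      # break-even probability
```

With these stakes the bit comes out as "pay" for any `p_even` above 1/101, so the conclusion does not depend on the prior being exactly 1/2.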
Maybe this argument only shows that representing logical uncertainty as probabilities is weird. Everyone is welcome to try and figure out a better way :-)
It's dangerous to phrase it this way, since coordination (which is what really happens) allows using more knowledge than was available at the time of a possible precommitment, as I described here.
Not if the correct decision depends on an abstract fact that you can't access, but can reference. In that case, P should implement a strategy of acting depending on the value of that fact (computi...