(This post doesn't require much math. It's very speculative and probably confused.)
Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:
- the coinflip is "logical", e.g. the parity of the millionth digit of pi;
- after you receive the offer, you will have enough resources to calculate the coinflip's outcome yourself;
- but you need to figure out the correct decision algorithm ahead of time, when you don't have these resources and are still uncertain about the coinflip's outcome.
If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a proof that the millionth digit of pi is even. But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?
One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You're supposed to always do what you would have precommitted to doing, even if it doesn't seem like a very good idea after you've done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being "updateless", so you keep caring about possible worlds in accordance with their apriori probabilities regardless of which world you end up in.
If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about logically impossible worlds in accordance with their apriori logical probabilities. It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that's a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?
That conclusion is pretty startling, but consider what happens if you reject it:
- Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
- Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn't have enough resources to calculate the coin's outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.
- Writing P is equivalent to supplying one bit: should P pay up if asked?
- Supplying that bit is equivalent to accepting or refusing the bet "win $10000 if the millionth digit of pi is odd, lose $100 if it's even".
So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging "logical prior", and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can't parse it yet.) And what could a formalization of "logical priors" possibly look like?
Perhaps I am missing the obvious, but why is this a hard problem? So our protagonist AI has some algorithm to determine if the millionth digit of pi is odd- he cannot run it yet, but he has it. Lets call that function f{}, that returns a 1 if the digit is odd, or a 0 if it is even. He also has some other function like: sub pay_or_no { if (f{}) { pay(1000); }
In this fashion, Omega can verify the algorithm that returns the millionth digit of pi, independently verify the algorithm that pays based on that return, and our protagonist gets his money.
!!!!
This seems to be the correct answer to jacobt's question. The key is looking at the length of proofs. The general rule should go like this: when you're trying to decide which of two impossible counterfactuals "a()=x implies b()=y" and "a()=x implies b()=z" is more true even though "a()=x" is false, go with the one that has the shorter proof. We already use that rule when implementing agents that compute counterfactuals about their own actions. Now we can just implement Omega using the same rule. If the millionth digit of p... (read more)