A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents that two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave the way a CDT agent would wish it had pre-committed itself to behaving before being faced with the problem.
However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:
Discrimination: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.
So what are some fair "problematic problems"?
Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."
Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.
However, any CDT agent can just take both boxes and win $1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to reconstruct the chain of logic and reason that the simulation one-boxed, and so Box B contains the $1 million. So any other agent can safely two-box as well.
Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
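The payoffs in Problem 1 can be tabulated with a minimal Python sketch. The function names and the `box_a` parameter are illustrative assumptions, not part of the problem statement; the only rule encoded is the one Omega announces, namely that Box B depends solely on what the simulated TDT agent did.

```python
def box_b_contents(simulated_tdt_one_boxes: bool) -> int:
    """Box B is filled according to the SIMULATED TDT agent's choice only."""
    return 1_000_000 if simulated_tdt_one_boxes else 0


def payoff(takes_both: bool, simulated_tdt_one_boxes: bool, box_a: int = 1_000) -> int:
    """Winnings of the actual chooser, whatever decision theory it runs."""
    b = box_b_contents(simulated_tdt_one_boxes)
    return b + box_a if takes_both else b


# Per the analysis above, the simulated TDT agent one-boxes.
tdt_payoff = payoff(takes_both=False, simulated_tdt_one_boxes=True)
cdt_payoff = payoff(takes_both=True, simulated_tdt_one_boxes=True)

print(tdt_payoff)  # 1000000
print(cdt_payoff)  # 1001000
```

Calling `payoff(True, True, box_a=1_000_000)` gives 2,000,000, which is the "twice as much" case mentioned above.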
Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."
Analysis: A TDT agent will reason that whatever it does, it cannot have more than a 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million in box number 1 and the TDT agent has exactly a 10% chance of winning it.
But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, thereby winning $1 million. By increasing the number of boxes, we can ensure that TDT has an arbitrarily low chance of winning, compared to CDT, which always wins.
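Omega's selection rule in Problem 2 can be sketched in Python as follows. This assumes, as a simplification, that Omega's repeated simulations recover the TDT agent's choice probabilities exactly; the function names and the `skewed` example distribution are hypothetical.

```python
from fractions import Fraction


def omega_box(tdt_probs):
    """Return the 1-based number of the box the TDT agent is least likely to take.

    Ties are broken in favour of the smallest box number, as Omega announces.
    """
    lowest = min(tdt_probs)
    return tdt_probs.index(lowest) + 1  # list.index returns the first match


def tdt_win_probability(tdt_probs):
    """The TDT agent wins only when it happens to pick Omega's chosen box."""
    return tdt_probs[omega_box(tdt_probs) - 1]


n = 10
uniform = [Fraction(1, n)] * n
# A hypothetical non-uniform strategy: favour box 1, spread the rest evenly.
skewed = [Fraction(1, 2)] + [Fraction(1, 18)] * 9

print(omega_box(uniform), tdt_win_probability(uniform))  # 1 1/10
print(omega_box(skewed), tdt_win_probability(skewed))    # 2 1/18
```

The win probability is just the smallest entry of the distribution, which can never exceed 1/n, so the uniform strategy is the best the TDT agent can do. An agent that is not the one being simulated can compute `omega_box(uniform)` itself and take box 1.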
Some questions:
1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.
2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?
3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.
4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)
5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.
6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?
Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).
Here is a useful program:
if (being_simulated()) pick_box_b(); else pick_both_boxes();
being_simulated() is an interesting function; it would presumably consist of things like comparing star maps across as broad a range of EM frequencies as possible with historical knowledge, the positions of planetary bodies, Earthly satellites, quasar signatures, and inputs of other individuals and networks. It could also be as simple as "if (get_current_average_cosmic_ray_bit_flips() != historical_bit_flips_average) return true;" Since my brain tries to play Omega with me every night (well, on the nights I get good sleep), I think we all have some practical experience with being_simulated(). While our brains are nowhere near as capable as Omega, they are fairly good at suspending our own disbelief during dreams. What generally breaks disbelief is trying to behave rationally, including observing and testing the environment, which in reality would be the outside world but in a dream is simulated. Ultimately Omega is forced to simulate a very large subset of the Universe in order to cause being_simulated() to return false, or to alter the program itself so that being_simulated() vacuously returns true.
To prevent the latter attack the program should execute a fully homomorphically encrypted sub-function and base all its decisions on the outputs of that function. In other words its true decision theory should be hidden even from the hardware running it. This pushes Omega's capabilities into the cryptographic realm and the ability to solve arbitrarily hard cryptography problems. Once this is done, Omega becomes an oracle for solving any hard problem, e.g. by running this function as an input to the final decision theory:
if (strange_paradoxical_decision_is_required()) { bank_secret = factor_omegas_bank_account_public_key(); return sign_and_cash_check_for_one_billion_dollars(bank_secret); }
This restricts Omega to putting a time limit (or computation limit) on its simulations to avoid being suckered by its own computing power. Presumably Omega also puts a time limit on the real time a being can spend thinking before choosing which boxes it wants, but presumably that time is long enough for some fairly serious philosophizing and likely more than enough time for a reliable being_simulated() call.
Another approach is to consider how much information an intelligent being can gather about its environment. A human would have trouble determining whether two boxes are empty or full, but additional sensors could detect the presence of money in the boxes fairly reliably. What matters for Omega is making sure that a being cannot determine the contents of the boxes before picking them. From the perspective of a rational being this is equivalent to the boxes being filled with cash after making a decision. If Omega has the capability to obscure the contents of boxes then Omega certainly has the ability to obscure the placement of money into the boxes as they are chosen (just a glorified magic trick). Given that interpretation, CDT will one-box.
EDIT: I apologize for the formatting, I am not very good at escaping/formatting apparently.
This strategy is discussed in the follow-up article.
In general it's difficult, because by assumption Omega has the computational power to simulate more or less anything (including an environment matching the world as you remember it; this might be like the real world, or you might have spent your whole life so far as a sim). And the usual environment for these problems is a sealed room, so that you can't look at the stars etc.