Problematic Problems for TDT

drnickbone

29th May 2012

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents which two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave as a CDT agent would wish it had pre-committed itself to behave before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

Discrimination: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.

So what are some fair "problematic problems"?

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.

However, any CDT agent can just take both boxes and win $1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to reconstruct the chain of logic and reason that the simulation one-boxed, and so Box B contains the $1 million. So any other agent can safely two-box as well.
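The payoffs above can be reproduced with a minimal Python sketch. It assumes, as the analysis does, that the simulated TDT agent one-boxes; the function names and the box_a parameter are hypothetical, introduced only for illustration.

```python
# Minimal sketch of Problem 1 payoffs. Assumption: the simulated TDT agent
# one-boxes (as argued in the analysis). All names here are hypothetical.

MILLION = 1_000_000
CHOICES = ("one-box", "two-box")


def box_b_contents(simulated_choice: str) -> int:
    """Omega fills Box B based only on the *simulated* TDT agent's choice."""
    return MILLION if simulated_choice == "one-box" else 0


def payoff(choice: str, box_b: int, box_a: int = 1000) -> int:
    """Winnings for the real agent, given its own choice and the box contents."""
    return box_b if choice == "one-box" else box_a + box_b


def tdt_outcome(box_a: int = 1000) -> int:
    # A TDT agent treats its choice as linked to the simulation's choice:
    # whatever it picks, the simulation picked too, so Box B moves with it.
    return max(payoff(c, box_b_contents(c), box_a) for c in CHOICES)


def cdt_outcome(box_a: int = 1000) -> int:
    # A non-TDT agent treats Box B as already fixed by the simulation,
    # which it infers one-boxed.
    box_b = box_b_contents("one-box")
    return max(payoff(c, box_b, box_a) for c in CHOICES)


print(tdt_outcome())  # 1000000
print(cdt_outcome())  # 1001000
```

The only difference between the two agents is whether Box B is treated as depending on the agent's own choice; the payoff function itself is identical for both.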

Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
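The factor of two follows from a one-line calculation. Writing a for the amount in Box A (a symbol introduced here), and assuming the simulated TDT agent still one-boxes so that Box B holds 10^6, the ratio of winnings is:

```latex
\frac{\text{CDT winnings}}{\text{TDT winnings}}
  = \frac{a + 10^6}{10^6} \le 2
  \qquad \text{for } 0 \le a \le 10^6 .
```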

Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."

Analysis: A TDT agent will reason that whatever it does, it cannot have more than a 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million under box number 1 and the TDT agent has exactly a 10% chance of winning it.
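Omega's selection rule in Problem 2 can be sketched as follows, assuming Omega's simulations recover the exact probability with which the TDT agent takes each box (exact fractions avoid spurious rounding ties). The function names and the skewed distribution are hypothetical. The sketch shows why uniform randomisation is the TDT agent's best reply: its chance of winning is the smallest of its box probabilities, and any non-uniform distribution pushes that minimum below 1/n.

```python
from fractions import Fraction


def omega_selects(probs: list[Fraction]) -> int:
    """Return the 1-based label of the box the simulated agent is least
    likely to take, breaking ties in favour of the smallest label."""
    lowest = min(probs)
    return probs.index(lowest) + 1  # index() finds the first (smallest-label) match


def tdt_win_probability(probs: list[Fraction]) -> Fraction:
    """The real TDT agent randomises exactly as its simulation does, so it
    wins only when it happens to take the box Omega selected."""
    return probs[omega_selects(probs) - 1]


n = 10
uniform = [Fraction(1, n)] * n
print(omega_selects(uniform), tdt_win_probability(uniform))  # 1 1/10

# Any deviation from uniform lowers the minimum probability.
skewed = [Fraction(2, 10)] + [Fraction(8, 90)] * (n - 1)
print(omega_selects(skewed), tdt_win_probability(skewed))  # 2 4/45
```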

But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning $1 million. By increasing the number of boxes, we can ensure that TDT has an arbitrarily low chance of winning, compared to CDT which always wins.
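The bound behind this claim can be written out using p_1, ..., p_n (symbols introduced here) for the probabilities with which the TDT agent takes each of the n boxes. Since Omega fills the box with the smallest probability:

```latex
\Pr[\text{TDT wins}] = \min_{1 \le i \le n} p_i
  \;\le\; \frac{1}{n} \sum_{i=1}^{n} p_i = \frac{1}{n}
  \;\longrightarrow\; 0 \quad (n \to \infty),
```

with equality in the middle step only for the uniform distribution, while an agent that deduces Omega chose box 1 and takes it wins with probability 1.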

Some questions:

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from what I described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).

Comments

jimrandomh

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you $1000 if you are, and tortures you for a year if you aren't.
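A minimal sketch of the hypothetical Alphabetic Decision Theory makes the point concrete; the option descriptions below are made up for illustration. Its choice depends only on how the options are spelled and never on their payoffs, so an Omega that rewards it must be rewarding the theory itself rather than anything the agent does.

```python
# Hypothetical sketch of "Alphabetic Decision Theory" (ADT) as described in
# the comment: it ignores payoffs entirely and picks by spelling.


def alphabetic_decision_theory(options: dict[str, str]) -> str:
    """Return the option whose written description sorts first alphabetically."""
    return min(options, key=options.get)


newcomb_options = {
    "one-box": "take only box B",
    "two-box": "take both boxes A and B",
}
print(alphabetic_decision_theory(newcomb_options))  # two-box
```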

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

ewbrownv

Indeed. These are all scenarios of the form "Omega looks at the source code for your decision theory, and intentionally creates a scenario that breaks it." Omega could do this with any possible decision theory (or at least, anything that could be implemented with finite resources), so what exactly are we supposed to learn by contemplating specific examples?

It seems to me that the valuable Omega thought experiments are the ones where Omega's omnipotence is simply used to force the player to stick to the rules of the given scenario. When you start p...

Paul Crowley

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards e.g. ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting here is that this problem is a "fair" test by that definition, yet it acts like an unfair test. Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same; what matters is whether TDT-prime thinks so.

APMason

I don't think so. If TDT-prime two-boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000. Omega doesn't check what decision theory you're using at all; it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT agent or not.
