# Problematic Problems for TDT

29 May 2012 03:41PM

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents who two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave the way a CDT agent will wish it had pre-committed itself to behave before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

Discrimination: Omega presents the usual two boxes. Box A always contains \$1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains \$1 million.

So what are some fair "problematic problems"?

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put \$1 million in Box B. Regardless of how the simulated agent decided, I put \$1000 in Box A. Now please choose your box or boxes."

Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win \$1000, whereas if they one-box they will win \$1 million. So the agent will choose to one-box and win \$1 million.

However, any CDT agent can just take both boxes and win \$1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to re-construct the chain of logic, reason that the simulation one-boxed, and conclude that Box B contains the \$1 million. So any other agent can safely two-box as well.

Note that we can modify the contents of Box A so that it contains anything up to \$1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
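
Under the base version of the problem (Box A fixed at \$1000), the payoff structure can be sketched in a few lines of Python. This is only an illustration; the function and label names are mine, not part of the problem statement.

```python
# Illustrative sketch of Problem 1 (function and label names are mine).
# Omega fills the boxes based only on what a *simulated* TDT agent chose.

def omega_fill_boxes(simulated_tdt_choice):
    """Return (box_a, box_b) contents given the simulated agent's choice."""
    box_a = 1000
    box_b = 1_000_000 if simulated_tdt_choice == "one-box" else 0
    return box_a, box_b

# A TDT agent proves its decision is linked to the simulation's, so the
# simulation one-boxes and Box B is filled.
box_a, box_b = omega_fill_boxes("one-box")

tdt_payout = box_b          # TDT one-boxes: takes only Box B
cdt_payout = box_a + box_b  # CDT two-boxes: the contents are already fixed

print(tdt_payout)   # 1000000
print(cdt_payout)   # 1001000
```

The asymmetry is visible here: both agents face the same (already fixed) box contents, but only the TDT agent's choice is logically linked to the simulation that fixed them.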

Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains \$1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed \$1 million in the selected box. Please choose your box."

Analysis: A TDT agent will reason that whatever it does, it cannot have more than 10% chance of winning the \$1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the \$1 million under box number 1 and the TDT agent has exactly 10% chance of winning it.

But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning \$1 million. By increasing the number of boxes, we can ensure that TDT has arbitrarily low chance of winning, compared to CDT which always wins.
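
Omega's selection rule in Problem 2 is itself a straightforward computation. Here is an illustrative Python sketch (representing the simulated agent's choosing probabilities as a dict is my assumption):

```python
# Illustrative sketch of Omega's rule in Problem 2 (representation assumed:
# a dict mapping box label -> the simulated agent's probability of taking it).

def omega_select_box(choosing_probs):
    """Pick the least-likely box; break ties by smallest label."""
    lowest = min(choosing_probs.values())
    tied = [label for label, p in choosing_probs.items() if p == lowest]
    return min(tied)  # deterministic tie-break: smallest label

# TDT's best reply is uniform over the ten boxes, so all boxes tie and the
# deterministic tie-break puts the $1 million in box 1.
uniform = {label: 1 / 10 for label in range(1, 11)}
print(omega_select_box(uniform))  # 1
```

Any agent whose decision isn't entangled with the simulation can run this same calculation and simply take box 1.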

Some questions:

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains \$1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).

Comment author: 23 May 2012 08:37:36PM *  1 point [-]

I wonder if there is a mathematician in this forum willing to present the issue in a form of a theorem and a proof for it, in a reasonable mathematical framework. So far all I can see is a bunch of ostensibly plausible informal arguments from different points of view.

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Comment author: 23 May 2012 09:58:45PM 2 points [-]

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Or it's hard to formalize.

Comment author: 23 May 2012 10:33:46PM *  -3 points [-]

Or it's hard to formalize.

It's pointless to argue about a decision theory problem until it is formalized, since there is no way to check the validity of any argument.

Comment author: 23 May 2012 11:04:28PM 0 points [-]

So, what ought one do when interested in a problem (decision theory or otherwise) that one does not yet understand well enough to formalize?

I suspect "go do something else until a proper formalization presents itself" is not the best possible answer for all problems, nor is "work silently on formalizing the problem and don't express or defend a position on it until I've succeeded."

Comment author: 23 May 2012 11:31:15PM *  1 point [-]

How about "work on formalizing the problem (silently or collaboratively, whatever your style is) and do not defend a position that cannot be successfully defended or refuted"?

Comment author: 24 May 2012 12:28:02AM 2 points [-]

Fair enough.
Is there a clear way to distinguish positions worth arguing without formality (e.g., the one you are arguing here) from those that aren't (e.g., the one you are arguing ought not be argued here)?

Comment author: 24 May 2012 01:21:51AM 2 points [-]

It's a good question. There ought to be, but I am not sure where the dividing line is.

Comment author: 23 May 2012 10:40:50PM *  0 points [-]

You check the arguments using mathematical intuition, and you use them to find better definitions. For example, problems involving continuity or real numbers were fruitfully studied for a very long time before rigorous definitions were found.

Comment author: 23 May 2012 11:26:16PM *  0 points [-]

You check them using mathematical intuition, and you use them to find better definitions.

Indeed, you use them to find better definitions, which is the first step in formalizing the problem. If you argue whose answer is right before doing so (as opposed, say, to which answer ought to be right once a proper formalization is found), you succumb to lost purposes.

For example, "TDT ought to always make the best decision in a certain class of problems" is a valid purpose, while "TDT fails on a Newcomb's problem with a TDT-aware predictor" is not a well-defined statement until every part of it is formalized.

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

Comment author: 24 May 2012 12:53:32AM 1 point [-]

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

If I had to guess, I'd say that the downvoters interpret those pleas, especially in the context of some of your other comments, as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all.

Admittedly, I interpret them that way myself, so I may just be projecting my beliefs onto others.

Comment author: 24 May 2012 01:24:28AM 2 points [-]

as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all

Wha...? Thank you for letting me know, though I still have no idea what you might mean; I'd greatly appreciate it if you elaborated on that!

Comment author: 24 May 2012 04:54:43AM 7 points [-]

I'm not sure I can add much by elaboration.

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

Within that context, responding to a comment with a request to formalize it is easy to read as a polite way of expressing "what you just said is uselessly vague. If you are capable of saying something useful, do so, otherwise shut up and leave this subject to the grownups."

And since you aren't consistent about wanting everything to be expressed as a formalism, I assume this is a function of the topic of discussion, because that's the most charitable assumption I can think of.

That said, I reiterate that I have no special knowledge of why you're being downvoted; please don't take me as definitive.

(1) This might be an unfair impression, as I no longer remember what it was that led me to form it.

Comment author: 24 May 2012 02:38:36PM *  3 points [-]

Thank you! I always appreciate candid feedback.

Comment author: 24 May 2012 12:27:58PM 1 point [-]

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

It's too easy for this to turn into a general counterargument against anything the person says. It may be of benefit to play the ball and not the man.

Comment author: 30 May 2012 03:49:34PM 1 point [-]

Anything the person says? In respect to most things it would be a total non-sequitur.

Comment author: 25 May 2012 12:52:53AM 1 point [-]

Which issue/problem? fairness?

Comment author: 25 May 2012 05:35:09PM *  1 point [-]

The fairness concept:

the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems.

should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT on [b]DT-aware Omega Newcomb problems, as described in the OP, where 'a' and 'b' are particular DTs, e.g. a=b=T.

Comment author: 25 May 2012 10:04:12PM *  -2 points [-]

There was this Rocko thing a while back (which is not supposed to be discussed), where, if I understood that nonsense correctly, the idea was that the decision theories here would do the equivalent of one-boxing on Newcomb with transparent boxes where you could see there is no million, when there's no million (and where the boxes were made and sealed before you were born). It's not easy to one-box rationally.

Also in practice usually being simulated correctly is awesome for getting scammed (agents tend to face adversaries rather than crazed beneficiaries).

Comment author: 23 May 2012 07:52:54AM *  3 points [-]

Problems 1 and 2 both look - to me - like fancy versions of the Discrimination problem. edit: I am much less sure of this. That is, Omega changes the world based on whether the agent implements TDT. This bit I am still sure of, but it might be the case that TDT can overcome this anyway.

Discrimination problem: Money Omega puts in room if you're TDT = \$1,000. Money Omega puts in room if you're not = \$1,001,000.

Problem 1: Money Omega puts in room if you're TDT = \$1,000 or \$1,001,000. Edit: made a mistake. The error in this problem may be subtler than I first claimed. Money Omega puts in room if you're not = \$1,001,000.

Problem 2: \$1,000,000 either way. This problem is different but also uninteresting. Due to Omega caring about TDT again, it is just the smallest interesting number paradox for TDT agents only. Other decision theories get a free ride because you're just asking them to reason about an algorithm (easy to show it produces a uniform distribution) and then a maths question (which box has the smallest number on it?).

You claim the rewards are

independent of the method that the agent uses to choose

but they're not. They depend on whether the agent uses TDT to choose or not.

Comment author: 23 May 2012 08:51:37AM 2 points [-]

I've edited the problem statement to clarify Box A slightly. Basically, Omega will put \$1,001,000 in the room (\$1000 for box A and \$1 million for Box B) regardless of the algorithm run by the actual deciding agent. The contents of the boxes depend only on what the simulated agent decides.

Comment author: 23 May 2012 08:11:35AM 2 points [-]

Agree. You use process X to determine the setup and agents instantiating X are going to be constrained. Any decision theory would be at a disadvantage when singled out like this.

Comment author: 23 May 2012 08:39:08PM 5 points [-]

Someone may already have mentioned this, but doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow? I.e. As soon as we let decision theories become self-referencing, it is impossible for a "best" decision theory to exist at all.

Comment author: 21 June 2012 09:54:39AM *  0 points [-]

There was some discussion of much the same point in this comment thread

One important thing to consider is that there may be a sensible way to define "best" that is not susceptible to this type of problem. Most notably, there may be a suitable, solvable, and realistic subclass of problems over which to evaluate performance. Also, even if there is no "best", there can still be better and worse.

Comment author: 23 May 2012 08:19:53AM 0 points [-]

I don't understand the special role of box 1 in Problem 2. It seems to me that if Omega just makes different choices for the box in which to put the money, all decision theories will say "pick one at random" and will be equal.

In fact, the only reason I can see why Omega picks box 1 seems to be that the "pick at random" process of your TDT is exactly "pick the first one". Just replace it with something dependent on its internal clock (or any parameter not known at the time when Omega asks its question) and the problem disappears.

Comment author: 23 May 2012 09:00:11AM *  1 point [-]

Omega's choice of box depends on its assessment of the simulated agent's choosing probabilities. The tie-breaking rule (if there are several boxes with equal lowest choosing probability, then select the one with the lowest label) is to an extent arbitrary, but it is important that there is some deterministic tie-breaking rule.

I also agree this is entirely a maths problem for Omega or for anyone whose decisions aren't entangled with the problem (with a proof that Box 1 will contain the \$1 million). The difficulty is that a TDT agent can't treat it as a straight maths problem which is unlinked to its own decisions.

Comment author: 24 May 2012 03:19:31PM 1 point [-]

Why is it important that there is a deterministic tie-breaking rule? When you would like random numbers, isn't it always better to have a distribution as close to random as possible, even if it is pseudo-random?

That question is perhaps stupid, I have the impression that I am missing something important...

Comment author: 25 May 2012 11:31:36AM 1 point [-]

Remember it is Omega implementing the tie-breaker rule, since it defines the problem.

The consequence of the tie-breaker is that the choosing agent knows that Omega's box-choice was a simple deterministic function of a mathematical calculation (or a proof). So the agent's uncertainty about which box contains the money is pure logical uncertainty.

Comment author: 23 May 2012 09:38:41AM 9 points [-]

"Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. ...."

This needs some serious mathematics underneath it. Omega is supposed to run a simulation of how an agent of a certain sort handled a certain problem, the result of that simulation being a part of the problem itself. I don't think it's possible to tell, just from these English words, that there is a solution to this fixed-point formulation. And TDT itself hasn't been formalised, although I assume there are people (Eliezer? Marcello? Wei Dai?) working on that.

Cf. the construction of Gödel sentences: you can't just assume that a proof-system can talk about itself, you have to explicitly construct a way for it to talk about itself and show precisely what "talking about itself" means, before you can do all the cool stuff about undecidable sentences, Löb's theorem, and so on.

Comment author: 23 May 2012 07:57:46PM 11 points [-]

This seems well-specified to me: Since the agent is not told its own output in advance, it is possible to run the "simulation" and the "real version" in finite time. If you hand me a computer program that is the agent, I will hand you a computer program that is Omega and the environment.
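
A minimal sketch of that construction, assuming the agent is handed over as a zero-argument callable returning "one-box" or "two-box" (all names here are illustrative):

```python
# Toy version of the construction described above: given the agent program,
# build an Omega-plus-environment program that runs in finite time. No fixed
# point is needed, because the agent is never told its own output in advance.

def make_environment(agent):
    def run():
        simulated = agent()  # the pre-game simulation; its output fixes Box B
        box_b = 1_000_000 if simulated == "one-box" else 0
        real = agent()       # the "real" round, run afterwards
        return 1000 + box_b if real == "two-box" else box_b
    return run

# A stand-in agent that always one-boxes wins the $1 million:
print(make_environment(lambda: "one-box")())  # 1000000
```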

Comment author: 29 May 2012 06:18:50PM *  1 point [-]

Here is a useful program:

```c
if (being_simulated()) pick_box_b(); else pick_both_boxes();
```

being_simulated() is an interesting function; it would presumably consist of things like comparing star maps across as broad a range of EM frequencies as possible with historical knowledge, the positions of planetary bodies, Earthly satellites, quasar signatures, and inputs of other individuals and networks. It could also be as simple as "if (get_current_average_cosmic_ray_bit_flips() != historical_bit_flips_average) return true;" Since my brain tries to play Omega with me every night (well, on the nights I get good sleep), I think we all have some practical experience with being_simulated(). While our brains are nowhere near as capable as Omega, they are fairly good at suspending our own disbelief during dreams. What generally breaks disbelief is trying to behave rationally, including observing/testing the environment, which in reality would be the outside world but in a dream is simulated. Ultimately Omega is forced to simulate a very large subset of the Universe in order to cause being_simulated() to return false, or to alter the program itself so that being_simulated() vacuously returns true.

To prevent the latter attack the program should execute a fully homomorphically encrypted sub-function and base all its decisions on the outputs of that function. In other words its true decision theory should be hidden even from the hardware running it. This pushes Omega's capabilities into the cryptographic realm and the ability to solve arbitrarily hard cryptography problems. Once this is done, Omega becomes an oracle for solving any hard problem, e.g. by running this function as an input to the final decision theory:

```c
if (strange_paradoxical_decision_is_required()) {
    bank_secret = factor_omegas_bank_account_public_key();
    return sign_and_cash_check_for_one_billion_dollars(bank_secret);
}
```

This restricts Omega to putting a time limit (or computation limit) on its simulations to avoid being suckered by its own computing power. Presumably Omega also puts a time limit on the real time a being can spend thinking before choosing which boxes it wants, but presumably that time is long enough for some fairly serious philosophizing and likely more than enough time for a reliable being_simulated() call.

Another approach is to consider how much information an intelligent being can gather about its environment. A human would have trouble determining whether two boxes are empty or full, but additional sensors could detect the presence of money in the boxes fairly reliably. What matters for Omega is making sure that a being cannot determine the contents of the boxes before picking them. From the perspective of a rational being this is equivalent to the boxes being filled with cash after making a decision. If Omega has the capability to obscure the contents of boxes then Omega certainly has the ability to obscure the placement of money into the boxes as they are chosen (just a glorified magic trick). Given that interpretation, CDT will one-box.

EDIT: I apologize for the formatting, I am not very good at escaping/formatting apparently.

Comment author: 30 May 2012 06:43:09AM *  0 points [-]

if (being_simulated()) pick_box_b(); else pick_both_boxes();

This strategy is discussed in the follow-up article.

In general it's difficult, because by assumption Omega has the computational power to simulate more or less anything (including an environment matching the world as you remember it; this might be like the real world, or you might have spent your whole life so far as a sim). And the usual environment for these problems is a sealed room, so that you can't look at the stars etc.

Comment author: 23 May 2012 12:06:58PM 8 points [-]

But TDT already has this problem - TDT is all about finding a fixed point decision.

Comment author: 23 May 2012 10:37:18AM 12 points [-]

BTW, general question about decision theory. There appears to have been an academic study of decision theory for over a century, and causal and evidential decision theory were set out in 1981. Newcomb's paradox was set out in 1969. Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT, and it seems as if there is a 100% disconnect between the community exploring new theories (which is centered around LW) and the academic decision theory community. This seems really, really odd - what's going on?

Comment author: 23 May 2012 07:47:30PM 8 points [-]

There were plenty of previous theories trying to go beyond CDT or EDT, they just weren't satisfactory.

Comment author: 24 May 2012 07:49:00PM *  2 points [-]

Dispositional decision theory :P

... which I cannot find a link to the paper for, now. Hm. But basically it was just TDT, with less awareness of why.

EDIT: Ah, here it was. Credit to Tim Tyler.

Comment author: 24 May 2012 08:27:52PM 2 points [-]

I checked it. Not the same thing.

Comment author: 24 May 2012 09:36:36PM *  5 points [-]

This paper talks about reflexive decision models and claims to develop a form of CDT which one boxes.

It's in my to-read list, but I haven't got to it yet, so I'm not sure whether it's of interest; I'm posting it just in case (it could be a while until I have time to read it, so I won't be able to post a more informed comment any time soon).

Though this theory post-dates TDT and so isn't interesting from that perspective.

Comment author: 23 May 2012 01:47:36PM 3 points [-]

It should be noted that Newcomb's problem was considered interesting in Philosophy in 1969, but decision theories were studied more in other fields - so there's a disconnect between the sorts of people who usually study formal decision theories and that sort of problem.

Comment author: 23 May 2012 12:59:51PM 12 points [-]

Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT...

This is simply not true. Robert Nozick (who introduced Newcomb's problem to philosophers) compared/contrasted EDT and CDT at least as far back as 1993. Even back then, he noted their inadequacy on several decision-theoretic problems and proposed some alternatives.

Comment author: 23 May 2012 01:14:53PM 4 points [-]

Me being ignorant of something seemed like a likely part of the explanation - thanks :) I take it you're referencing "The Nature of Rationality"? Not read that I'm afraid. If you can spare the time I'd be interested to know what he proposes -thanks!

Comment author: 23 May 2012 01:49:20PM *  6 points [-]

I haven't read The Nature of Rationality in quite a long time, so I won't be of much help. For a very simple and short introduction to Nozick's work on decision theory, you should read this (PDF).

Comment author: 23 May 2012 10:28:41AM *  9 points [-]

The more I think about it, the more interesting these problems get! Problem 1 seems to re-introduce all the issues that CDT has on Newcomb's Problem, but for TDT. I first thought to introduce the ability to 'break' with past selves, but that doesn't actually help with the simulation problem.

It did lead to a cute observation, though. Given that TDT cares about all sufficiently accurate simulations of itself, it's actually winning.

• It one-boxes in Problem 1; thus ensuring that its simulacrum one-boxed in Omega's pre-game simulation, so TDT walked away with \$2,000,000 (whereas CDT, unable to derive utility from a simulation of TDT, walked away with \$1,001,000.) This is proofed against increasing the value of the second box; TDT still gains at least 1 dollar more (when the second box is \$999,999), and simply two-boxes when the second box is as or more valuable.
• In Problem 2, it picks in such a way that Omega must run at least 10 trials and the game itself; this means 11 TDT agents have had a 10% shot at \$1,000,000. With an expected value of \$1,100,000 it is doing better than the CDT agents walking away with \$1,000,000.
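
As a sanity check on the expected-value claim in the second bullet (a trivial computation; the breakdown into eleven instances follows the comment's own accounting):

```python
# Ten simulation runs plus the real game give eleven TDT "instances",
# each with a 10% shot at the $1 million.
n_instances = 11
p_win = 0.10
prize = 1_000_000
total_ev = n_instances * p_win * prize
print(total_ev)
```

This matches the \$1,100,000 figure above, exceeding the CDT agent's certain \$1,000,000.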

It doesn't seem very relevant, but I think if we explored Richard's point that we need to actually formalise this, we'd find that any simulation high-fidelity enough to actually bind a TDT agent to its previous actions would necessarily give the agent the utility from the simulations, and vice versa, any simulation not accurate enough to give utility would be sufficiently different from TDT to allow our agent to two-box when that agent one-boxed.

Comment author: 23 May 2012 11:34:34AM 2 points [-]

Corollary: Omega can statically analyse the TDT agent's decision algorithm.

Comment author: 23 May 2012 11:15:39AM 8 points [-]

Omega doesn't need to simulate the agent actually getting the reward. After the agent has made its choice, the simulation can just end.

Comment author: 28 May 2012 09:31:42AM 1 point [-]

Omega is supposed to be always truthful, so either he rewards the sims as well, or you know something the sims don't and hence it's not obvious you'll do the same as them.

Comment author: 28 May 2012 10:15:10AM 0 points [-]

I thought Omega was allowed to lie to sims.

Even if he's not, after he's given a \$1m simulated reward, does he then have to keep up a simulated environment for the sim to actually spend the money?

Comment author: 28 May 2012 11:14:57AM 1 point [-]

If he can lie to sims, then you can't know he's not lying to you unless you know you're not a sim. If you do, it's not obvious you'd choose the same way as if you didn't.

Comment author: 28 May 2012 03:18:33PM 0 points [-]

For instance, if you think Omega is lying and completely ignore everything he says, you obviously two-box.

Comment author: 28 May 2012 03:32:18PM 1 point [-]

Why not zero-box in this case? I mean, what reason would I have to expect any money at all?

Comment author: 28 May 2012 04:02:27PM 0 points [-]

Well, as long as you believe Omega enough to think no box contains sudden death or otherwise negative utility, you'd open them to see what was inside. But yes, you might not believe Omega at all.

General question: suppose we encounter an alien. We have no idea what its motivations, values, goals, or abilities are. On the other hand, it may have observed any amount of human comm traffic from wireless EM signals since the invention of radio, and from actual spy-probes before the human invention of high tech that would detect them.

It signals us in Morse code from its remote starship, offering mutually beneficial trade.

What prior should we have about the alien's intention? Should we use a naive uniform prior that would tell us it's as likely to mean us good as harm, and so never reply because we don't know how it will try to influence our actions via communications? Should it tell us different agents who don't explicitly value one another will conflict to the extent their values differ, and so since value-space is vast and a randomly selected alien is unlikely to share many values with us, we should prepare for war? Should it tell us we can make some assumptions (which?) about naturally evolved agents or their Friendly-to-themselves creations? How safe are we if we try to "just read" English text written by an unknown, possibly-superintelligent being which may have observed all our broadcast traffic since the age of radio? What does our non-detection of this alien civ until they chose to initiate contact tell us? Etc.

Comment author: 24 May 2012 02:28:17AM -1 points [-]

Then the simulated TDT agent will one-box in Problem 1 so that the real TDT agent can two-box and get \$1,001,000. The simulated TDT agent will pick a box randomly with a uniform distribution in Problem 2, so that the real TDT agent can select box 1 like CDT would.

(If the agent is not receiving any reward, it will act in a way that maximises the reward agents sufficiently similar to it would receive. In this situation of 'you get no reward', CDT would be completely indifferent and could not be relied upon to set up a good situation for future actual CDT agents.)

Of course, this doesn't work if the simulated TDT agent is not aware that it won't receive a reward. This strays pretty close to "Omega is all-powerful and out to make sure you lose"-type problems.

Comment author: 06 June 2012 11:51:02AM 0 points [-]

Omega (who experience has shown is always truthful)

Omega doesn't need to simulate the agent actually getting the reward. After the agent has made its choice, the simulation can just end.

If we are assuming that Omega is trustworthy, then Omega needs to be assumed to be trustworthy in the simulation too. If they didn't allow the simulated version of the agent to enjoy the fruits of their choice, then they would not be trustworthy.

Comment author: 01 June 2012 10:01:33PM *  0 points [-]

Actually, I'm not sure this matters. If the simulated agent knows he's not getting a reward, he'd still want to choose so that the nonsimulated version of himself gets the best reward.

So the problem is that the best answer is unavailable to the simulated agent: in the simulation you should one box and in the 'real' problem you'd like to two box, but you have no way of knowing whether you're in the simulation or the real problem.

Agents that Omega didn't simulate don't have the problem of worrying whether they're making the decision in a simulation or not, so two boxing is the correct answer for them.

An agent that in effect makes the decision twice, where the first decision affects the payoff of the second, faces a very different decision from an agent that makes it only once. So I think that in reality the problem perhaps does collapse down to an 'unfair' one, because the TDT agent is presented with an essentially different problem from a non-TDT agent.

Comment author: 23 May 2012 02:14:20PM *  28 points [-]

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you \$1000 if you are, and tortures you for a year if you aren't.
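A minimal sketch of this hypothetical Alphabetic Decision Theory (the function name `adt_choose` and the option strings are mine, purely for illustration):

```python
# Alphabetic Decision Theory: ignore payoffs entirely and pick
# whichever option description comes first alphabetically.
def adt_choose(options):
    """options: a list of human-readable option descriptions."""
    return min(options)  # lexicographically first string

# Here ADT happens to "two-box" only because the string
# "both boxes" sorts before "one box" alphabetically.
choice = adt_choose(["one box", "both boxes"])
```

An Omega that checks whether your choice function behaves like `adt_choose` can then reward or punish it regardless of the merits of the choices themselves, which is what makes such "counterexamples" uninformative.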

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

Comment author: 11 June 2012 10:09:55PM 2 points [-]

Indeed. These are all scenarios of the form "Omega looks at the source code for your decision theory, and intentionally creates a scenario that breaks it." Omega could do this with any possible decision theory (or at least, anything that could be implemented with finite resources), so what exactly are we supposed to learn by contemplating specific examples?

It seems to me that the valuable Omega thought experiments are the ones where Omega's omnipotence is simply used to force the player to stick to the rules of the given scenario. When you start postulating that an impossible, acausal superintelligence is actively working against you, it's time to hang up your hat and go home, because no strategy you could possibly come up with is going to do you any good.

Comment author: 24 December 2012 09:57:12PM 1 point [-]

The trouble is when another agent wins in this situation and in the situations you usually encounter. For example, an anti-traditional-rationalist, that always makes the opposite choice to a traditional rationalist, will one-box; it just fails spectacularly when asked to choose between different amounts of cake.

Comment author: 23 May 2012 03:09:44PM *  20 points [-]

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards eg ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about this is that this is a "fair" test by that definition, yet it acts like an unfair test.

Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.

Comment author: 23 May 2012 04:26:37PM 2 points [-]

Not exactly. Because the problem statement says that it simulates "TDT", if you were to expand the problem statement out into code it would have to contain source code to a complete instantiation of TDT. When the problem statement is run, TDT or TDT-prime can look at that instantiation and compare it to its own source code. TDT will see that they're the same, but TDT-prime will notice that they are different, and thereby infer that it is not the simulated copy. (Any difference whatsoever is proof of this.)

Consider an alternative problem. Omega flips a coin, and asks you to guess what it was, with a prize if you guess correctly. If the coin was heads, he shows you a piece of paper with TDT's source code. If the coin was tails, he shows you a piece of paper with your source code, whatever that is.

Comment author: 23 May 2012 05:54:33PM *  11 points [-]

I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.

Comment author: 06 June 2012 12:05:55PM 0 points [-]

The key thing is the question as to whether it could have been you that was simulated. If all you know is that you're a TDT agent and what Omega simulated is a TDT agent, then it could have been you. Therefore you have to act as if your decision now may be either real or simulated. If you know you are not what Omega simulated (for any reason), then you know that you only have to worry about the 'real' decision.

Comment author: 06 June 2012 04:34:19PM 0 points [-]

Suppose that Omega doesn't reveal the full source code of the simulated TDT agent, but just reveals enough logical facts about the simulated TDT agent to imply that it uses TDT. Then the "real" TDT Prime agent cannot deduce that it is different.

Comment author: 19 June 2012 07:30:10AM *  0 points [-]

Yes. I think that as long as there is any chance of you being the simulated agent, then you need to one box. So you one box if Omega tells you 'I simulated some agent', and one box if Omega tells you 'I simulated an agent that uses the same decision procedure as you', but two box if Omega tells you 'I simulated an agent that had a different copyright comment in its source code to the comment in your source code'.

This is just a variant of the 'detect if I'm in a simulation' function that others mention. i.e. if Omega gives you access to that information in any way, you can two box. Of course, I'm a bit stuck on what Omega has told the simulation in that case. Has Omega done an infinite regress?

Comment author: 06 June 2012 03:57:44PM 0 points [-]

That's an interesting way to look at the problem. Thanks!

Comment author: 25 June 2012 07:16:55AM *  3 points [-]

Because it's a fair test

No, not even by Eliezer's standard, because TDT is not given the same problem as other decision theories.

As stated in comments below, everyone but TDT has the information "I'm not in the simulation" (or more precisely, in one of the simulations of the infinite regress that is implied by Omega's formulation). The reason TDT does not have this extra piece of information comes from the fact that it is TDT, not from any decision it may make.

Comment author: 25 June 2012 09:14:08AM 1 point [-]

Right, and this is an unfairness that Eliezer's definition fails to capture.

Comment author: 25 June 2012 11:43:57AM 0 points [-]

At this point, I need the text of that definition.

Comment author: 25 June 2012 12:04:12PM 0 points [-]

The definition is in Eliezer's TDT paper although a quick grep for "fair" didn't immediately find the definition.

Comment author: 25 June 2012 03:40:53PM *  0 points [-]

This variation of the problem was invented in the follow-up post (I think it was called "Sneaky strategies for TDT" or something like that):

Omega tells you that earlier he flipped a coin. If the coin came down heads, he simulated a CDT agent facing this problem. If the coin came down tails, he simulated a TDT agent facing this problem. In either case, if the simulated agent one-boxed, there is \$1000000 in Box B; if it two-boxed, Box B is empty.

In this case TDT still one-boxes (a 50% chance of \$1000000 dominates a 100% chance of \$1000), and CDT still two-boxes (because that's what CDT does). In this case, even though both agents have an equal chance of being simulated, CDT out-performs TDT (average payoffs of \$501,000 vs. \$500,000): CDT takes advantage of TDT's prudence and TDT suffers for CDT's lack of it. Notice also that TDT cannot do better by behaving like CDT (both would get payoffs of \$1000). This shows that the class of problems we're concerned with is not so much "fair" vs. "unfair", but more like "those problems on which the best I can do is not necessarily the best anyone can do". We can call it "fairness" if we want, but it's not like Omega is discriminating against TDT in this case.
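The expected payoffs in this variant can be checked with a short calculation (a sketch; the function names are mine). Note that the two-boxer also collects Box A's \$1000 in the case where Box B is full, so its exact average is \$501,000:

```python
# Expected payoffs in the coin-flip variant: Omega simulates a CDT
# agent (heads) or a TDT agent (tails); Box B holds $1,000,000 iff
# the simulated agent one-boxed. The simulated CDT agent two-boxes,
# and the simulated TDT agent one-boxes.
def box_b(sim):
    return 1_000_000 if sim == "TDT" else 0  # only the TDT sim one-boxes

def payoff(player, sim):
    if player == "TDT":            # real TDT one-boxes: Box B only
        return box_b(sim)
    return 1_000 + box_b(sim)      # real CDT two-boxes: Box A + Box B

ev_tdt = sum(payoff("TDT", s) for s in ("CDT", "TDT")) / 2
ev_cdt = sum(payoff("CDT", s) for s in ("CDT", "TDT")) / 2
```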

Comment author: 25 June 2012 04:04:04PM *  3 points [-]

This is not a zero-sum game. CDT does not outperform TDT here. It just makes a stupid mistake, and happens to pay for it less dearly than TDT does.

Let's say Omega submits the same problem to 2 arbitrary decision theories. Each will either 1-box or 2-box. Here is the average payoff matrix:

• Both a and b 1-box -> They both get the million
• Both a and b 2-box -> They both get 1000 only.
• One 1-boxes, the other 2-boxes -> the 1-boxer gets half a million on average; the 2-boxer gets \$1000 more than that.

Clearly, 1-boxing still dominates 2-boxing. Whatever the other does, you personally get about half a million more by 1-boxing. TDT may end up with less utility than CDT for 1-boxing, but CDT is still stupid here, while TDT is not.

Comment author: 23 May 2012 10:06:14PM 4 points [-]

He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are.

Two questions: First, how is this distinction justified? What a decision theory is, is a strategy for responding to decision tasks, and simulating agents performing the right decision tasks tells you what kind of decision theory they're using. Why does it matter whether that's done implicitly (as in Newcomb's discrimination against CDT) or explicitly? And second, why should we care about it? Why is it important for a decision theory to pass fair tests but not unfair tests?

Comment author: 24 May 2012 10:47:29AM 7 points [-]

Why is it important for a decision theory to pass fair tests but not unfair tests?

Well, on unfair tests a decision theory still needs to do as well as possible. If we had a version of the original Newcomb's problem, with the one difference that a CDT agent gets \$1billion just for showing up, it's still incumbent upon a TDT agent to walk away with \$1000000 rather than \$1000. The "unfair" class of problems is that class where "winning as much as possible" is distinct from "winning the most out of all possible agents".

Comment author: 24 May 2012 06:50:28AM 3 points [-]

Real-world unfair tests could matter, though it's not clear if there are any. However, hypothetical unfair tests aren't very informative about what is a good decision theory, because it's trivial to cook one up that favours one theory and disfavours another. I think the hope was to invent a decision theory that does well on all fair tests; the example above seems to show that may not be possible.

Comment author: 23 May 2012 02:28:04PM 1 point [-]

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with \$1000. Omega doesn't check what decision theory you're using at all - it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT-agent or not.

Comment author: 23 May 2012 02:30:47PM 2 points [-]

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with \$1000.

No, if TDT-prime two boxes, the TDT simulation still one-boxes.

Comment author: 23 May 2012 02:39:16PM 6 points [-]

Hmm, so TDT-prime would reason something like, "The TDT simulation will one-box because, not knowing that it's the simulation, but also knowing that the simulation will use exactly the same decision theory as itself, it will conclude that the simulation will do the same thing as itself and so one-boxing is the best option. However, I'm different to the TDT-simulation, and therefore I can safely two-box without affecting its decision." In which case, does it matter how inconsequential the difference is? Yep, I'm confused.

Comment author: 25 December 2012 04:07:16PM *  -1 points [-]

Yep, I'm confused.

Sounds like you have it exactly right.

Comment author: 23 May 2012 03:34:34PM 2 points [-]

I also had thoughts along these lines - variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

The second difficulty is that for each specific TDT variant, one with algorithm T' say, there will be a specific problematic problem on which T' will do worse than CDT (and indeed worse than all the other variants of TDT) - this is the problem with T' being the exact algorithm running in the sim. So we still don't get the - desirable - property that there is some sensible decision theory called TDT that is optimal across fair problems.

The best suggestion I've heard so far is that we try to adjust the definition of "fairness", so that these problematic problems also count as "unfair". I'm open to proposals on that one...

Comment author: 23 May 2012 08:14:02PM 0 points [-]

The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime's output and TDT-prime's decision. If its output is a strategy, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.

Comment author: 23 May 2012 08:25:17PM 1 point [-]

But doesn't that make cliquebots, in general?

Comment author: 04 June 2012 11:39:19PM 0 points [-]

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

I think this is avoidable. Let's say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice's source code contains a comment identifying it as Alice, whereas Bob's source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice's. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.

Comment author: 25 December 2012 04:13:32PM -1 points [-]

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.

Why doesn't that happen when dealing with Omega?

Comment author: 25 December 2012 08:01:22PM 0 points [-]

Because if Omega uses Alice's source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.

Comment author: 25 December 2012 10:21:11PM 0 points [-]

So why doesn't that happen in the prisoner's dilemma?

Comment author: 25 December 2012 10:47:57PM 0 points [-]

Because Alice sees that Bob's source code is the same as hers except for a comment difference, and Bob sees that Alice's source code is the same as his except for a comment difference, so the situation is symmetric.

Comment author: 09 June 2012 11:24:08AM 1 point [-]

You might want to look at my follow-up article which discusses a strategy like this (among others). It's worth noting that slight variations of the problem remove the opportunity for such "sneaky" strategies.

Comment author: 09 June 2012 08:46:14PM 0 points [-]

Ah, thanks. I had missed that, somehow.

Comment author: 06 June 2012 12:12:51PM *  0 points [-]

In a prisoner's dilemma Alice and Bob affect each other's outcomes. In the Newcomb problem, Alice affects Bob's outcome, but Bob doesn't affect Alice's outcome. That's why it's OK for Bob to consider himself different in the second case, as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation), but not OK for him to consider himself different in the prisoner's dilemma.

Comment author: 23 May 2012 07:16:17PM 15 points [-]

My sense is that question 6 is a better question to ask than 5. That is, what's important isn't drawing some theoretical distinction between fair and unfair problems, but finding out what problems we and/or our agents will actually face. To the extent that we are ignorant of this now but may know more in the future when we are smarter and more powerful, it argues for not fixing a formal decision theory to determine our future decisions, but instead making sure that we and/or our agents can continue to reason about decision theory the same way we currently can (i.e., via philosophy).

Comment author: 25 December 2012 03:50:19PM *  1 point [-]

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

In Newcomb's Problem, Omega determines ahead of time what decision theory you use. In these problems, it selects an arbitrary decision theory ahead of time. As such, for any agent using this preselected decision theory, these problems are variations of Newcomb's problem. For any agent using a different decision theory, the problem is quite different (and simpler). Thus, whatever agent has had its decision theory preselected can only perform as well as in a standard Newcomb's problem, while a luckier agent may perform better. In other words, there are equivalent problems where Omega bases its decision on the results of a CDT or EDT output, in which those theories actually perform worse than TDT does in these problems.

Comment author: 12 June 2012 05:23:40PM *  1 point [-]

These questions seem decidedly UNfair to me.

No, they don't depend on the agent's own decision-making algorithm; they depend on another, specific decision-making algorithm, skewing results against any agent running an identical algorithm while letting all others reap the benefits of an otherwise non-advantageous situation.

So, a couple of things:

1. While I have not mathematically formulated this, I suspect that absolutely any decision theory can have a similar scenario constructed for it, using another agent / simulation with that specific decision theory as the basis for payoff. Go ahead and prove me wrong by supplying one where that's not the case...

2. It would be far more interesting to see a TDT-defeating question that doesn't have "TDT" (or taboo versions) as part of its phrasing. In general, questions of how a decision theory fares when agents can scan your algorithm and decide to discriminate against that algorithm specifically, are not interesting - because they are losing propositions in any case. When another agent has such profound understanding of how you tick and malice towards that algorithm, you have already lost.

Comment author: 01 June 2012 11:36:34AM 1 point [-]

Intuitively this doesn't feel like a 'fair' problem. A UDT agent would ace the TDT formulation and vice versa. Any TDT agent that found a way of distinguishing between 'themselves' and Omega's TDT agent would also ace the problem. It feels like an acausal version of something like:

"I get agents A and B to choose one or two boxes. I then determine the contents of the boxes based on my best guess of A's choice. Surprisingly, B succeeds much better than A at this."

Still an intriguing problem, though.

Comment author: 31 May 2012 11:28:35AM *  2 points [-]

I think we need a 'non-problematic problems for CDT' thread.

For example, it is not problematic for a CDT-based robot controller to have the control values in action A represent multiple servos in its world model, as if you had wired multiple robot arms to one controller in parallel. You may want to do this if you want the robot arms to move in unison and pass along the balls in a real-world imitation of http://blueballmachine2.ytmnd.com/

It is likewise not problematic if you ran out of wire and decided to make the 'one controller' be physically two controllers running identical code, or if you ran out of time machines and decided to control yesterday's servo with one controller yesterday, and today's servo with the same controller in the same state today. These are simply low-level, irrelevant details.

A mathematical formalization of CDT (such as robot software) will one-box or two-box in Newcomb's problem depending on the world model within which the CDT decides. If the world model has the 'prediction' as a second servo represented by the same variable, then it'll one-box.

Philosophical maxims like "act based on consequences of my actions", whether they one-box or two-box, depend in turn solely on philosophical questions like "what is self". E.g. if "self" means the physical meat, then two-box; if "self" means the algorithm (a higher-level concept), then one-box, assuming that the thing in the predictor is "self" too.

edit: another thing. Stuff outside the robot's senses is naturally uncertain. Upon hearing the explanation in Newcomb's paradox, one has to update one's estimates of what is outside the senses; outside, the money might be fake, and there may be some external logic and wiring and servos that will put a real million into the box if you choose to 1-box. If the money is to pay for, I dunno, your child's education, clearly one ought to 1-box. I'm pretty sure causally-deciding General Thud can 1-box just fine, if he needs the money to buy real weapons for a real army, and suspects that outside his senses there may be the predictor spying. General Thud knows that the best option is to 1-box inside the predictor and 2-box outside. The goal is never to two-box outside the predictor.
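The point about world models can be sketched as a toy illustration (all names here are mine): the same expected-value maximizer one-boxes or two-boxes depending only on whether its model treats the prediction as driven by the same variable as its action (the "second servo" view) or as an independent, fixed fact:

```python
# A toy CDT-style chooser. The world model determines whether the
# chosen action also fixes Omega's prediction (the "second servo"
# view), or whether the prediction is an independent, fixed fact.
def choose(linked_prediction, fixed_prediction="two-box"):
    def value(action):
        if linked_prediction:
            prediction = action            # same variable drives both
        else:
            prediction = fixed_prediction  # a model-given fixed fact
        box_b = 1_000_000 if prediction == "one-box" else 0
        box_a = 1_000
        return box_b if action == "one-box" else box_a + box_b
    return max(("one-box", "two-box"), key=value)
```

With the linked model the maximizer one-boxes; with the prediction modelled as a fixed fact it two-boxes, whatever that fixed prediction is.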

Comment author: 28 May 2012 09:08:53AM 5 points [-]

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

If he's always truthful, then he didn't lie to the simulation either and this means that he did infinitely many simulations before that. So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar. If he says different things to you and to your simulation instead, then it's not obvious you'll give the same answer.

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair?

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't. But I haven't thought this through yet, so it might turn out to be irrelevant.

Comment author: 28 May 2012 09:10:14PM *  0 points [-]

So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar.

...

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't.

Say you have a CDT agent in the world, affecting the world via a set of robotic hands, a robotic voice, and so on. If you wire up two robot bodies to one computer (in parallel, so that all movements are done by both bodies), that is just a somewhat peculiar robotic manipulator. Handling this doesn't require any changes to CDT.

Likewise, when you have two robot bodies controlled by an identical mathematical equation, provided that your world model in the CDT utility calculation accounts for all the known manipulators which are controlled by the chosen action, you get the correct result.

Likewise, you can have CDT control a multitude of robots, either from one computer, or from multiple computers that independently determine optimal, identical actions (but each computer only acts on the robot body assigned to it).

CDT is formally defined using mathematics; the mathematics is already 'timeless', and the fact that the chosen action affects the contents of the boxes is part of the world model, not the decision theory (and likewise physical time and physical causality are part of the world model, not the decision theory; even though the decision theory is called causal, that's some other 'causal').

Comment author: 28 May 2012 06:57:02PM 1 point [-]

This question of "Does Omega lie to sims?" was already discussed earlier in the thread. There were several possible answers from cousin_it and myself, any of which will do.

Comment author: 25 December 2012 03:54:57PM 0 points [-]

I assumed the sims weren't conscious - they were abstract implementations of TDT.

Comment author: 25 December 2012 05:59:29PM 0 points [-]

Well, then there's stuff you know and the sims don't, which you could take into account when deciding, and thence decide something different from what they did.

Comment author: 25 December 2012 10:26:34PM 2 points [-]

What stuff? The color of the walls? Memories of your childhood? Unless you have information that alters your decision or you're not a perfect implementer of TDT, in which case you get lumped into the category of "CDT, EDT etc."

Comment author: 25 December 2012 11:47:36PM *  1 point [-]

The fact that you're not a sim, and unlike the sims you'll actually be given the money.

Comment author: 26 December 2012 01:38:30AM *  0 points [-]

Why the hell would Omega program the sim not to value the simulated reward? It's almost certainly just abstract utility anyway.

Comment author: 28 May 2012 03:01:53PM *  0 points [-]

He can't have done literally infinitely many simulations. If that were really required, it would offer a way out: we could say the thought experiment stipulates an impossible situation. I haven't yet considered whether the problem can be changed to give the same result without requiring infinitely many simulations.

ETA: no wait, that can't be right, because it would apply to the original Newcomb's problem too. So there must be a way to formalize this correctly. I'll have to look it up but don't have the time right now.

Comment author: 28 May 2012 04:03:14PM 1 point [-]

In the original Newcomb's problem it's not specified that Omega performs simulations -- for all we know, he might use magic, closed timelike curves, or quantum magic whereby Box A is in a superposition of states entangled with your mind whereby if you open Box B, A ends up being empty and if you hand B back to Omega, A ends up being full.

Comment author: 28 May 2012 04:26:18PM 0 points [-]

We should take this seriously: a problem that cannot be instantiated in the physical world should not affect our choice of decision theory.

Before I dig myself in deeper, what does existing wisdom say? What is a practical possible way of implementing Newcomb's problem? For instance, simulation is eminently practical as long as Omega knows enough about the agent being simulated. OTOH, macro quantum entanglement of an arbitrary agent's arbitrary physical instantiation with a box prepared by Omega doesn't sound practical to me, but maybe I'm just swayed by incredulity. What do the experts say? (Including you if you're an expert, obviously.)

Comment author: 28 May 2012 04:37:15PM *  -1 points [-]

cannot

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Unless your utility function is bounded.

Comment author: 28 May 2012 04:58:12PM 1 point [-]

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Even? I'd go as far as to say only. Non-tiny probabilities aren't Pascal's muggings. They are just expected utility calculations. </lighthearted nitpick!>

Comment author: 28 May 2012 05:02:37PM 0 points [-]

If a problem statement has an internal logical contradiction, there is still a tiny probability that I and everyone else are getting it wrong, due to corrupted hardware or a common misconception about logic or pure chance, and the problem can still be instantiated. But it's so small that I shouldn't give it preferential consideration over other things I might be wrong about, like the nonexistence of a punishing god or that the food I'm served at the restaurant today is poisoned.

Either of those if true could trump any other (actual) considerations in my actual utility function. The first would make me obey religious strictures to get to heaven. The second threatens death if I eat the food. But I ignore both due to symmetry in the first case (the way to defeat Pascal's wager in general) and to trusting my estimation of the probability of the danger in the second (ordinary expected utility reasoning).

AFAICS both apply to considering an apparently self-contradictory problem statement as really not possible with effective probability zero. I might be misunderstanding things so much that it really is possible, but I might also be misunderstanding things so much that the book I read yesterday about the history of Africa really contained a fascinating new decision theory I must adopt or be doomed by Omega.

All this seems to me to fail due to standard reasoning about Pascal's mugging. What am I missing?

Comment author: 28 May 2012 06:16:50PM 0 points [-]

If a problem statement has an internal logical contradiction

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

Comment author: 28 May 2012 06:23:57PM *  1 point [-]

AFAIK Newcomb's dilemma does not logically contradict itself; it just contradicts the physical law that causality cannot go backwards in time.

It certainly doesn't contradict itself, and I would also assert that it doesn't contradict the physical law that causality cannot go backwards in time. Instead I would say that giving the sane answer to Newcomb's problem requires abandoning the assumption that one's decision must be based only on what it affects through forward-in-time causal, physical influence.

Comment author: 28 May 2012 07:46:14PM *  0 points [-]

Consider making both boxes transparent to illustrate some related issue.

Comment author: 25 December 2012 03:57:35PM 0 points [-]

If that were really required, it would offer a way out: one could say the thought experiment stipulates an impossible situation.

This might be better stated as "incoherent", as opposed to mere impossibility which can be resolved with magic.

Comment author: 24 May 2012 03:25:39PM 9 points [-]

Problem 2 reminds me strongly of playing GOPS.

For those who aren't familiar with it, here's a description of the game. Each player receives a complete suit of standard playing cards, ranked Ace low through King high. Another complete suit, the diamonds, is shuffled (or not, if you want a game of complete information) and put face down on the table; these diamonds have point values Ace=1 through King=13. In each trick, one diamond is flipped face-up. Each player then chooses one card from their own hand to bid for the face-up diamonds, and all bids are revealed simultaneously. Whoever bids highest wins the face-up diamonds, but if there is a tie for the highest bid (even when other players did not tie), then no one wins them and they remain on the table to be won along with the next trick. All bids are discarded after every trick.
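For readers who would rather see the rules than parse them, here is a minimal sketch of the trick-resolution logic described above (the function and strategy names are illustrative, not from any standard library):

```python
def play_gops(prize_order, strategies):
    """prize_order: the diamond values (1..13), in the order they're flipped.
    strategies: dict of player name -> function(hand, prize_pot, history) -> bid.
    Returns the total points won by each player."""
    hands = {p: set(range(1, 14)) for p in strategies}   # Ace=1 .. King=13
    scores = {p: 0 for p in strategies}
    history = []
    pot = 0  # prizes carried over after tied high bids
    for prize in prize_order:
        pot += prize
        bids = {p: strategies[p](hands[p], pot, history) for p in strategies}
        for p, b in bids.items():
            hands[p].remove(b)           # all bids are discarded after the trick
        high = max(bids.values())
        winners = [p for p, b in bids.items() if b == high]
        if len(winners) == 1:            # a tie leaves the pot on the table
            scores[winners[0]] += pot
            pot = 0
        history.append(bids)
    return scores

# Two naive fixed strategies: bid the card closest to the pot's value,
# versus always sandbag with the lowest card.
match = lambda hand, pot, hist: min(hand, key=lambda c: abs(c - min(pot, 13)))
lowball = lambda hand, pot, hist: min(hand)
print(play_gops(list(range(1, 14)), {"matcher": match, "lowballer": lowball}))
```

Even with the diamonds dealt in order (complete information), two naive fixed strategies produce a non-obvious score split, which is exactly where the "how many levels deep" reasoning comes in.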

Especially when the King comes up early, you can see everyone looking at each other trying to figure out how many levels deep to evaluate "What will the other players do?".

(1) Play my King to be likely to win. (2) Everyone else is likely to do (1) also, which will waste their Kings. So instead play low while they throw away their Kings. (3) If the players are paying attention, they might all realize they should (2), in which case I should play highest low card - the Queen. (4+) The 4th+ levels could repeat (2) and (3) mutatis mutandis until every card has been the optimal choice at some level. In practice, players immediately recognize the futility of that line of thought and instead shift to the question: How far down the chain of reasoning are the other players likely to go? And that tends to depend on knowing the people involved and the social context of the game.

Maybe playing GOPS should be added to the repertoire of difficult decision theory puzzles alongside the prisoner's dilemma, Newcomb's problem, Pascal's mugging, and the rest of that whole intriguing panoply. We've had a Prisoner's Dilemma competition here before - would anyone like to host a GOPS competition?

Comment author: 24 May 2012 07:51:17AM *  2 points [-]

Let's say that TDT agents can be divided into two categories, TDT-A and TDT-B, based on a single random bit added to their source code in advance. Then TDT-A can take the strategy of always picking the first box in Problem 2, and TDT-B can always pick the second box.

Now, if you're a TDT agent being offered the problem; with the aforementioned strategy, there's a 50% chance that the simulated agent is different than you, netting you \$1 million. This also narrows down the advantage of the CDT agent - now they only have a 50% chance of winning the money, which is equal to yours.

Comment author: 24 May 2012 07:31:23PM 3 points [-]

Actually, the way the problem is specified, Omega puts the money in box 3.
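This can be checked mechanically. Under the proposed split, a randomly drawn simulated TDT agent takes box 1 or box 2 with probability 1/2 each, so boxes 3 through 10 are tied as least likely, and the tie rule sends the money to the lowest-numbered of them (a sketch, assuming the tie rule from the problem statement):

```python
# Probability that a randomly drawn TDT agent (TDT-A or TDT-B, 50/50)
# takes each of the 10 boxes under the proposed strategy.
take_prob = [0.5, 0.5] + [0.0] * 8   # boxes 1..10

# Omega's rule: money goes in the least likely box, lowest index on ties.
least = min(take_prob)
money_box = take_prob.index(least) + 1  # 1-indexed
print(money_box)  # → 3
```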

Comment author: 24 May 2012 12:17:00AM 4 points [-]

Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?

Comment author: 28 May 2012 09:14:09AM 1 point [-]

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around. (You also mustn't have any dynamical inconsistency such as akrasia, otherwise your future and past selves count as ‘other’ as well.) So I don't think it'd make much of a difference for a singleton -- but I'd rather use an RDT just in case.

Comment author: 28 May 2012 02:27:21PM 1 point [-]

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around.

It isn't the absolute level of intelligence that is required, but rather that the other agent is capable of a specific kind of reasoning. Even this can be relaxed to things that can only dubiously be classed as an "agent". The requirement is that some aspect of the environment has (utility-relevant) behavior that is entangled with the output of the decision to be made, in a way other than a forward-in-time causal influence. This almost always implies that some agent is involved, but that need not necessarily be the case.

Caveat: Maybe TDT is dumber than I remember and artificially limits itself in a way that is relevant here. I'm more comfortable making assertions about what a correct decision theory would do than about what some specific attempt to specify a decision theory would do.

but I'd rather use an RDT just in case.

You make me happy! RDT!

Comment author: 24 May 2012 07:38:35PM 5 points [-]

It will defect on all prisoner's dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommittment - CDT isn't "reflectively consistent." And so if you want to predict an AI's behavior, if you predict based on CDT with no self-modification you'll get it wrong, since it doesn't stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

Comment author: 29 May 2012 09:37:18AM *  1 point [-]

It will defect on all prisoner's dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

I don't think MAD is a Prisoner's Dilemma: in the Prisoner's Dilemma, if I know you're going to cooperate no matter what, I'm better off defecting, and if I know you're going to defect no matter what, I'm better off defecting. That doesn't seem to be the case here: bombing you doesn't make me better off, all things being equal; it just makes you worse off. If anything, it's a game of Chicken, where bombing the opponent corresponds to going straight and not bombing them corresponds to swerving. And CDTists don't always go straight in Chicken, do they?
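The structural difference can be made concrete: in the Prisoner's Dilemma, defecting is a best response to anything the opponent does, whereas in Chicken the best response flips with the opponent's move. A quick check with conventional textbook payoff numbers (chosen for illustration, not taken from the thread):

```python
# Row player's payoffs for each (my move, their move) pair.
PD      = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
CHICKEN = {("Swerve", "Swerve"): 0, ("Swerve", "Straight"): -1,
           ("Straight", "Swerve"): 1, ("Straight", "Straight"): -10}

def best_response(game, their_move):
    moves = {m for m, _ in game}
    return max(moves, key=lambda m: game[(m, their_move)])

# PD: defecting is a best response no matter what the other side does...
assert best_response(PD, "C") == "D" and best_response(PD, "D") == "D"
# ...but in Chicken the best response depends on the opponent's move.
assert best_response(CHICKEN, "Swerve") == "Straight"
assert best_response(CHICKEN, "Straight") == "Swerve"
print("PD has a dominant strategy; Chicken does not")
```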

Comment author: 29 May 2012 11:19:15AM 0 points [-]

Hm, I disagree - if nuking the Great Enemy never made you any better off, why was anyone ever afraid of anyone getting nuked in the first place? It might not grow your crops for you or buy you a TV, but gains in security and world power are probably enough incentive to at least make people worry.

Comment author: 29 May 2012 11:24:08AM *  1 point [-]

Still better modelled by Chicken (where the utility of winning is assumed to be much smaller than the negative of the utility of dying, but still non-zero) than by PD.

Comment author: 30 May 2012 05:00:37AM 0 points [-]

I don't understand what you mean by "modeled better by chicken" here.

Comment author: 30 May 2012 05:48:16AM *  1 point [-]

I expect army1987's talking about Chicken, the game of machismo in which participants rush headlong at each other in cars or other fast-moving dangerous objects and whoever swerves first loses. The payoff matrix doesn't resemble the Prisoner's Dilemma all that much: there's more than one Nash equilibrium, and by far the worst outcome from either player's perspective occurs when both players play the move analogous to defection (i.e. don't swerve). It's probably most interesting as a vehicle for examining precommitment tactics.

Comment author: 30 May 2012 10:22:06AM 0 points [-]

I was. I should have linked to it, and I have now.

Comment author: 25 May 2012 11:21:12AM 3 points [-]

A more correct analysis is that CDT defects against itself in iterated Prisoner's Dilemma, provided there is any finite bound to the number of iterations. So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

A CDT playing against a "RevengeBot" - if you nuke it, it nukes back with an all out strike - would never fire its weapons. But then the RevengeBot could just take out one city at a time, without fear of retaliation.

Since CDT was the "gold standard" of rationality developed during the time of the Cold War, I am somewhat puzzled why we're still here.
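The backward-induction argument sketches out as follows: in the known final round defection strictly dominates, and since future play is then fixed regardless of today's move, the same dominance argument applies to every earlier round. A minimal sketch (the payoff numbers are the conventional T > R > P > S textbook values, not from the thread):

```python
# Backward induction for an N-round Prisoner's Dilemma with a known last round.
# Row player's stage payoffs: T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def spe_play(rounds_left):
    """Subgame-perfect move sequence for one player. In the last round
    defection strictly dominates; given that future play is fixed
    regardless of today's move, the same logic applies today."""
    if rounds_left == 0:
        return []
    future = spe_play(rounds_left - 1)  # independent of today's move
    today = max("CD", key=lambda m: PAYOFF[(m, "C")])           # D dominates vs C...
    assert today == max("CD", key=lambda m: PAYOFF[(m, "D")])   # ...and vs D
    return [today] + future

print(spe_play(100)[:5])  # → ['D', 'D', 'D', 'D', 'D']
```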

Comment author: 26 May 2012 02:31:27AM 1 point [-]

So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

This assumes that the mutual possession of nuclear weapons constitutes a prisoner's dilemma. There isn't necessarily a positive payoff to nuking folks. (You know, unless they are really jerks!)

Comment author: 26 May 2012 06:57:12AM 1 point [-]

Well nuking the other side eliminates the chance that they'll ever nuke you (or will attack with conventional weapons), so there is arguably a slight positive for nuking first as opposed to keeping the peace.

There were some very serious thinkers arguing for a first strike against the Soviet Union immediately after WW2, including (on some readings) Bertrand Russell, who later became a leader of CND. And a pure CDT (with selfish utility) would have done so. I don't see how Schelling theory could have modified that... just push the other guy over the cliff before the ankle-chains get fastened.

Probably the reason it didn't happen was the rather obvious "we don't want to go down in history as even worse than the Nazis" - also there was complacency about how far behind the Soviets actually were. If it had been known that they would explode an A-bomb as little as 4 years after the war, then the calculation would have been different. (Last ditch talks to ban nuclear weapons completely and verifiably - by thorough spying on each other - or bombs away. More likely bombs away I think.)

Comment author: 25 May 2012 11:30:47AM 2 points [-]

Well, it's good that you're puzzled, because it wasn't - see Schelling's "The Strategy of Conflict."

Comment author: 23 May 2012 07:01:14PM 9 points [-]

The problems look like a kind of an anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.

Comment author: 23 May 2012 05:42:51PM -1 points [-]

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

There seems to be a contradiction here. If Omega said this to me, I would have to conclude that Omega had just presented evidence of being untruthful some of the time.

If Omega simulated the problem at hand, then in that simulation Omega must have said: "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT." In the first simulation, that statement is a lie.

Problem 2 has a similar problem.

It is not obvious that the problem can be reformulated to keep Omega consistently truthful and still have CDT or EDT come out ahead of TDT.

Comment author: 23 May 2012 06:57:44PM *  3 points [-]

Your difficulty seems to be with the parenthetical "(who experience has shown is always truthful)". The relevant experience here is derived from real-world subjects who have been in Omega problems, exactly as is assumed for the standard Newcomb problem. It's not obvious that Omega always tells the truth to its simulations; no-one in the outside world has experience of that.

However you can construe the problem so that Omega doesn't have to lie, even to sims. Omega could always prefix its description of the problem with a little disclaimer "You may be one of my simulations. But if not, then...".

Or Omega could simulate a TDT agent making decisions as if it had just been given the problem description verbally by Omega, without Omega actually doing so. (Whether that's possible or not depends a bit on the simulation).

Comment author: 23 May 2012 05:57:50PM *  1 point [-]

Omega could truthfully say "the contents of the boxes are exactly as if I'd presented this problem to an agent running TDT".

Comment author: 23 May 2012 06:41:41PM 0 points [-]

I do not know if Omega can say that truthfully, because I do not know whether the self-referential equation representing the problem has a solution.

The problems set out by the OP assume there is a solution, and assert a particular answer, without writing out the equation and plugging in the solution to show that it actually works.

Comment author: 23 May 2012 11:41:30AM *  15 points [-]

Consider Problem 3: Omega presents you with two boxes, one of which contains \$100, and says that it just ran a simulation of you in the present situation and put the money in the box the simulation didn't choose.

This is a standard diagonal construction, where the environment is set up so that you are punished for the actions you choose and rewarded for those you don't choose, irrespective of what those actions are. This doesn't depend on the decision algorithm you're implementing. A possible escape strategy is to make yourself unpredictable to the environment. The difficulty would also go away if the thing being predicted weren't you, but something else you could predict as well (like a different agent that doesn't simulate you).
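A minimal sketch of the diagonalisation: whatever deterministic choice function the agent computes, Omega runs that same function and fills the other box, so every deterministic policy wins nothing:

```python
# Two boxes, indexed 0 and 1. The agent is any deterministic choice function.
def omega_fills(agent):
    simulated_choice = agent()      # perfect simulation of the agent
    return 1 - simulated_choice     # money goes in the box NOT chosen

def payoff(agent):
    return 100 if agent() == omega_fills(agent) else 0

# Every deterministic policy gets punished, regardless of its internals:
for policy in (lambda: 0, lambda: 1):
    assert payoff(policy) == 0
print("all deterministic agents win $0")
```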

Comment author: 23 May 2012 11:57:12AM 8 points [-]

The correct solution to this problem is to choose each box with equal probability; this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money. Or I try to guess what escape route you'll use and post all the guards there.

What's interesting about Problem 2 is that it makes what would be the normal game-theoretic strategy unstable by choosing deterministically where the probabilities are exactly equal.
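The claim that equal mixing is correct can be verified directly: if the agent takes each box with probability q and 1 - q, it wins exactly when its realised choice differs from the simulated run's, which happens with probability 2q(1 - q), maximised at q = 1/2 for an expected \$50:

```python
# Probability of winning when the real and simulated agents each sample
# independently from the same mixed strategy (q, 1-q) over two boxes.
def win_prob(q):
    return q * (1 - q) + (1 - q) * q    # P(real choice != simulated choice)

assert win_prob(0.0) == 0 and win_prob(1.0) == 0    # deterministic agents lose
assert win_prob(0.5) == 0.5                         # the mixed equilibrium
assert all(win_prob(0.5) >= win_prob(q / 10) for q in range(11))
print(win_prob(0.5) * 100)  # expected winnings in dollars: → 50.0
```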

Comment author: 23 May 2012 12:28:15PM 4 points [-]

this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money.

Of course, you can just set up the thought experiment with the proviso that "be unpredictable" is not a possible move - in fact that's the whole point of Omega in these sorts of problems. If Omega's trying to break into your safe, he takes your money. In Nesov's problem, if you can't make yourself unpredictable, then you win nothing - it's not even worth your time to open the box. In both cases, a TDT agent does strictly as well as it possibly could - the fact that there's \$100 somewhere in the vicinity doesn't change that.

Comment author: 23 May 2012 11:14:04AM 4 points [-]

There's a different version of these problems for each decision theory, depending on what Omega simulates. For CDT, all agents two-box and all agents get \$1000. However, on problem 2, it seems like CDT doesn't have a well-defined decision at all; the effort to work out what Omega's simulator will say won't terminate.

(I'm spamming this post with comments - sorry!)

Comment author: 23 May 2012 12:16:59PM *  2 points [-]

You raise an interesting question here - what would CDT do if a CDT agent were in the simulation?

It looks to me that CDT just doesn't have the conceptual machinery to handle this problem properly, so I don't really know. One thing that could happen is that the simulated CDT agent tries to simulate itself and gets stuck in an infinite loop. I didn't specify exactly what would happen in that case, but if Omega can prove that the simulated agent is caught in a loop, then it knows the sim will choose each box with probability zero, and so (since these are all equal) it will fill box 1. But now, can a real-life CDT agent also work this out, and beat the game by selecting box 1? But if so, why won't the sim do that too, and so on? Aargh!!!

Another thought I had is that CDT could try tossing a logical coin, like computing the googolth digit of pi: if it is even, choose box 1; if it is odd, choose box 2. If the agent runs out of time before finishing the computation (which the real-life agent will), it just picks box 1 or 2 with equal probability. The simulated CDT agent, however, will get to the end of the computation (Omega has arbitrary computational resources) and definitely pick box 1 or box 2, so the money is definitely in one of those two boxes, which apparently raises the actual agent's probability of winning to 50%. TDT might do the same.

However this looks like cheating to me, for both CDT and TDT.

EDIT: On reflection, it seems clear that CDT would never do anything "creatively sneaky" like tossing a logical coin; but it is the sort of approach that TDT (or some variant thereof) might come up with. Though I still think it's cheating.
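For concreteness, a "logical coin" is any deterministic computation whose answer neither party knows in advance but both can verify. The sketch below substitutes a hash parity for the googolth digit of pi; this is a hypothetical stand-in that only illustrates determinism-with-apparent-randomness, and does not capture the compute asymmetry between the simulation and the real agent, which is the part that matters in the argument above:

```python
import hashlib

# Illustrative "logical coin": deterministic, but unpredictable in advance.
# (A cheap stand-in for "parity of the googolth digit of pi".)
def logical_coin(seed: str) -> int:
    digest = hashlib.sha256(seed.encode()).digest()
    return digest[0] & 1    # 0 -> choose box 1, 1 -> choose box 2

choice = 1 + logical_coin("Omega's problem statement, round 1")
print(f"choose box {choice}")
```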

Comment author: 23 May 2012 03:34:15PM *  1 point [-]

The version of CDT that I described explicitly should arrive at the uniformly random solution. You don't have to be able to simulate a program all the way through, just able to prove things about its output.

EDIT: Wait, this is wrong. It won't be able to consistently derive an answer, because of the way it acts given such an answer, and so it will go with whatever its default Nash equilibrium is.

Comment author: 23 May 2012 03:58:16PM 1 point [-]

Re: your EDIT. Yes, I've had that sort of reaction a couple of times today!

I'm shifting around between "CDT should pick at random; no, CDT should pick Box 1; no, CDT should use a logical coin; no, CDT should pick its favourite number in the set {1, 2} with probability 1, and hope that the version in the sim has a different favourite number; no, CDT will just go into a loop or collapse in a heap."

I'm also quite clueless how a TDT is supposed to decide if it's told there's a CDT in the sim... This looks like a pretty evil decision problem in its own right.

Comment author: 23 May 2012 06:15:00PM 1 point [-]

Well, the thing is that CDT doesn't completely specify a decision theory. I'm confident now that the specific version of CDT that I described would fail to deduce anything and go with its default, but it's hard to speak for CDTs in general on such a self-referential problem.

Comment author: 23 May 2012 03:34:14PM 2 points [-]

I don't think your "detect infinite resources and cheat" strategy is really worth thinking about. Instead of strategies like CDT and TDT whose applicability to limited compute resources is unclear, suppose you have an anytime strategy X, which you can halt at any time and get a decision. Then there's really a family of algorithms X-t, where t is the time you're going to give it to run. In this case, if you are X-t, we can consider the situation where Omega fields X-t against you.

Comment author: 23 May 2012 11:03:55AM 21 points [-]

I think we could generalise problem 2 to be problematic for any decision theory XDT:

There are 10 boxes, numbered 1 to 10. You may only take one. Omega has (several times) run a simulated XDT agent on this problem. It then put a prize in the box which it determined was least likely to be taken by such an agent - or, in the case of a tie, in the box with the lowest index.

If agent X follows XDT, it has at best a 10% chance of winning. Any sufficiently resourceful YDT agent, however, could run a simulated XDT agent themselves, and figure out what Omega's choice was without getting into an infinite loop.

Therefore, YDT performs better than XDT on this problem.
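The argument can be spelled out numerically, assuming Omega's tie rule from the problem statement: whatever mixed strategy XDT plays over the ten boxes, the minimum probability in its distribution is at most 1/10, while an outside agent that can rerun Omega's computation simply takes the box Omega filled:

```python
# Omega's rule applied to an XDT agent's mixed strategy over 10 boxes.
def omega_box(xdt_probs):
    """Least likely box; lowest index breaks ties (1-indexed)."""
    least = min(xdt_probs)
    return xdt_probs.index(least) + 1

def xdt_win_prob(xdt_probs):
    return xdt_probs[omega_box(xdt_probs) - 1]

# Whatever distribution XDT plays, its winning chance is at most 10%
# (the minimum of 10 probabilities summing to 1 is at most 0.1):
uniform = [0.1] * 10
assert xdt_win_prob(uniform) == 0.1
assert xdt_win_prob([0.2] * 4 + [0.05] * 4 + [0.0] * 2) == 0.0

# A different agent (YDT) that can rerun Omega's computation just takes
# the box Omega's simulation singled out, and wins with certainty:
ydt_choice = omega_box(uniform)
assert ydt_choice == 1    # uniform tie -> lowest index
print("XDT wins at most 10%; YDT wins 100%")
```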

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

Comment author: 30 May 2012 09:27:02PM *  1 point [-]

To draw out the analogy to Godelian incompleteness: any computable decision theory is subject to the suggested attack of being given a "Godel problem" like problem 1, just as any computable set of axioms for arithmetic has a Godel sentence. You can always make a new decision theory TDT' that is TDT plus "do the right thing for the Godel problem". But TDT' has its own Godel problem, of course. You can't make a computable theory that says "do the right thing for all Godel problems"; if you try, you would not get something computable. I'm sure this is all just restating what you had in mind, but I think it's worth spelling out.

If you have some sort of oracle for the halting problem (i.e. a hypercomputer) and Omega doesn't, he couldn't simulate you, so you would presumably be able to always win fair problems. Otherwise the best thing you could hope for is to get the right answer whenever your computation halts, but fail to halt in your computation for some problems, such as your Godel problem. (A decision theory like this can still be given a Godel problem if Omega can solve the halting problem, "I simulated you and if you fail to halt on this problem..."). I wonder if TDT fails to halt for its Godel problem, or if some natural modification of it might have this property, but I don't understand it well enough to guess.

I am less optimistic about revising "fair" to exclude Godel problems. The analogy would be proving Peano arithmetic is complete "except for things that are like Godel sentences." I don't know of any formalizations of the idea of "being a Godel sentence".

Comment author: 25 December 2012 04:24:51PM -1 points [-]

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

You're right. However, since all decision theories fail when confronted with their personal version of this problem, but may or may not fail in other problems, then some decision theories may be better than others. The one that is better than all the others is thus the "best" DT.

Comment author: 23 May 2012 11:28:34PM *  3 points [-]

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

I would say that any such problem doesn't show that there is no best decision theory, it shows that that class of problem cannot be used in the ranking.

Edited to add: Unless, perhaps, one can show that an instantiation of the problem with particular choice of (in this case decision theory, but whatever is varied) is particularly likely to be encountered.

Comment author: 23 May 2012 11:33:55AM *  9 points [-]

You're right about problem 2 being a fully general counterargument, but your philosophical conclusion seems to be stopping too early. For example, can we define a class of "fair" problems that excludes problem 2?

Comment author: 23 May 2012 10:36:28PM 1 point [-]

It looks like the issue here is that while Omega is ostensibly not taking into account your decision theory, it implicitly is by simulating an XDT agent. So a first patch would be to define simulations of a specific decision theory (as opposed to simulations of a given agent) as "unfair".

On the other hand, we can't necessarily know if a given computation is effectively equivalent to simulating a given decision theory. Even if the string "TDT" is never encoded anywhere in Omega's super-neurons, it might still be simulating a TDT agent, for example.

On the first hand again, it might be easy for most problems to figure out whether anyone is implicitly favouring one DT over another, and thus whether they're "fair".

Comment author: 23 May 2012 12:11:09PM 2 points [-]

One possible place to look is that we're allowing Omega access not just to a particular simulated decision of TDT, but to the probabilities with which it makes these decisions. If we force it to simulate TDT many times and sample to learn what the probabilities are, it can't detect the exact balance for which it does deterministic symmetry breaking, and the problem goes away.

This solution occurred to me because this forces Omega to have something like a continuous behaviour response to changes in the probabilities of different TDT outputs, and it seems possible given that to imagine a proof that a fixed point must exist.
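A quick simulation illustrates the proposal (a sketch with made-up sample counts): if Omega must estimate the simulated agent's choice probabilities from finitely many runs, an exact 50/50 tie is indistinguishable from a small random imbalance, so the "least likely" box varies from instance to instance instead of being broken deterministically:

```python
import random

# Omega estimates the sim's choice probabilities from n_samples runs,
# then picks the empirically least-taken box.
def omega_estimate(true_probs, n_samples, rng):
    counts = [0] * len(true_probs)
    for _ in range(n_samples):
        counts[rng.choices(range(len(true_probs)), true_probs)[0]] += 1
    return counts.index(min(counts)) + 1    # 1-indexed

rng = random.Random(0)
picks = [omega_estimate([0.5, 0.5], 1001, rng) for _ in range(200)]
# Across repeated instances Omega's pick lands on both boxes: the exact
# tie is invisible, so there is no deterministic symmetry-breaking.
print(sorted(set(picks)))  # → [1, 2]
```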

Comment author: 23 May 2012 08:57:03AM 9 points [-]

I think it's right to say that these aren't really "fair" problems, but they are unfair in a very interesting new way that Eliezer's definition of fairness doesn't cover, and it's not at all clear that it's possible to come up with a nice new definition that avoids this class of problem. They remind me of "Lucas cannot consistently assert this sentence".

Comment author: 23 May 2012 08:45:05AM *  6 points [-]

Thanks for the post! Your problems look a little similar to Wei's 2TDT-1CDT, but much simpler. Not sure about the other decision theory folks, but I'm quite puzzled by these problems and don't see any good answer yet.

Comment author: 23 May 2012 05:11:22PM *  1 point [-]

I've looked a bit at that thread, and the related follow-ups, and my head is now really spinning. You are correct that my problems were simpler!

My immediate best guess on 2TDT-1CDT is that the human player would do better to submit a simple defect-bot (rather than either CDT or TDT), irrespective of whether the player themselves is running TDT or CDT. If the player has to submit their own decision algorithm (source code) instead of a bot, then we get into a colossal tangle: who defects first, whose decision is logically prior to whose, whether the TDT agents will threaten to defect if they detect that the submitted agent may defect or has already self-modified into unconditionally defecting, and whether the TDT agents will just defect unconditionally anyway to even the score (e.g. through some form of utility trading or long-term consequentialist principle: TDT has to beat CDT in the long run, therefore it had better just get on and beat CDT wherever possible...).

In short, I observe I am confused.

With all this logical priority vs temporal priority, and long term consequences feeding into short-term utilities, I'm reminded of the following from HPMOR Chapter 61:

There was a narrowly circulated proverb to the effect that only one Auror in thirty was qualified to investigate cases involving Time-Turners; and that of those few, the half who weren't already insane, soon would be.

Comment author: [deleted] 28 December 2012 08:38:33PM 0 points [-]

1) Not to my knowledge.

2) No, you reasoned TDT's decisions correctly.

3) A TDT agent would not self-modify to CDT, because if it did, its simulation would also self-modify to CDT and then two-box, yielding only \$1000 for the real TDT agent.

4) TDT does seem to be a single algorithm, albeit a recursive one in the presence of other TDT agents or simulations. TDT doesn't have to look into its own code, nor does it change its mind upon seeing it, for it decides as if deciding what the code outputs.

5) This is a bit of a tricky one. You could say a problem is fair if you judge by whether each agent did the best it could have done, rather than by how much it got; but a CDT agent could say the same when it two-boxes and reasons that it would have gotten \$0 if it had one-boxed. I guess in a timeless sense, TDT does the best it could have done in these problems, while CDT doesn't do the best it could have done in Newcomb's problem.

6) That's a tough one. If you're asking what Omega's intentions are (or would be in the real world), I have no idea. If you're asking who succeeds at the majority of problems in the problem space of anything Omega can ask, I strongly believe TDT would outperform CDT.

Comment author: 09 June 2012 12:39:32AM 0 points [-]

Generalization of Newcomb's Problem: Omega predicts your behavior with accuracy p.

This one could actually be experimentally tested, at least for certain values of p; so for instance we could run undergrads (with \$10 and \$100 instead of \$1,000 and \$1,000,000; don't bankrupt the university) and use their behavior from the pilot experiment to predict their behavior in later experiments.
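The expected values in this generalisation are easy to work out: one-boxing beats two-boxing precisely when p exceeds (M + K) / (2M), which is 0.5005 for the standard prizes. A quick check:

```python
# Expected values when Omega's prediction is correct with probability p,
# with the usual prizes M = $1,000,000 and K = $1,000.
M, K = 1_000_000, 1_000

def ev_one_box(p):   # box B is full iff Omega correctly predicted one-boxing
    return p * M

def ev_two_box(p):   # box B is full only if Omega wrongly predicted one-boxing
    return K + (1 - p) * M

# One-boxing pays off whenever p*M > K + (1-p)*M, i.e. p > (M + K) / (2*M):
threshold = (M + K) / (2 * M)
print(threshold)  # → 0.5005
assert ev_one_box(0.6) > ev_two_box(0.6)
assert ev_one_box(0.5) < ev_two_box(0.5)
```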

Comment author: 05 June 2012 07:08:54PM *  0 points [-]

Why is the discrimination problem "unfair"? It seems like in any situation where decision theories are actually put into practice, that type of reasoning is likely to be popular. In fact I thought the whole point of advanced decision theories was to deal with that sort of self-referencing reasoning. Am I misunderstanding something?

Comment author: 28 December 2012 09:10:04PM *  0 points [-]

If you are a TDT agent, you don't know whether you're the simulation or the "outside decision", since they're effectively the same. Or rather, the simulation will have made the same choice that you will make.

If you're not a TDT agent, you gain more information: You're not a TDT agent, and the problem states TDT was simulated.

So the discrimination problem functionally resolves to:

If you are a TDT agent, have some dirt. End of story.
If you are not a TDT agent, I have done some mumbo-jumbo, and now you can either take one box for \$1000 or \$1m, or both of them for \$1001000. Have fun! (the mumbo-jumbo has nothing to do with you anyway!)

Comment author: 04 June 2012 04:18:07AM 0 points [-]

Is the trick with problem 1 that what you are really doing, by using a simulation, is having an agent use timeless decision theory in a context where they can't use timeless decision theory? The simulated agent doesn't know about the external agent. Or, you could say, it's impossible for it to be timeless; the directionality of time (simulation first, external agent moves second) is enforced in a way that makes it impossible for the simulated agent to reason across that time barrier. Therefore it's not fair to call what it decides "timeless decision theory".

Comment author: 31 May 2012 08:49:30AM *  0 points [-]

Either problems 1 and 2 hit an infinite regress issue, or I don't see why an ordinary TDT agent wouldn't two-box and choose the first box, respectively. There's a difference between the following problems:

• I, Omega, predicted that you would do such and such, and acted accordingly.
• I, Omega, simulated another agent, and acted accordingly.
• I, Omega, simulated this very problem (only, if you don't run TDT, it's not the same problem, but I promise it's the same nonetheless) and acted accordingly.

Now, in problems 1 and 2, are the simulated problem and the actual problem actually the same? If they are, I see an infinite regress on Omega's side, and therefore not a problem one would ever encounter. If they aren't, then what I actually understand them to be is:

1. Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to an agent running TDT. If the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put \$1 million in Box B. Regardless of how the simulated agent decided, I put \$1000 in Box A. Now please choose your box or boxes."

Really, you don't have to use anything other than TDT to see that the simulated TDT agent one-boxed. Its problem isn't your problem. Your precommitment in your problem doesn't affect your precommitment in its problem. Of course, the simulated TDT agent made the right choice by one-boxing. But you should two-box.
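Here's a minimal sketch of the payoff structure as this comment reads problem 1. The function names are mine, and it assumes (as the comment argues) that Omega's simulation is over before the real agent chooses, so Box B's contents are fixed:

```python
# Sketch of problem 1's payoffs (hypothetical helper names).
# Omega fills Box B based on the *simulated* TDT agent's choice,
# which is settled before the real agent decides anything.

def box_b_contents(simulated_choice):
    """Omega's rule: $1M in Box B iff the simulated TDT agent one-boxed."""
    return 1_000_000 if simulated_choice == "one-box" else 0

def payoff(real_choice, simulated_choice):
    box_a = 1_000  # Box A always holds $1000
    box_b = box_b_contents(simulated_choice)
    return box_b if real_choice == "one-box" else box_a + box_b

# The simulated TDT agent faces (its version of) Newcomb's problem
# and one-boxes, so Box B holds the million:
simulated = "one-box"

# Box B's contents no longer depend on the real agent's choice,
# so two-boxing strictly dominates:
print(payoff("one-box", simulated))   # 1000000
print(payoff("two-box", simulated))   # 1001000
```

On this reading, the \$1000 gap between the two lines is exactly the argument for two-boxing in the real problem.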

2. Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "I ran multiple simulations of the following problem, presented to a TDT agent: “You must take exactly one box. I determined which box you are least likely to take, and put \$1 million in that box. If there is a tie, I put the money in one of them (the one labelled with the lowest number).” I put the money in the box the simulated TDT agent was least likely to choose. If there was a tie, I put the money in one of them (the one labelled with the lowest number). Now choose your box."

Same here. You know that the simulated TDT agent put equal probability on every box, to maximize its gains. Again, its problem isn't your problem. Your precommitment in your problem doesn't affect your precommitment in its problem. Of course, the simulated TDT agent made the right choice by choosing at random. But you should take box 1.
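A small sketch of the comment's reasoning about the 10-box problem, assuming (as it claims) that the simulated agent's best strategy is the uniform distribution. The helper name is mine:

```python
# Sketch of the 10-box problem (hypothetical helper name).
# Omega puts the money in the box the simulated agent is least
# likely to take, breaking ties toward the lowest-numbered box.

def omega_box(simulated_probs):
    """Return the 1-indexed box Omega fills, given the simulated
    agent's probability of taking each box."""
    lowest = min(simulated_probs)
    # list.index returns the first (lowest-indexed) match,
    # which implements the tie-breaking rule for free.
    return simulated_probs.index(lowest) + 1

uniform = [0.1] * 10       # the simulated agent's equilibrium strategy
print(omega_box(uniform))  # 1
```

Since every box ties at probability 0.1, the tie-break sends the million to box 1, which is why the comment says the real agent should just take box 1.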

Comment author: 19 June 2012 01:55:33PM 0 points [-]

You don't have to use something else than TDT to see that the simulated TDT agent one boxed. Its problem isn't your problem.

This is CDT reasoning, AKA causal reasoning. Or in other words, how do you not use the same reasoning in the original Newcomb problem?

Comment author: 21 June 2012 10:16:16PM *  -1 points [-]

The reasoning is different because the problem is different.

The simulated agent and yourself were not subjected to the same problem. Therefore you can perfectly well precommit to different decisions. TDT does not automatically make the same decisions on problems that merely kinda look the same. They have to actually be the same. There may be specific reasons why TDT would make the same decision here, but I doubt it.

Now on to the examples:

### Newcomb's problem

Omega ran a simulation of Newcomb's problem, complete with a TDT agent in it. The simulated TDT agent obviously one-boxed, and got the million. If you run TDT yourself, you also know this. Now, Omega tells you of this simulation, and tells you to choose your boxes. This is not Newcomb's problem. If it were, deciding to two-box would cause box B to be empty!

CDT would crudely assume that two-boxing gets it \$1000 more than one-boxing. TDT, on the other hand, knows the simulated box B (and therefore the real one as well) has the million, regardless of its current decision.

### 10 boxes problem

Again, the simulated problem and the real one aren't the same. If they were, choosing box 1 with probability 1 would cause box 2 to have the million. Because it's not the same problem, even TDT should be allowed to precommit to a different decision. The point of TDT is to foresee the consequences of its precommitments. It will therefore know that its precommitment in the real problem doesn't have any influence on its precommitment (and therefore the outcome) in the simulated one. This lack of influence allows it to fall back on CDT reasoning.

Makes sense?

Comment author: 25 December 2012 03:53:04PM *  0 points [-]

The simulated agent and yourself were not subjected to the same problem.

Um, yes, they were. That's the whole point.

Comment author: 31 December 2012 07:40:07PM 0 points [-]

I'll need to write a full discussion post about that at some point. There is one crucial difference beyond "I'm TDT" versus "I'm CDT": it's "the simulated agent uses the same decision theory as me" versus "the simulated agent does not use the same decision theory".

That's not exactly the same problem, and I think that is the whole point.

Comment author: 22 June 2012 12:54:50AM 0 points [-]

The simulated problem and the actual problem don't have to actually be the same - just indistinguishable from the point of view of the agent.

Omega avoids infinite regress because the actual contents of the boxes are irrelevant for the purposes of the simulation, so no sub-simulation is necessary.

Comment author: 22 June 2012 09:15:47AM 0 points [-]

Okay. So, what specific mistake does TDT make that prevents it from distinguishing the two problems? What leads it to think "If I precommit to X in problem 1, I have to precommit to X in problem 2 as well"?

(If the problems aren't the same, of course Omega can avoid infinite regress. And if there is unbounded regress, we may be able to find a non-infinite solution by looping the regress over itself. But then the problems (simulated and real) are definitely the same.)

Comment author: 22 June 2012 11:51:57AM 0 points [-]

In the simulated problem the simulated agent is presented with the choice but never gets the reward; for all it matters both boxes can be empty. This means that Omega doesn't have to do another simulation to work out what's in the simulated boxes.

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.
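This fixed-point argument can be sketched in a few lines. The assumption (from the comment above) is that every level of the regress is a TDT agent facing the identical problem, so the decision must be the same at every level; that leaves only two self-consistent outcomes to compare. The function name is mine:

```python
# Sketch of the fixed-point resolution (hypothetical helper name).
# If the simulated agent and the real agent face the exact same
# problem and run the same theory, they make the same choice, so we
# only need to evaluate the two self-consistent outcomes.

def self_consistent_payoff(choice):
    """Payoff when the simulated and real agents both make `choice`."""
    box_a = 1_000
    box_b = 1_000_000 if choice == "one-box" else 0
    return box_b if choice == "one-box" else box_a + box_b

print(self_consistent_payoff("one-box"))  # 1000000
print(self_consistent_payoff("two-box"))  # 1000
```

Of the two fixed points, one-boxing pays \$1M against \$1000 for two-boxing, which is why the comment concludes that TDT one-boxes and Omega knows this.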

Comment author: 22 June 2012 12:20:19PM 0 points [-]

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.

Now there's still the question of the perceived difference between the simulated problem and the real one (I assume here that you should one-box in the simulation, and two-box in the real problem). There is a difference; how come TDT does not see it? A rational decision theory would; we humans do. Or if it can see it, how come it can't act on it? RDT could. Do you concede that TDT does and can, or do you still have doubts?

Comment author: 23 June 2012 12:19:36AM 1 point [-]

Due to how the problem is set up, you can't notice the difference until after you've made your decision. The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

Comment author: 24 June 2012 08:04:56PM *  0 points [-]

The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

That's false. Here is a modified version of the problem:

Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to you. If your simulated twin 2-boxed then I put nothing in Box B. If your simulated twin 1-boxed, I put \$1 million in Box B. In any case, I put \$1000 in Box A. Now please 1-box or 2-box."

Even if you're not running TDT, the simulated agent is running the same decision algorithm as you are. If that were the reason why TDT couldn't tell the difference, well, now no one can. However, you and I can tell the difference. The simulated problem is obviously different:

Omega presents the usual two boxes A and B and announces the following. "I am subjecting you to Newcomb's problem. Now please 1-box or 2-box".

Really, the subjective difference between the two problems should be obvious to any remotely rational agent.

(Please let me know if you agree up until that point. Below, I assume you do.)

I'm pretty sure the correct answers for the two problems (my modified version as well as the original one) are 1-box in the simulation, 2-box in the real problem. (Do you still agree?)

So. We both agree that RDT (Rational Decision Theory) 1-boxes in the simulation, and 2-boxes in the real problem. CDT would 2-box in both, and TDT would 1-box in the simulation while in the real problem it would…

• 2-box? I think so.
• 1-box? Supposedly because it can't tell simulation from reality. Or rather, it can't tell the difference between Newcomb's problem and the actual problem. Even though RDT does. (Riiight?) So again, I must ask, why not? I need a more specific answer than "due to how the problem is set up". I need you to tell me what specific kind of irrationality TDT is committing here. I need to know its specific blind spot.

Comment author: 24 June 2012 11:04:10PM *  1 point [-]

In your problem, TDT does indeed 2-box, but it's quite a different problem from the original one. Here's the main difference:

I ran a simulation of this problem

vs

I ran a simulation of Newcomb's problem

Comment author: 24 June 2012 08:59:33PM 0 points [-]

Well, in the problem you present here TDT would 2-box, but you've avoided the hard part of the problem from the OP, in which there is no way to tell whether you're in the simulation or not (or at least there is no way for the simulated you to tell), unless you're running some algorithm other than TDT.