Problematic Problems for TDT

36 Post author: drnickbone 29 May 2012 03:41PM

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents who two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave the way a CDT agent would wish it had pre-committed itself to behaving before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

Discrimination: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.

 

So what are some fair "problematic problems"?

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.

However, any CDT agent can just take both boxes and win $1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the $1 million. So any other agent can safely two-box as well.

Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
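
The payoffs in Problem 1 can be laid out as a small Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, and the "linked decision" of the TDT agent is modelled crudely by feeding the agent's own choice in as the simulation's choice.

```python
# Hypothetical model of Problem 1; all names here are illustrative only.
BOX_A = 1_000
BOX_B_PRIZE = 1_000_000
CHOICES = ["one-box", "two-box"]


def fill_box_b(simulated_choice: str) -> int:
    """Omega's rule: Box B depends only on what the simulated TDT agent chose."""
    return BOX_B_PRIZE if simulated_choice == "one-box" else 0


def payoff(choice: str, box_b: int) -> int:
    """Money taken home by the real agent for a given choice."""
    return box_b if choice == "one-box" else BOX_A + box_b


def tdt_payoff(choice: str) -> int:
    # Assumption: the real TDT agent's choice and the simulation's coincide.
    return payoff(choice, fill_box_b(choice))


def other_payoff(choice: str, simulated_choice: str = "one-box") -> int:
    # A non-TDT agent treats Box B as already fixed by the simulation.
    return payoff(choice, fill_box_b(simulated_choice))


tdt_best = max(CHOICES, key=tdt_payoff)
other_best = max(CHOICES, key=other_payoff)
print(tdt_best, tdt_payoff(tdt_best))
print(other_best, other_payoff(other_best))
```

Under these assumptions the TDT agent's best option is to one-box, while an agent whose choice is not linked to the simulation does better by two-boxing.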

 

Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."

Analysis: A TDT agent will reason that whatever it does, it cannot have more than a 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million in box number 1 and the TDT agent has exactly a 10% chance of winning it.
 
But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning $1 million. By increasing the number of boxes, we can ensure that TDT has an arbitrarily low chance of winning, compared to CDT which always wins.
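
Omega's placement rule in Problem 2 can be sketched in Python as follows. This is a minimal illustration, assuming Omega knows the simulated agent's exact choosing probabilities (rather than estimating them from repeated runs); the function names are hypothetical.

```python
from fractions import Fraction


def omega_box(probs):
    """Return the 1-based label of the least-likely box, lowest label on ties."""
    lowest = min(probs)
    return probs.index(lowest) + 1  # list.index returns the first match


def win_probability(probs):
    """Chance that an agent choosing with `probs` picks the box Omega filled."""
    return probs[omega_box(probs) - 1]


n = 10
uniform = [Fraction(1, n)] * n
# A skewed strategy that never picks box 1 and spreads evenly over the rest.
skewed = [Fraction(0)] + [Fraction(1, n - 1)] * (n - 1)

print(omega_box(uniform), win_probability(uniform))
print(omega_box(skewed), win_probability(skewed))
```

Since the winning chance equals the smallest entry of the distribution, and the smallest of n probabilities summing to 1 is at most 1/n, the uniform strategy is the best the simulated agent can do in this model. An agent whose choice is not the one being simulated only has to evaluate `omega_box(uniform)`.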


Some questions:

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

 

Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).

Comments (298)

Comment author: shokwave 23 May 2012 07:52:54AM *  3 points [-]

Problems 1 and 2 both look - to me - like fancy versions of the Discrimination problem. edit: I am much less sure of this. That is, Omega changes the world based on whether the agent implements TDT. This bit I am still sure of, but it might be the case that TDT can overcome this anyway.

Discrimination problem: Money Omega puts in room if you're TDT = $1,000. Money Omega puts in room if you're not = $1,001,000.

Problem 1: Money Omega puts in room if you're TDT = $1,000 or $1,001,000. Edit: made a mistake. The error in this problem may be subtler than I first claimed. Money Omega puts in room if you're not = $1,001,000.

Problem 2: $1,000,000 either way. This problem is different but also uninteresting. Due to Omega caring about TDT again, it is just the smallest interesting number paradox for TDT agents only. Other decision theories get a free ride because you're just asking them to reason about an algorithm (easy to show it produces a uniform distribution) and then a maths question (which box has the smallest number on it?).

You claim the rewards are

independent of the method that the agent uses to choose

but they're not. They depend on whether the agent uses TDT to choose or not.

Comment author: aleksiL 23 May 2012 08:11:35AM 2 points [-]

Agree. You use process X to determine the setup and agents instantiating X are going to be constrained. Any decision theory would be at a disadvantage when singled out like this.

Comment author: cousin_it 23 May 2012 08:29:27AM *  0 points [-]

Problem 1: Money Omega puts in room if you're TDT = $1,000 or $1,000,000.

Sorry, shouldn't it be "$1,000 or $1,001,000"?

Comment author: shokwave 23 May 2012 10:06:39AM 0 points [-]

Right, but $1,001,000 only in the case where you restrict yourself to picking $1,000,000. I oversimplified and it might not actually be accurate.

Comment author: drnickbone 23 May 2012 08:51:37AM 2 points [-]

I've edited the problem statement to clarify Box A slightly. Basically, Omega will put $1,001,000 in the room ($1000 for box A and $1 million for Box B) regardless of the algorithm run by the actual deciding agent. The contents of the boxes depend only on what the simulated agent decides.

Comment author: nekomata 23 May 2012 08:19:53AM 0 points [-]

I don't understand the special role of box 1 in Problem 2. It seems to me that if Omega just makes different choices for the box in which to put the money, all decision theories will say "pick one at random" and will be equal.

In fact, the only reason I can see why Omega picks box 1 seems to be that the "pick at random" process of your TDT is exactly "pick the first one". Just replace it with something dependent on its internal clock (or any parameter not known at the time when Omega asks its question) and the problem disappears.

Comment author: drnickbone 23 May 2012 09:00:11AM *  1 point [-]

Omega's choice of box depends on its assessment of the simulated agent's choosing probabilities. The tie-breaking rule (if there are several boxes with equal lowest choosing probability, then select the one with the lowest label) is to an extent arbitrary, but it is important that there is some deterministic tie-breaking rule.

I also agree this is entirely a maths problem for Omega or for anyone whose decisions aren't entangled with the problem (with a proof that Box 1 will contain the $1 million). The difficulty is that a TDT agent can't treat it as a straight maths problem which is unlinked to its own decisions.

Comment author: nekomata 24 May 2012 03:19:31PM 1 point [-]

Why is it important that there is a deterministic tie-breaking rule? When you would like random numbers, isn't it always better to have a distribution as close to random as possible, even if it is pseudo-random?

That question is perhaps stupid, I have the impression that I am missing something important...

Comment author: drnickbone 25 May 2012 11:31:36AM 1 point [-]

Remember it is Omega implementing the tie-breaker rule, since it defines the problem.

The consequence of the tie-breaker is that the choosing agent knows that Omega's box-choice was a simple deterministic function of a mathematical calculation (or a proof). So the agent's uncertainty about which box contains the money is pure logical uncertainty.

Comment author: nekomata 25 May 2012 12:03:49PM 0 points [-]

Whoops... I can't believe I missed that. You are obviously right.

Comment author: cousin_it 23 May 2012 08:45:05AM *  6 points [-]

Thanks for the post! Your problems look a little similar to Wei's 2TDT-1CDT, but much simpler. Not sure about the other decision theory folks, but I'm quite puzzled by these problems and don't see any good answer yet.

Comment author: drnickbone 23 May 2012 08:53:45AM 0 points [-]

Thanks for this, and for the reference. I'll have a look at 2TDT-1CDT to see if there are any insights there which could resolve these problems. I've got a couple of ideas myself, but will check up on the other work.

Comment author: orthonormal 23 May 2012 03:28:40PM 0 points [-]
Comment author: drnickbone 23 May 2012 05:11:22PM *  1 point [-]

I've looked a bit at that thread, and the related follow-ups, and my head is now really spinning. You are correct that my problems were simpler!

My immediate best guess on 2TDT-1CDT is that the human player would do better to submit a simple defect-bot (rather than either CDT or TDT), and this is irrespective of whether the player themselves is running TDT or CDT. If the player has to submit his/her own decision algorithm (source-code) instead of a bot, then we get into a colossal tangle about "who defects first", "whose decision is logically prior to whose" and whether the TDT agents will threaten to defect if they detect that the submitted agent may defect, or has already self-modified into unconditionally defecting, or if the TDT agents will just defect unconditionally anyway to even the score (e.g. through some form of utility trading / long term consequentialism principle that TDT has to beat CDT in the long run, therefore it had better just get on and beat CDT wherever possible...)

In short, I observe I am confused.

With all this logical priority vs temporal priority, and long term consequences feeding into short-term utilities, I'm reminded of the following from HPMOR Chapter 61:

There was a narrowly circulated proverb to the effect that only one Auror in thirty was qualified to investigate cases involving Time-Turners; and that of those few, the half who weren't already insane, soon would be.

Comment author: ciphergoth 23 May 2012 08:57:03AM 10 points [-]

I think it's right to say that these aren't really "fair" problems, but they are unfair in a very interesting new way that Eliezer's definition of fairness doesn't cover, and it's not at all clear that it's possible to come up with a nice new definition that avoids this class of problem. They remind me of "Lucas cannot consistently assert this sentence".

Comment author: RichardKennaway 23 May 2012 09:38:41AM 9 points [-]

"Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. ...."

This needs some serious mathematics underneath it. Omega is supposed to run a simulation of how an agent of a certain sort handled a certain problem, the result of that simulation being a part of the problem itself. I don't think it's possible to tell, just from these English words, that there is a solution to this fixed-point formulation. And TDT itself hasn't been formalised, although I assume there are people (Eliezer? Marcello? Wei Dai?) working on that.

Cf. the construction of Gödel sentences: you can't just assume that a proof-system can talk about itself, you have to explicitly construct a way for it to talk about itself and show precisely what "talking about itself" means, before you can do all the cool stuff about undecidable sentences, Löb's theorem, and so on.

Comment author: ciphergoth 23 May 2012 12:06:58PM 8 points [-]

But TDT already has this problem - TDT is all about finding a fixed point decision.

Comment author: Eliezer_Yudkowsky 23 May 2012 07:57:46PM 11 points [-]

This seems well-specified to me: Since the agent is not told its own output in advance, it is possible to run the "simulation" and the "real version" in finite time. If you hand me a computer program that is the agent, I will hand you a computer program that is Omega and the environment.

Comment author: Pentashagon 29 May 2012 06:18:50PM *  1 point [-]

Here is a useful program:

if (being_simulated()) pick_box_b(); else pick_both_boxes();

being_simulated() is an interesting function; it would presumably consist of things like comparing star maps across as broad a range of EM frequencies as possible with historical knowledge, the positions of planetary bodies, Earthly satellites, quasar signatures, and inputs of other individuals and networks. It could also be as simple as "if (get_current_average_cosmic_ray_bit_flips() != historical_bit_flips_average) return true;". Since my brain tries to play Omega with me every night (well, on the nights I get good sleep) I think we all have some practical experience with being_simulated(). While our brains are nowhere near as capable as Omega, they are fairly good at suspending our own disbelief during dreams. What generally breaks disbelief is trying to behave rationally, including observing/testing the environment, which in reality would be the outside world but in a dream is simulated. Ultimately Omega is forced to simulate a very large subset of the Universe in order to cause being_simulated() to return false, or to alter the program itself so that being_simulated() vacuously returns true.

To prevent the latter attack the program should execute a fully homomorphically encrypted sub-function and base all its decisions on the outputs of that function. In other words its true decision theory should be hidden even from the hardware running it. This pushes Omega's capabilities into the cryptographic realm and the ability to solve arbitrarily hard cryptography problems. Once this is done, Omega becomes an oracle for solving any hard problem, e.g. by running this function as an input to the final decision theory:

if (strange_paradoxical_decision_is_required()) { bank_secret = factor_omegas_bank_account_public_key(); return sign_and_cash_check_for_one_billion_dollars(bank_secret); }

This restricts Omega to putting a time limit (or computation limit) on its simulations to avoid being suckered by its own computing power. Presumably Omega also puts a time limit on the real time a being can spend thinking before choosing which boxes it wants, but presumably that time is long enough for some fairly serious philosophizing and likely more than enough time for a reliable being_simulated() call.

Another approach is to consider how much information an intelligent being can gather about its environment. A human would have trouble determining whether two boxes are empty or full, but additional sensors could detect the presence of money in the boxes fairly reliably. What matters for Omega is making sure that a being cannot determine the contents of the boxes before picking them. From the perspective of a rational being this is equivalent to the boxes being filled with cash after making a decision. If Omega has the capability to obscure the contents of boxes then Omega certainly has the ability to obscure the placement of money into the boxes as they are chosen (just a glorified magic trick). Given that interpretation, CDT will one-box.

EDIT: I apologize for the formatting, I am not very good at escaping/formatting apparently.

Comment author: drnickbone 30 May 2012 06:43:09AM *  0 points [-]

if (being_simulated()) pick_box_b(); else pick_both_boxes();

This strategy is discussed in the follow-up article.

In general it's difficult, because by assumption Omega has the computational power to simulate more or less anything (including an environment matching the world as you remember it; this might be like the real world, or you might have spent your whole life so far as a sim). And the usual environment for these problems is a sealed room, so that you can't look at the stars etc.

Comment author: shokwave 23 May 2012 10:28:41AM *  9 points [-]

The more I think about it, the more interesting these problems get! Problem 1 seems to re-introduce all the issues that CDT has on Newcomb's Problem, but for TDT. I first thought to introduce the ability to 'break' with past selves, but that doesn't actually help with the simulation problem.

It did lead to a cute observation, though. Given that TDT cares about all sufficiently accurate simulations of itself, it's actually winning.

  • It one-boxes in Problem 1; thus ensuring that its simulacrum one-boxed in Omega's pre-game simulation, so TDT walked away with $2,000,000 (whereas CDT, unable to derive utility from a simulation of TDT, walked away with $1,001,000.) This is proofed against increasing the value of the second box; TDT still gains at least 1 dollar more (when the second box is $999,999), and simply two-boxes when the second box is as or more valuable.
  • In Problem 2, it picks in such a way that Omega must run at least 10 trials and the game itself; this means 11 TDT agents have had a 10% shot at $1,000,000. With an expected value of $1,100,000 it is doing better than the CDT agents walking away with $1,000,000.
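
The arithmetic behind these two bullets can be reproduced in a few lines of Python. This is only a sketch of the comment's own accounting, which assumes that simulated agents really collect their winnings and that Problem 2 involves exactly ten simulated trials plus the real game; the variable names are illustrative.

```python
PRIZE = 1_000_000
BOX_A = 1_000

# Problem 1: the simulated and the real TDT agent each one-box and collect Box B,
# while a CDT agent collects both boxes once.
tdt_total_problem_1 = 2 * PRIZE
cdt_total_problem_1 = BOX_A + PRIZE

# Problem 2: ten simulated trials plus the real game, each a 1-in-10 shot.
agents = 10 + 1
tdt_expected_problem_2 = agents * PRIZE // 10
cdt_total_problem_2 = PRIZE

print(tdt_total_problem_1, cdt_total_problem_1)
print(tdt_expected_problem_2, cdt_total_problem_2)
```

The comparison depends entirely on counting the simulations' winnings as utility for TDT; without that assumption the totals revert to those in the original post.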

It doesn't seem very relevant, but I think if we explored Richard's point that we need to actually formalise this, we'd find that any simulation high-fidelity enough to actually bind a TDT agent to its previous actions would necessarily give the agent the utility from the simulations, and vice versa, any simulation not accurate enough to give utility would be sufficiently different from TDT to allow our agent to two-box when that agent one-boxed.

Comment author: Khoth 23 May 2012 11:15:39AM 8 points [-]

Omega doesn't need to simulate the agent actually getting the reward. After the agent has made its choice, the simulation can just end.

Comment author: shokwave 24 May 2012 02:28:17AM -1 points [-]

Then the simulated TDT agent will one-box in Problem 1 so that the real TDT agent can two-box and get $1,001,000. The simulated TDT agent will pick a box randomly with a uniform distribution in Problem 2, so that the real TDT agent can select box 1 like CDT would.

(If the agent is not receiving any reward, it will act in a way that maximises the reward agents sufficiently similar to it would receive. In this situation of 'you get no reward', CDT would be completely indifferent and could not be relied upon to set up a good situation for future actual CDT agents.)

Of course, this doesn't work if the simulated TDT agent is not aware that it won't receive a reward. This strays pretty close to "Omega is all-powerful and out to make sure you lose"-type problems.

Comment author: JGWeissman 24 May 2012 03:18:00AM 0 points [-]

Of course, this doesn't work if the simulated TDT agent is not aware that it won't receive a reward.

The simulated TDT agent is not aware that it won't receive a reward, and therefore it does not work.

This strays pretty close to "Omega is all-powerful and out to make sure you lose"-type problems.

Yeah, it doesn't seem right to me that the decision theory being tested is used in the setup of the problem. But I don't think that the ability to simulate without rewarding the simulation is what pushes it over the threshold of "unfair".

Comment author: shokwave 25 May 2012 08:16:27AM *  0 points [-]

The simulated TDT agent is not aware that it won't receive a reward, and therefore it does not work. ... I don't think that the ability to simulate without rewarding the simulation is what pushes it over the threshold of "unfair".

I do agree. I think my previous post was still exploring the "can TDT break with a simulation of itself?" question, which is interesting but orthogonal.

Comment author: bogus 26 May 2012 09:44:38PM 0 points [-]

The simulated TDT agent is not aware that it won't receive a reward, and therefore it does not work.

This raises an interesting problem, actually. Omega could pose the following question:

Here are two boxes, A and B; you may choose either box, or take both. You are in one of two states of nature, with equal probability: one possibility is that you're in a simulation, in which case you will receive no reward, no matter what you choose. The other possibility is that a simulation of this problem was presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please make your choice.

The solution for a TDT agent seems to be choosing box B, but there may be similar games where it makes sense to run a mixed strategy. I don't think that it makes much sense to rule out the possibility of running mixed strategies across simulations, because in most models of credible precommitment the other players do not have this kind of foresight (although Omega possibly does).
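
The claim that choosing box B is the solution can be checked with a short expected-value sketch in Python. It assumes, as an illustration only, that the agent follows one pure policy in both states of nature and that the simulated TDT agent follows the same policy; the names are hypothetical.

```python
from fractions import Fraction

P_SIM = Fraction(1, 2)  # probability of being the unrewarded simulation
BOX_A = 1_000
PRIZE = 1_000_000


def expected_value(policy: str) -> Fraction:
    """Expected winnings of an agent that uses `policy` in both states."""
    # Assumption: the simulation's choice equals the policy, so it fixes Box B.
    box_b = PRIZE if policy == "one-box" else 0
    real_payoff = box_b if policy == "one-box" else BOX_A + box_b
    # The simulated state pays nothing whatever is chosen.
    return P_SIM * 0 + (1 - P_SIM) * real_payoff


for policy in ("one-box", "two-box"):
    print(policy, expected_value(policy))
```

Under these assumptions one-boxing is worth $500,000 in expectation and two-boxing only $500; mixed strategies would need the policy to be a probability rather than a fixed choice.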

And yes, it is still the case that a CDT agent can outperform TDT, as long as the TDT agent knows that if she is in a simulation, her choice will influence a real game played by a TDT, with some probability. Nevertheless, as the probability of "leaking" to CDT increases, it does become more profitable (AIUI) for TDT to two-box with low probability.

Comment author: [deleted] 28 May 2012 09:31:42AM 1 point [-]

Omega is supposed to be always truthful, so either he rewards the sims as well, or you know something the sims don't and hence it's not obvious you'll do the same as them.

Comment author: Khoth 28 May 2012 10:15:10AM 0 points [-]

I thought Omega was allowed to lie to sims.

Even if he's not, after he's given a $1m simulated reward, does he then have to keep up a simulated environment for the sim to actually spend the money?

Comment author: [deleted] 28 May 2012 11:14:57AM 1 point [-]

If he can lie to sims, then you can't know he's not lying to you unless you know you're not a sim. If you do, it's not obvious you'd choose the same way as if you didn't.

Comment author: DanArmak 28 May 2012 03:18:33PM 0 points [-]

For instance, if you think Omega is lying and completely ignore everything he says, you obviously two-box.

Comment author: TheOtherDave 28 May 2012 03:32:18PM 1 point [-]

Why not zero-box in this case? I mean, what reason would I have to expect any money at all?

Comment author: DanArmak 28 May 2012 04:02:27PM 0 points [-]

Well, as long as you believe Omega enough to think no box contains sudden death or otherwise negative utility, you'd open them to see what was inside. But yes, you might not believe Omega at all.

General question: suppose we encounter an alien. We have no idea what its motivations, values, goals, or abilities are. On the other hand, it may have observed any amount of human comm traffic from wireless EM signals since the invention of radio, and from actual spy-probes before the human invention of high tech that would detect them.

It signals us in Morse code from its remote starship, offering mutually beneficial trade.

What prior should we have about the alien's intention? Should we use a naive uniform prior that would tell us it's as likely to mean us good as harm, and so never reply because we don't know how it will try to influence our actions via communications? Should it tell us different agents who don't explicitly value one another will conflict to the extent their values differ, and so since value-space is vast and a randomly selected alien is unlikely to share many values with us, we should prepare for war? Should it tell us we can make some assumptions (which?) about naturally evolved agents or their Friendly-to-themselves creations? How safe are we if we try to "just read" English text written by an unknown, possibly-superintelligence which may have observed all our broadcast traffic since the age of radio? What does our non-detection of this alien civ until they chose to initiate contact tell us? Etc.

Comment author: TheOtherDave 28 May 2012 05:00:11PM 0 points [-]

A 50% chance of meaning us good vs harm isn't a prior I find terribly compelling.

There's a lot to say here, but my short answer is that this is both an incredibly dangerous and incredibly valuable situation, in which both the potential opportunity costs and the potential actual costs are literally astronomical, and in which there are very few things I can legitimately be confident of.

The best I can do in such a situation is to accept that my best guess is overwhelmingly likely to be wrong, but that it's slightly less likely to be wrong than my second-best guess, so I should operate on the basis of my best guess despite expecting it to be wrong. Where "best guess" here is the thing I consider most likely to be true, not the thing with the highest expected value.

I should also note that my priors about aliens in general -- that is, what I consider likely about a randomly selected alien intelligence -- are less relevant to this scenario than what I consider likely about this particular intelligence, given that it has observed us for long enough to learn our language, revealed itself to us, communicated with us in Morse code, offered mutually beneficial trade, etc.

The most tempting belief for me is that the alien's intentions are essentially similar to ours. I can even construct a plausible sounding argument for that as my best guess... we're the only other species I know capable of communicating the desire for mutually beneficial trade in an artificial signalling system, so our behavior constitutes strong evidence for their behavior. OTOH, it's pretty clear to me that the reason I'm tempted to believe that is because I can do something with that belief; it gives me a lot of traction for thinking about what to do next. (In a nutshell, I would conclude from that assumption that it means to exploit us for its long-term benefit, and whether that's good or bad for us depends entirely on what our most valuable-to-it resources are and how it can most easily obtain them and whether we benefit from that process.) Since that has almost nothing to do with the likelihood of it being true, I should distrust my desire to believe that.

Ultimately, I think what I do is reply that I value mutually beneficial trade with them, but that I don't actually trust them and must therefore treat them as a potential threat until I have gathered more information about them, while at the same time refraining from doing anything that would significantly reduce our chances of engaging in mutually beneficial trade in the future, and what do they think about all that?

Comment author: wedrifid 28 May 2012 02:02:58PM 0 points [-]

I thought Omega was allowed to lie to sims.

He can certainly give them counterfactual 'realities'. It would seem that he should be assumed to at least provide counterfactual realities wherein information provided by the simulation's representation of Omega indicates that he is perfectly trustworthy.

Even if he's not, after he's given a $1m simulated reward, does he then have to keep up a simulated environment for the sim to actually spend the money?

No. But if for whatever reason the simulated environment persists it should be one that is consistent with Omega keeping his word. Or, if part of the specification of the problem or the declarations made by Omega directly pertain to claims about what He will do regarding simulation then he will implement that policy.

Comment author: kybernetikos 01 June 2012 10:01:33PM *  0 points [-]

Actually, I'm not sure this matters. If the simulated agent knows he's not getting a reward, he'd still want to choose so that the nonsimulated version of himself gets the best reward.

So the problem is that the best answer is unavailable to the simulated agent: in the simulation you should one box and in the 'real' problem you'd like to two box, but you have no way of knowing whether you're in the simulation or the real problem.

Agents that Omega didn't simulate don't have the problem of worrying whether they're making the decision in a simulation or not, so two boxing is the correct answer for them.

The decisions being made are very different between an agent that has to make the decision twice, where the first decision will affect the payoff of the second, and an agent that has to make the decision only once, so I think that in reality perhaps the problem does collapse down to an 'unfair' one because the TDT agent is presented with an essentially different problem to a non-TDT agent.

Comment author: [deleted] 23 May 2012 11:34:34AM 2 points [-]

Corollary: Omega can statically analyse the TDT agent's decision algorithm.

Comment author: ciphergoth 23 May 2012 10:37:18AM 11 points [-]

BTW, general question about decision theory. There appears to have been an academic study of decision theory for over a century, and causal and evidential decision theory were set out in 1981. Newcomb's paradox was set out in 1969. Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT, and it seems as if there is a 100% disconnect between the community exploring new theories (which is centered around LW) and the academic decision theory community. This seems really, really odd - what's going on?

Comment author: Jayson_Virissimo 23 May 2012 12:59:51PM 12 points [-]

Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT...

This is simply not true. Robert Nozick (who introduced Newcomb's problem to philosophers) compared/contrasted EDT and CDT at least as far back as 1993. Even back then, he noted their inadequacy on several decision-theoretic problems and proposed some alternatives.

Comment author: ciphergoth 23 May 2012 01:14:53PM 4 points [-]

Me being ignorant of something seemed like a likely part of the explanation - thanks :) I take it you're referencing "The Nature of Rationality"? Not read that, I'm afraid. If you can spare the time I'd be interested to know what he proposes - thanks!

Comment author: Jayson_Virissimo 23 May 2012 01:49:20PM *  6 points [-]

I haven't read The Nature of Rationality in quite a long time, so I won't be of much help. For a very simple and short introduction to Nozick's work on decision theory, you should read this (PDF).

Comment author: thomblake 23 May 2012 01:47:36PM 3 points [-]

It should be noted that Newcomb's problem was considered interesting in Philosophy in 1969, but decision theories were studied more in other fields - so there's a disconnect between the sorts of people who usually study formal decision theories and that sort of problem.

Comment author: steven0461 23 May 2012 04:29:22PM *  0 points [-]

(Deleting comments seems not to be working. Consider this a manual delete.)

Comment author: Luke_A_Somers 23 May 2012 04:50:21PM 1 point [-]

Decision Theory is and can be applied to a variety of problems here. It's just that AI may face Newcomb-like problems and in particular we want to ensure a 1-boxing-like behavior on the part of AI.

Comment author: cousin_it 23 May 2012 07:09:45PM *  3 points [-]

The rationale for TDT-like decision theories is even more general, I think. There's no guarantee that our world contains only one copy of something. We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

Comment author: Vladimir_Nesov 23 May 2012 09:12:16PM *  2 points [-]

We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

Constructing a rigorous mathematical foundation for decision theory, one that explains what a decision problem, a decision, or a goal is, is potentially more useful than resolving any given informally specified class of decision problems.

Comment author: David_Gerard 24 May 2012 12:29:59PM -1 points [-]

What is an example of such a real-world problem?

Comment author: Luke_A_Somers 24 May 2012 06:09:13PM 4 points [-]

Negotiations with entities who can read the AI's source code.

Comment author: bbarth 03 June 2012 06:35:06PM -2 points [-]

Given the week+ delay in this response, it's probably not going to see much traffic, but I'm not convinced "reading" source code is all that helpful. Omega is posited to have nearly god-like abilities in this regard, but since this is a rationalist discussion, we probably have to rule out actual omnipotence.

If Omega intends to simply run the AI on spare hardware it has, then it has to be prepared to validate (in finite time and memory) that the AI hasn't so obfuscated its source as to be unintelligible to rational minds. It's also possible that the source to an AI is rather simple but it is dependent on a large amount of input data in the form of a vast sea of numbers. I.e., the AI in question could be encoded as an ODE system integrator that's reliant on a massive array of parameters to get from one state to the next. I don't see why we should expect Omega to be better at picking out the relevant, predictive parts of these numbers than we are.

If the AI can hide things in its code or data, then it can hide functionality that tests to determine if it is being run by Omega or on its own protected hardware. In such a case it can lie to Omega just as easily as Omega can lie to the "simulated" version of the AI.

I think it's time we stopped positing an omniscient Omega in these complications to Newcomb's problem. They're like epicycles on Ptolemaic orbital theory in that they continue a dead-end line of reasoning. It's better to recognize that Newcomb's problem is a red herring. Newcomb's problem doesn't demonstrate problems that we should expect AIs to solve in the real world. It doesn't tease out meaningful differences between decision theories.

That is, what decisions on real-world problems do we expect to be different between two AIs that come to different conclusions about Newcomb-like problems?

Comment author: Dolores1984 03 June 2012 07:00:25PM *  2 points [-]

You should note that every problem you list is a special case. Obviously, there are ways of cheating at Newcomb's problem if you're aware of salient details beforehand. You could simply allow a piece of plutonium to decay, and do whatever the resulting Geiger counter noise tells you to. That does not, however, support your thesis that Newcomb's problem is a totally artificial problem with no logical intrusions into reality.

As a real-world example, imagine an off-the-shelf stock market optimizing AI. Not sapient, to make things simpler, but smart. When any given copy begins running, there are already hundreds or thousands of near-identical copies running elsewhere in the market. If it fails to predict their actions from its own, it will do objectively worse than it might otherwise do.

Comment author: Eliezer_Yudkowsky 23 May 2012 07:47:30PM 8 points [-]

There were plenty of previous theories trying to go beyond CDT or EDT, they just weren't satisfactory.

Comment author: Manfred 24 May 2012 07:49:00PM *  2 points [-]

Dispositional decision theory :P

... which I cannot find a link to the paper for, now. Hm. But basically it was just TDT, with less awareness of why.

EDIT: Ah, here it was. Credit to Tim Tyler.

Comment author: Eliezer_Yudkowsky 24 May 2012 08:27:52PM 2 points [-]

I checked it. Not the same thing.

Comment author: crazy88 24 May 2012 09:36:36PM *  5 points [-]

This paper talks about reflexive decision models and claims to develop a form of CDT which one boxes.

It's in my to-read list but I haven't got to it yet so I'm not sure whether it's of interest but I'm posting it just in case (it could be a while until I have time to read it so I won't be able to post a more informed comment any time soon).

Though this theory post-dates TDT and so isn't interesting from that perspective.

Comment author: Ezekiel 23 May 2012 11:03:55AM 21 points [-]

I think we could generalise problem 2 to be problematic for any decision theory XDT:

There are 10 boxes, numbered 1 to 10. You may only take one. Omega has (several times) run a simulated XDT agent on this problem. It then put a prize in the box which it determined was least likely to be taken by such an agent - or, in the case of a tie, in the box with the lowest index.

If agent X follows XDT, it has at best a 10% chance of winning. Any sufficiently resourceful YDT agent, however, could run a simulated XDT agent themselves, and figure out what Omega's choice was without getting into an infinite loop.

Therefore, YDT performs better than XDT on this problem.

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?
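The 10% bound in the ten-box problem above can be checked with a few lines of arithmetic. What follows is only a minimal sketch, assuming Omega knows the simulated XDT agent's exact choice distribution rather than estimating it from repeated runs; the names `omega_prize_box` and `win_probability` are hypothetical, chosen for illustration.

```python
from fractions import Fraction

def omega_prize_box(choice_probs):
    """Return the 1-based index of the box Omega fills.

    choice_probs[i] is the probability that the simulated XDT agent
    takes box i+1 (an assumed, exactly known distribution). Omega picks
    the least likely box, breaking ties in favour of the lowest index.
    """
    least = min(choice_probs)
    return choice_probs.index(least) + 1  # index() returns the first match

def win_probability(choice_probs):
    """Chance that an agent with this same distribution finds the prize."""
    prize = omega_prize_box(choice_probs)
    return choice_probs[prize - 1]

# Uniform mixing over the ten boxes.
uniform = [Fraction(1, 10)] * 10
# A deterministic agent that always takes box 1.
always_first = [Fraction(1)] + [Fraction(0)] * 9
# A YDT agent that can compute the XDT distribution just takes the prize box.
ydt_choice = omega_prize_box(uniform)
```

Because the prize always sits in a least-likely box, and the smallest of ten probabilities summing to 1 is at most 1/10, `win_probability` can never exceed 1/10 for the XDT agent, while the YDT agent in this sketch wins with certainty.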

Comment author: cousin_it 23 May 2012 11:33:55AM *  9 points [-]

You're right about problem 2 being a fully general counterargument, but your philosophical conclusion seems to be stopping too early. For example, can we define a class of "fair" problems that excludes problem 2?

Comment author: ciphergoth 23 May 2012 12:11:09PM 2 points [-]

One possible place to look is that we're allowing Omega access not just to a particular simulated decision of TDT, but to the probabilities with which it makes these decisions. If we force it to simulate TDT many times and sample to learn what the probabilities are, it can't detect the exact balance for which it does deterministic symmetry breaking, and the problem goes away.

This solution occurred to me because this forces Omega to have something like a continuous behaviour response to changes in the probabilities of different TDT outputs, and it seems possible given that to imagine a proof that a fixed point must exist.

Comment author: drnickbone 23 May 2012 01:20:26PM 0 points [-]

Fair point - how does Omega tell when the sim's choosing probabilities are exactly equal? Well I was thinking that Omega could prove they are equal (by analysing the simulation's behaviour, and checking where it calls on random bits). Or if it can't do that, then it can just check that the choice frequencies are "statistically equal" (i.e. no significant differences after a billion runs, say) and treat them as equal for the tie-breaker rule. The "statistically equal" approach might give the TDT agent a very slightly higher than 10% chance of winning the money, though I haven't analysed this in any detail.

Comment author: Ezekiel 23 May 2012 10:26:31PM 0 points [-]

If the subject can know the exact code of TDT, Omega can know the exact code of TDT, and analyse it however it likes. That means it can know exactly where randomness is invoked - why would it have to sample?

Comment author: drnickbone 24 May 2012 11:59:02AM 1 point [-]

This was my first thought: Omega can just prove the choosing probabilities are equal. However, it's not totally straightforward, because the sim could sample more random bits depending on the results of its first random bits, and so on, leading to an exponentially growing outcome tree of possibilities, with no upper size bound to the length of the tree. There might not be an easy proof of equality in that case. Sampling and statistical equality is the next best approach...

Comment author: Ezekiel 23 May 2012 10:36:28PM 1 point [-]

It looks like the issue here is that while Omega is ostensibly not taking into account your decision theory, it implicitly is by simulating an XDT agent. So a first patch would be to define simulations of a specific decision theory (as opposed to simulations of a given agent) as "unfair".

On the other hand, we can't necessarily know if a given computation is effectively equivalent to simulating a given decision theory. Even if the string "TDT" is never encoded anywhere in Omega's super-neurons, it might still be simulating a TDT agent, for example.

On the first hand again, it might be easy for most problems to figure out whether anyone is implicitly favouring one DT over another, and thus whether they're "fair".

Comment author: dlthomas 23 May 2012 11:28:34PM *  3 points [-]

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

I would say that any such problem doesn't show that there is no best decision theory, it shows that that class of problem cannot be used in the ranking.

Edited to add: Unless, perhaps, one can show that an instantiation of the problem with particular choice of (in this case decision theory, but whatever is varied) is particularly likely to be encountered.

Comment author: Bundle_Gerbe 30 May 2012 09:27:02PM *  1 point [-]

To draw out the analogy to Godelian incompleteness, any computable decision theory is subject to the suggested attack of being given a "Godel problem" like problem 1, just as any computable set of axioms for arithmetic has a Godel sentence. You can always make a new decision theory TDT' that is TDT + "do the right thing for the Godel problem". But TDT' has its own Godel problem of course. You can't make a computable theory that says "do the right thing for all Godel problems"; if you try to do that it would not give you something computable. I'm sure this is all just restating what you had in mind, but I think it's worth spelling out.

If you have some sort of oracle for the halting problem (i.e. a hypercomputer) and Omega doesn't, he couldn't simulate you, so you would presumably be able to always win fair problems. Otherwise the best thing you could hope for is to get the right answer whenever your computation halts, but fail to halt in your computation for some problems, such as your Godel problem. (A decision theory like this can still be given a Godel problem if Omega can solve the halting problem, "I simulated you and if you fail to halt on this problem..."). I wonder if TDT fails to halt for its Godel problem, or if some natural modification of it might have this property, but I don't understand it well enough to guess.

I am less optimistic about revising "fair" to exclude Godel problems. The analogy would be proving Peano arithmetic is complete "except for things that are like Godel sentences." I don't know of any formalizations of the idea of "being a Godel sentence".

Comment author: ciphergoth 23 May 2012 11:14:04AM 4 points [-]

There's a different version of these problems for each decision theory, depending on what Omega simulates. For CDT, all agents two-box and all agents get $1000. However, on problem 2, it seems like CDT doesn't have a well-defined decision at all; the effort to work out what Omega's simulator will say won't terminate.

(I'm spamming this post with comments - sorry!)

Comment author: drnickbone 23 May 2012 12:16:59PM *  2 points [-]

You raise an interesting question here - what would CDT do if a CDT agent were in the simulation?

It looks to me that CDT just doesn't have the conceptual machinery to handle this problem properly, so I don't really know. One thing that could happen is that the simulated CDT agent tries to simulate itself and gets stuck in an infinite loop. I didn't specify exactly what would happen in that case, but if Omega can prove that the simulated agent is caught in a loop, then it knows the sim will choose each box with probability zero, and so (since these are all equal), it will fill box 1. But now can a real-life CDT agent also work this out, and beat the game by selecting box 1? But if so, why won't the sim do that, and so on? Aargh !!!

Another thought I had is that CDT could try tossing a logical coin, like computing the googolth digit of pi, and if it is even choose box 1, whereas if it is odd, choose box 2. If it runs out of time before computing (which the real-life agent will do), then it just picks box 1 or 2 with equal probability. The simulated CDT agent will however get to the end of the computation (Omega has arbitrary computational resources) and definitely pick 1 or 2 with certainty, so the money is definitely in one of those two boxes, which looks like the probability of the actual agent winning is raised to 50%. TDT might do the same.

However this looks like cheating to me, for both CDT and TDT.

EDIT: On reflection, it seems clear that CDT would never do anything "creatively sneaky" like tossing a logical coin; but it is the sort of approach that TDT (or some variant thereof) might come up with. Though I still think it's cheating.

Comment author: ciphergoth 23 May 2012 03:34:14PM 2 points [-]

I don't think your "detect infinite resources and cheat" strategy is really worth thinking about. Instead of strategies like CDT and TDT whose applicability to limited compute resources is unclear, suppose you have an anytime strategy X, which you can halt at any time and get a decision. Then there's really a family of algorithms X-t, where t is the time you're going to give it to run. In this case, if you are X-t, we can consider the situation where Omega fields X-t against you.

Comment author: orthonormal 23 May 2012 03:34:15PM *  1 point [-]

The version of CDT that I described explicitly should arrive at the uniformly random solution. You don't have to be able to simulate a program all the way through, just able to prove things about its output.

EDIT: Wait, this is wrong. It won't be able to consistently derive an answer, because of the way it acts given such an answer, and so it will go with whatever its default Nash equilibrium is.

Comment author: drnickbone 23 May 2012 03:58:16PM 1 point [-]

Re: your EDIT. Yes, I've had that sort of reaction a couple of times today!

I'm shifting around between "CDT should pick at random, no CDT should pick Box 1, no CDT should use a logical coin, no CDT should pick its favourite number in the set {1, 2} with probability 1, and hope that the version in the sim has a different favourite number, no, CDT will just go into a loop or collapse in a heap."

I'm also quite clueless how a TDT is supposed to decide if it's told there's a CDT in the sim... This looks like a pretty evil decision problem in its own right.

Comment author: orthonormal 23 May 2012 06:15:00PM 1 point [-]

Well, the thing is that CDT doesn't completely specify a decision theory. I'm confident now that the specific version of CDT that I described would fail to deduce anything and go with its default, but it's hard to speak for CDTs in general on such a self-referential problem.

Comment author: Vladimir_Nesov 23 May 2012 11:41:30AM *  15 points [-]

Consider Problem 3: Omega presents you with two boxes, one of which contains $100, and says that it just ran a simulation of you in the present situation and put the money in the box the simulation didn't choose.

This is a standard diagonal construction, where the environment is set up so that you are punished for the actions you choose, and rewarded for those you don't choose, irrespective of the actions. This doesn't depend on the decision algorithm you're implementing. A possible escape strategy is to make yourself unpredictable to the environment. The difficulty would also go away if the thing being predicted wasn't you, but something else you could predict as well (like a different agent that doesn't simulate you).
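The diagonal construction in Problem 3 can be written out as a toy program. This is only an illustrative sketch, assuming the agent is a deterministic zero-argument function (so the simulated run and the real run necessarily agree); `omega_fill`, `payoff`, and the two sample agents are hypothetical names.

```python
def omega_fill(agent):
    """Simulate the agent and put the $100 in the box it did NOT choose.

    Boxes are labelled 0 and 1; the agent is assumed to be deterministic.
    """
    simulated_choice = agent()
    return 1 - simulated_choice

def payoff(agent):
    """Dollars the real agent walks away with in Problem 3."""
    money_box = omega_fill(agent)  # Omega's simulation happens first
    actual_choice = agent()        # then the real agent chooses
    return 100 if actual_choice == money_box else 0

def always_left():
    return 0

def always_right():
    return 1
```

Under these assumptions both `payoff(always_left)` and `payoff(always_right)` are 0, and the same holds for any other deterministic agent, which is the sense in which the outcome does not depend on the decision algorithm.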

Comment author: ciphergoth 23 May 2012 11:57:12AM 8 points [-]

The correct solution to this problem is to choose each box with equal probability; this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money. Or I try to guess what escape route you'll use and post all the guards there.

What's interesting about Problem 2 is that it makes what would be the normal game-theoretic strategy unstable by choosing deterministically where the probabilities are exactly equal.
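For the two-box version (Problem 3), the claim that equal probabilities are optimal can be checked directly. The sketch below assumes, as a labelled simplification, that the simulated agent's random draw is independent of the real agent's draw, so Omega cannot predict the actual coin; `win_probability` is a hypothetical helper.

```python
from fractions import Fraction

def win_probability(p):
    """Chance of finding the $100 when the agent picks box 0 with probability p.

    Assumes the simulated draw and the real draw are independent, so the
    agent wins exactly when the two draws differ.
    """
    q = 1 - p
    return p * q + q * p  # (real=0, sim=1) or (real=1, sim=0)

# Scan a coarse grid of mixing probabilities 0, 1/10, ..., 1.
candidates = [Fraction(k, 10) for k in range(11)]
best_p = max(candidates, key=win_probability)
```

On this grid the maximum is at `p = 1/2`, where the agent wins half the time; any deterministic choice (`p = 0` or `p = 1`) wins nothing.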

Comment author: APMason 23 May 2012 12:28:15PM 4 points [-]

this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money.

Of course, you can just set up the thought experiment with the proviso that "be unpredictable" is not a possible move - in fact that's the whole point of Omega in these sorts of problems. If Omega's trying to break into your safe, he takes your money. In Nesov's problem, if you can't make yourself unpredictable, then you win nothing - it's not even worth your time to open the box. In both cases, a TDT agent does strictly as well as it possibly could - the fact that there's $100 somewhere in the vicinity doesn't change that.

Comment author: jimrandomh 23 May 2012 02:14:20PM *  29 points [-]

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you $1000 if you are, and tortures you for a year if you aren't.

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

Comment author: APMason 23 May 2012 02:28:04PM 1 point [-]

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000. Omega doesn't check what decision theory you're using at all - it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT-agent or not.

Comment author: jimrandomh 23 May 2012 02:30:47PM 2 points [-]

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000.

No, if TDT-prime two boxes, the TDT simulation still one-boxes.

Comment author: APMason 23 May 2012 02:39:16PM 6 points [-]

Hmm, so TDT-prime would reason something like, "The TDT simulation will one-box because, not knowing that it's the simulation, but also knowing that the simulation will use exactly the same decision theory as itself, it will conclude that the simulation will do the same thing as itself and so one-boxing is the best option. However, I'm different to the TDT-simulation, and therefore I can safely two-box without affecting its decision." In which case, does it matter how inconsequential the difference is? Yep, I'm confused.
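The disagreement above comes down to whether the real agent's choice is linked to the simulation's choice. A minimal sketch of the Problem 1 payoffs, with hypothetical names and the dollar amounts taken from the problem statement:

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000

def problem1_payoff(agent_one_boxes, sim_one_boxes):
    """Payoff in Problem 1: Box B depends only on what the simulated TDT did."""
    box_b = BOX_B_FULL if sim_one_boxes else 0
    return box_b if agent_one_boxes else box_b + BOX_A

# Case 1: the agent's choice is linked to the sim's (it IS TDT).
linked_one_box = problem1_payoff(True, True)
linked_two_box = problem1_payoff(False, False)

# Case 2: the agent is logically separate and the sim one-boxes regardless.
separate_two_box = problem1_payoff(False, True)
```

If the choices are linked, one-boxing ($1,000,000) beats two-boxing ($1,000); if the agent can two-box without changing the sim's decision, it collects $1,001,000. Whether TDT-prime is in the first case or the second is exactly the open question in this sub-thread.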

Comment author: drnickbone 23 May 2012 03:34:34PM 2 points [-]

I also had thoughts along these lines - variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

The second difficulty is that for each specific TDT variant, one with algorithm T' say, there will be a specific problematic problem on which T' will do worse than CDT (and indeed worse than all the other variants of TDT) - this is the problem with T' being the exact algorithm running in the sim. So we still don't get the - desirable - property that there is some sensible decision theory called TDT that is optimal across fair problems.

The best suggestion I've heard so far is that we try to adjust the definition of "fairness", so that these problematic problems also count as "unfair". I'm open to proposals on that one...

Comment author: APMason 23 May 2012 04:22:55PM *  0 points [-]

Well, I've had a think about it, and I've concluded that it would matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega's simulation, and therefore will not be able to take advantage of "logical separation".

But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to "separate" them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It's as if you need the ability to prove that two agents necessarily give the same output for the particular problem you're faced with, without proving what output those agents actually give, and that sure looks crazy-hard.

EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.

EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process - i.e. Omega can predict whether it's going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).

EDIT 3: Sorry to go on like this, but I've just realised that won't work in situations where some other agent bases their decision on whether you're predicting what their decision will be, i.e. Prisoner's Dilemma.

Comment author: jimrandomh 23 May 2012 08:14:02PM 0 points [-]

The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime's output and TDT-prime's decision. If its output is a strategy, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.

Comment author: dlthomas 23 May 2012 08:25:17PM 1 point [-]

But doesn't that make cliquebots, in general?

Comment author: drnickbone 24 May 2012 12:08:43PM 0 points [-]

I'm thinking hard about this one...

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection? Can that approach really work in general without creating CliqueBots? Don't know yet without detailed analysis.

Another issue is that Omega is not obliged to reveal the source-code of the sim; it could instead provide some information about the method used to generate / filter the sim code (e.g. a distribution the sim was drawn from) and still lead to a well-defined problem. Each TDT variant would not then know whether it was the sim.

I'm aiming for a follow-up article addressing this strategy (among others).

Comment author: khafra 24 May 2012 05:57:56PM 0 points [-]

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection?

This sounds equivalent to asking "can a turing machine generate non-deterministically random numbers?" Unless you're thinking about coding TDT agents one at a time and setting some constant differently in each one.

Comment author: ciphergoth 23 May 2012 03:09:44PM *  20 points [-]

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards e.g. ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about this is that this is a "fair" test by that definition, yet it acts like an unfair test.

Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.

Comment author: jimrandomh 23 May 2012 04:26:37PM 2 points [-]

Not exactly. Because the problem statement says that it simulates "TDT", if you were to expand the problem statement out into code it would have to contain source code to a complete instantiation of TDT. When the problem statement is run, TDT or TDT-prime can look at that instantiation and compare it to its own source code. TDT will see that they're the same, but TDT-prime will notice that they are different, and thereby infer that it is not the simulated copy. (Any difference whatsoever is proof of this.)

Consider an alternative problem. Omega flips a coin, and asks you to guess what it was, with a prize if you guess correctly. If the coin was heads, he shows you a piece of paper with TDT's source code. If the coin was tails, he shows you a piece of paper with your source code, whatever that is.

Comment author: cousin_it 23 May 2012 05:54:33PM *  11 points [-]

I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.

Comment author: Jack 23 May 2012 10:06:14PM 4 points [-]

He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are.

Two questions: First, how is this distinction justified? What a decision theory is is a strategy for responding to decision tasks, and simulating agents performing the right decision tasks tells you what kind of decision theory they're using. Why does it matter if it's done implicitly (as in Newcomb's discrimination against CDT) or explicitly? And second, why should we care about it? Why is it important for a decision theory to pass fair tests but not unfair tests?

Comment author: ciphergoth 24 May 2012 06:50:28AM 4 points [-]

Real-world unfair tests could matter, though it's not clear if there are any. However, hypothetical unfair tests aren't very informative about what is a good decision theory, because it's trivial to cook one up that favours one theory and disfavours another. I think the hope was to invent a decision theory that does well on all fair tests; the example above seems to show that may not be possible.

Comment author: APMason 24 May 2012 10:47:29AM 7 points [-]

Why is it important for a decision theory to pass fair tests but not unfair tests?

Well, on unfair tests a decision theory still needs to do as well as possible. If we had a version of the original Newcomb's problem, with the one difference that a CDT agent gets $1billion just for showing up, it's still incumbent upon a TDT agent to walk away with $1000000 rather than $1000. The "unfair" class of problems is that class where "winning as much as possible" is distinct from "winning the most out of all possible agents".
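The distinction between "winning as much as possible" and "winning the most out of all possible agents" can be made concrete with the modified Newcomb problem just described. This is a rough sketch assuming a perfect predictor; the function name and the `is_cdt` flag are hypothetical.

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000
CDT_BONUS = 1_000_000_000  # paid to a CDT agent just for showing up

def newcomb_payoff(one_boxes, is_cdt=False):
    """Payoff with a perfect predictor: Box B is full iff the agent one-boxes."""
    box_b = BOX_B_FULL if one_boxes else 0
    winnings = box_b if one_boxes else box_b + BOX_A
    return winnings + (CDT_BONUS if is_cdt else 0)

# The best a TDT agent can do is one-box.
tdt_best = max(newcomb_payoff(True), newcomb_payoff(False))
# A CDT agent two-boxes but also collects the bonus.
cdt_total = newcomb_payoff(False, is_cdt=True)
```

Here the TDT agent's best available outcome is $1,000,000, which is still far better than the $1,000 it would get by two-boxing, even though the CDT agent ends up with more overall.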

Comment author: Davorak 23 May 2012 05:42:51PM -1 points [-]

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

There seems to be a contradiction here. If Omega said this to me, I would have to believe Omega had just presented evidence of being untruthful some of the time.

If Omega simulated the problem at hand, then in said simulation Omega must have said: "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT." In the first simulation the statement is a lie.

Problem 2 has a similar problem.

It is not obvious that the problem can be reformulated to keep Omega consistently truthful and still have CDT or EDT come out ahead of TDT.

Comment author: cousin_it 23 May 2012 05:57:50PM *  1 point [-]

Omega could truthfully say "the contents of the boxes are exactly as if I'd presented this problem to an agent running TDT".

Comment author: Davorak 23 May 2012 06:41:41PM 0 points [-]

I do not know if Omega can say that truthfully, because I do not know whether the self-referential equation representing the problem has a solution.

The problems set out by the OP assume there is a solution and a particular answer, but without writing out the equation and plugging in the solution to show that it actually works.

Comment author: cousin_it 23 May 2012 07:05:49PM *  0 points [-]

There is a solution because Omega can get an answer by simulating TDT, or am I missing something?

Comment author: magfrump 24 May 2012 08:12:07AM 0 points [-]

It may or may not be proven that TDT settles on answers to questions involving TDT. If TDT doesn't get an answer, then TDT can't get an answer.

Presumably it is true that TDT settles but if it isn't proven, it may not be true; or it could be that the proof (i.e. a formalization of TDT) will provide insight that is currently lacking (such as cutting off after a certain level of resource use; can Omega emulate how many resources the current TDT agent will use? Can the TDT agent commit to using a random number of resources? Do true random-number generators exist? These problems might all be inextricable. Or they might not. I, for one, don't know.)

Comment author: cousin_it 24 May 2012 09:17:30AM *  1 point [-]

It may or may not be proven that TDT settles on answers to questions involving TDT.

We have several formalizations of UDT that would solve this problem correctly.

Comment author: magfrump 24 May 2012 05:57:48PM 0 points [-]

Having several formalizations is 90% of a proof, not 100% of a proof. Turn the formalization into a computer program AND either prove that it halts or run this simulation on it in finite time.

I believe that it's true that TDT will get an answer and hence Omega will get an answer, but WHY this is true relies on facts about TDT that I don't know (specifically facts about its implementation; maybe facts about differential topology that game-theoretic equilibrium results rely on.)

Comment author: cousin_it 24 May 2012 06:27:22PM *  0 points [-]

The linked posts have proofs that the programs halt and return the correct answer. Do you understand the proofs, or could you point out the areas that need more work? Many commenters seemed to understand them...

Comment author: magfrump 25 May 2012 04:14:27AM 1 point [-]

I do not understand the proofs, primarily because I have not put time in to trying to understand them.

I may have become somewhat defensive in these posts (or withdrawn I guess?) but looking back my original point was really to point out that, naively, asking whether the problem is well-defined is a reasonable question.

The questions in the OP set off alarm bells for me of "this type of question might be a badly-defined type of question" so asking whether these decisions are in the "halting domain" (is there an actual term for that?) of TDT seems like a reasonable question to ask before putting too much thought into other issues.

I believe the answer to be that yes these questions are in the "halting domain" of TDT, but I also believe that understanding what that is and why these questions are legitimate and the proofs that TDT halts will be central to any resolution of these problems.

What I'm really trying to say here is that it makes sense to ask these questions, but I don't understand why, so I think Davorak's question was reasonable, and your answer didn't feel complete to me. Looking back, I don't think I've contributed much to this conversation. Sorry!

Comment author: drnickbone 23 May 2012 06:57:44PM *  3 points [-]

Your difficulty seems to be with the parenthesis "(who experience has shown is always truthful)". The relevant experience here is going to be derived from real-world subjects who have been in Omega problems, exactly as is assumed for the standard Newcomb problem. It's not obvious that Omega always tells the truth to its simulations; no-one in the outside world has experience of that.

However you can construe the problem so that Omega doesn't have to lie, even to sims. Omega could always prefix its description of the problem with a little disclaimer "You may be one of my simulations. But if not, then...".

Or Omega could simulate a TDT agent making decisions as if it had just been given the problem description verbally by Omega, without Omega actually doing so. (Whether that's possible or not depends a bit on the simulation).

Comment author: dvasya 23 May 2012 05:51:57PM 0 points [-]

In both your problems, the seeming paradox comes from failure to recognize that the two agents (one that Omega has simulated and one making the decision) are facing entirely different prior information. Then, nothing requires them to make identical decisions. The second agent can simulate itself having prior information I1 (that the simulated agent has been facing), then infer Omega's actions, and arrive at the new prior information I2 that is relevant for the decision. And I2 now is independent of which decision the agent would make given I2.

Comment author: drnickbone 23 May 2012 07:06:52PM *  2 points [-]

Are you sure that they are facing different prior information? If the sim is a good one, then the TDT agent won't be able to tell whether it is the sim or not. However, you are right that one solution could be that there are multiple TDT variants who have different information and so can logically separate their decisions.

I mentioned the problems with that in another response here. The biggest problem is that it seriously undermines the attraction and effectiveness of TDT as a decision theory if different instances of TDT are going to find excuses to separate from each other.

Comment author: gRR 23 May 2012 07:01:14PM 9 points [-]

The problems look like a kind of an anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.

Comment author: Wei_Dai 23 May 2012 07:16:17PM 15 points [-]

My sense is that question 6 is a better question to ask than 5. That is, what's important isn't drawing some theoretical distinction between fair and unfair problems, but finding out what problems we and/or our agents will actually face. To the extent that we are ignorant of this now but may know more in the future when we are smarter and more powerful, it argues for not fixing a formal decision theory to determine our future decisions, but instead making sure that we and/or our agents can continue to reason about decision theory the same way we currently can (i.e., via philosophy).

Comment author: shminux 23 May 2012 08:37:36PM *  1 point [-]

I wonder if there is a mathematician in this forum willing to present the issue in a form of a theorem and a proof for it, in a reasonable mathematical framework. So far all I can see is a bunch of ostensibly plausible informal arguments from different points of view.

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Comment author: Vladimir_Nesov 23 May 2012 09:58:45PM 2 points [-]

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Or it's hard to formalize.

Comment author: Douglas_Knight 25 May 2012 12:52:53AM 1 point [-]

Which issue/problem? fairness?

Comment author: shminux 25 May 2012 05:35:09PM *  1 point [-]

The fairness concept:

the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems.

should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT under a [b]DT-aware Omega Newcomb's problems, as described in the OP, where 'a' and 'b' are particular DTs, e.g. a=b=T.

Comment author: AndyCossyleon 23 May 2012 08:39:08PM 5 points [-]

Someone may already have mentioned this, but doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow? I.e. As soon as we let decision theories become self-referencing, it is impossible for a "best" decision theory to exist at all.

Comment author: shokwave 25 May 2012 08:30:56AM 0 points [-]

doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow?

Self-reference and the like is necessary for Goedel sentences but not sufficient. It's certainly plausible that this scenario could have a Goedel sentence, but whether the current problem is isomorphic to a Goedel sentence is not obvious, and seems unlikely.

Comment author: Jonii 23 May 2012 09:32:40PM 1 point [-]

The interaction between this simulated TDT and you is so complicated that I don't think many of the commenters here actually did the math to see how they should expect the simulated TDT agent to react in these situations. I know I didn't. I tried, and failed.

Comment author: cousin_it 24 May 2012 09:35:39AM *  3 points [-]

Maybe I'm missing something, but the formalization looks easy enough to me...

    def tdt_utility():
        if tdt(tdt_utility) == 1:
            box1 = 1000
            box2 = 1000000
        else:
            box1 = 1000
            box2 = 0
        if tdt(tdt_utility) == 1:
            return box2
        else:
            return box1+box2

    def your_utility():
        if tdt(tdt_utility) == 1:
            box1 = 1000
            box2 = 1000000
        else:
            box1 = 1000
            box2 = 0
        if you(your_utility) == 1:
            return box2
        else:
            return box1+box2

The functions tdt() and you() accept the source code of a function as an argument, and try to maximize its return value. The implementation of tdt() could be any of our formalizations that enumerate proofs successively, which all return 1 if given the source code to tdt_utility. The implementation of you() could be simply "return 2".
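For anyone who wants to run this, here is a minimal sketch under some loud assumptions. The stub `tdt_stub` just returns 1, which is the output the paragraph above attributes to the proof-enumerating formalizations; it is not an implementation of any of them. The agents take no source-code argument, and the helper names `fill_boxes` and `payoff` are made up for this illustration.

```python
# Stand-ins, not real decision theories: each agent just returns
# 1 (one-box) or 2 (two-box). tdt_stub hard-codes the answer that the
# comment above says the formalizations return; it proves nothing itself.
def tdt_stub():
    return 1


def two_boxer():
    return 2


def fill_boxes(simulated_choice):
    """Omega fills the boxes based on what the simulated agent chose."""
    box1 = 1000
    box2 = 1000000 if simulated_choice == 1 else 0
    return box1, box2


def payoff(agent, simulated=tdt_stub):
    """Utility of `agent` when Omega's simulation runs `simulated`."""
    box1, box2 = fill_boxes(simulated())
    return box2 if agent() == 1 else box1 + box2


tdt_payoff = payoff(tdt_stub)     # TDT facing a simulation of itself
your_payoff = payoff(two_boxer)   # a "return 2" agent facing the same boxes
print(tdt_payoff, your_payoff)
```

With these stubs the two-boxer walks away with $1000 more than the TDT stand-in, which is the asymmetry the OP is pointing at. All the interesting work is hidden in whatever replaces `tdt_stub`.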

Comment author: BrandonReinhart 04 June 2012 02:11:56AM 0 points [-]

Thanks for this. I hadn't seen someone pseudocode this out before. This helps illustrate that interesting problems lie in the scope above (callers to tdt_utility() etc) and below (implementation of tdt() etc).

I wonder if there is a rationality exercise in 'write pseudocode for problem descriptions, explore the callers and implementations'.

Comment author: Jack 24 May 2012 12:17:00AM 4 points [-]

Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?

Comment author: Manfred 24 May 2012 07:38:35PM 5 points [-]

It will defect on all prisoners dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommitment - CDT isn't "reflectively consistent." And so if you want to predict an AI's behavior, if you predict based on CDT with no self-modification you'll get it wrong, since it doesn't stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

Comment author: Jack 24 May 2012 07:41:55PM 0 points [-]

Ah, that second paragraph makes perfect sense. Thanks.

Comment author: DanielLC 25 May 2012 07:17:14AM 0 points [-]

even if they're iterated.

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

It won't self-modify to TDT. It will self-modify to something similar, but using its beliefs at the time of modification as the priors. For example, it will use the doomsday argument immediately to find out how long the world is likely to last, and it will use that information from then on, rather than redoing it as its future self (getting a different answer).

Comment author: shokwave 25 May 2012 08:21:03AM 0 points [-]

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Reason backwards from the inevitable end of the iteration. Defecting makes sense there, so defecting one turn earlier makes sense, so one turn earlier...
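The backward reasoning can be sketched in a few lines, assuming the number of rounds is common knowledge and both players maximize their own total payoff. The payoff numbers and the names `PAYOFF`, `best_reply` and `backward_induction` are illustrative choices, not anything fixed by the discussion.

```python
# Stage-game payoffs for one player: PAYOFF[(my_move, their_move)].
# Illustrative numbers; any prisoner's dilemma ordering would do.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_reply(their_move, continuation):
    """Move maximizing this round's payoff plus a fixed future value."""
    return max("CD", key=lambda mine: PAYOFF[(mine, their_move)] + continuation)


def backward_induction(rounds):
    """Solve a PD with a known number of rounds, starting from the last."""
    plan = []
    continuation = 0  # nothing comes after the final round
    for _ in range(rounds):
        # Later play is already fixed, so it cannot depend on this move:
        # the continuation value is a constant and the stage game decides.
        replies = {best_reply(their, continuation) for their in "CD"}
        if len(replies) != 1:
            raise ValueError("no dominant move in the stage game")
        move = replies.pop()
        plan.append(move)
        continuation += PAYOFF[(move, move)]  # symmetric opponent does the same
    return plan[::-1], continuation


plan, value = backward_induction(5)
print(plan, value)
```

The argument leans entirely on the last round being known. Once the horizon is uncertain, the continuation value is no longer a constant and this sketch stops applying.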

Comment author: DanielLC 25 May 2012 07:17:56PM 0 points [-]

That depends on if it's known what the last iteration will be.

Also, I think any deviation from CDT in common knowledge (such as if you're not sure that they're sure that you're sure that they're a perfect CDT) would result in defecting a finite, and small, number of iterations from the end.

Comment author: Manfred 25 May 2012 08:54:54AM 0 points [-]

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Fair enough. I guess I had some special case stuff in mind - there are certainly ways to get a CDT agent to cooperate on prisoner's dilemma ish problems.

Comment author: drnickbone 25 May 2012 11:21:12AM 3 points [-]

A more correct analysis is that CDT defects against itself in iterated Prisoner's Dilemma, provided there is any finite bound to the number of iterations. So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

A CDT playing against a "RevengeBot" - if you nuke it, it nukes back with an all out strike - would never fire its weapons. But then the RevengeBot could just take out one city at a time, without fear of retaliation.

Since CDT was the "gold standard" of rationality developed during the time of the Cold War, I am somewhat puzzled why we're still here.

Comment author: Manfred 25 May 2012 11:30:47AM 2 points [-]

Well, it's good that you're puzzled, because it wasn't - see Schelling's "The Strategy of Conflict."

Comment author: drnickbone 25 May 2012 11:52:25AM 0 points [-]

I get the point that a CDT would pre-commit to retaliation if it had time (i.e. self-modify into a RevengeBot).

The more interesting question is why it bothers to do that re-wiring when it is expecting the nukes from the other side any second now...

Comment author: wedrifid 26 May 2012 02:31:27AM 1 point [-]

So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

This assumes that the mutual possession of nuclear weapons constitutes a prisoners dilemma. There isn't necessarily a positive payoff to nuking folks. (You know, unless they are really jerks!)

Comment author: drnickbone 26 May 2012 06:57:12AM 1 point [-]

Well nuking the other side eliminates the chance that they'll ever nuke you (or will attack with conventional weapons), so there is arguably a slight positive for nuking first as opposed to keeping the peace.

There were some very serious thinkers arguing for a first strike against the Soviet Union immediately after WW2, including (on some readings) Bertrand Russell, who later became a leader of CND. And a pure CDT (with selfish utility) would have done so. I don't see how Schelling theory could have modified that... just push the other guy over the cliff before the ankle-chains get fastened.

Probably the reason it didn't happen was the rather obvious "we don't want to go down in history as even worse than the Nazis" - also there was complacency about how far behind the Soviets actually were. If it had been known that they would explode an A-bomb as little as 4 years after the war, then the calculation would have been different. (Last ditch talks to ban nuclear weapons completely and verifiably - by thorough spying on each other - or bombs away. More likely bombs away I think.)

Comment author: [deleted] 29 May 2012 09:37:18AM *  1 point [-]

It will defect on all prisoners dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

I don't think MAD is a prisoner dilemma: in the prisoner dilemma, if I know you're going to cooperate no matter what, I'm better off defecting, and if I know you're going to defect no matter what, I'm better off defecting. This doesn't seem to be the case here: bombing you doesn't make me better off all things being equal, it just makes you worse off. If anything, it's a game of Chicken where bombing the opponent corresponds to going straight and not bombing them corresponds to swerving. And CDTists don't always go straight in Chicken, do they?

Comment author: Manfred 29 May 2012 11:19:15AM 0 points [-]

Hm, I disagree - if nuking the Great Enemy never made you any better off, why was anyone ever afraid of anyone getting nuked in the first place? It might not grow your crops for you or buy you a TV, but gains in security and world power are probably enough incentive to at least make people worry.

Comment author: [deleted] 29 May 2012 11:24:08AM *  1 point [-]

Still better modelled by Chicken (where the utility of winning is assumed to be much smaller than the negative of the utility of dying, but still non-zero) than by PD.

(edited to add a link)

Comment author: Manfred 30 May 2012 05:00:37AM 0 points [-]

I don't understand what you mean by "modeled better by chicken" here.

Comment author: Nornagest 30 May 2012 05:48:16AM *  1 point [-]

I expect army1987's talking about Chicken, the game of machismo in which participants rush headlong at each other in cars or other fast-moving dangerous objects and whoever swerves first loses. The payoff matrix doesn't resemble the Prisoner's Dilemma all that much: there's more than one Nash equilibrium, and by far the worst outcome from either player's perspective occurs when both players play the move analogous to defection (i.e. don't swerve). It's probably most interesting as a vehicle for examining precommitment tactics.
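To make the difference concrete, here is a small sketch that enumerates the pure-strategy Nash equilibria of both games. The payoff numbers are invented for illustration (with crashing far worse than losing face, as suggested above), and `pure_nash` is a hypothetical helper name.

```python
from itertools import product

# payoffs[(row_move, col_move)] = (row_utility, col_utility).
# All numbers are illustrative assumptions.
PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
CHICKEN = {
    ("swerve", "swerve"): (0, 0), ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1), ("straight", "straight"): (-100, -100),
}


def pure_nash(payoffs):
    """Profiles where neither player gains by deviating unilaterally."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        row_ok = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in rows)
        col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


pd_equilibria = pure_nash(PRISONERS_DILEMMA)
chicken_equilibria = pure_nash(CHICKEN)
print(pd_equilibria)
print(chicken_equilibria)
```

With these numbers the Prisoner's Dilemma has mutual defection as its only pure equilibrium, while Chicken has two, each with exactly one player swerving. That is why precommitment matters so much more in Chicken.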

The game-theoretic version of Chicken has often been applied to MAD, as the Wikipedia page mentions.

Comment author: [deleted] 30 May 2012 10:22:06AM 0 points [-]

I was. I should have linked to it, and I have now.

Comment author: [deleted] 28 May 2012 09:14:09AM 1 point [-]

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around. (You also mustn't have any dynamical inconsistency such as akrasia, otherwise your future and past selves count as ‘other’ as well.) So I don't think it'd make much of a difference for a singleton -- but I'd rather use an RDT just in case.

Comment author: wedrifid 28 May 2012 02:27:21PM 1 point [-]

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around.

It isn't the absolute level of intelligence that is required, but rather that the other agent is capable of making a specific kind of reasoning. Even this can be relaxed to things that can only dubiously be said to qualify as being classed "agent". The requirement is that some aspect of the environment has (utility-relevant) behavior that is entangled with the output of the decision to be made in a way that is other than a forward in time causal influence. This almost always implies that some agent is involved but that need not necessarily be the case.

Caveat: Maybe TDT is dumber than I remember and artificially limits itself in a way that is relevant here. I'm more comfortable making assertions about what a correct decision theory would do than about what some specific attempt to specify a decision theory would do.

but I'd rather use an RDT just in case.

You make me happy! RDT!

Comment author: lackofcheese 24 May 2012 07:51:17AM *  2 points [-]

Let's say that TDT agents can be divided into two categories, TDT-A and TDT-B, based on a single random bit added to their source code in advance. Then TDT-A can take the strategy of always picking the first box in Problem 2, and TDT-B can always pick the second box.

Now, if you're a TDT agent being offered the problem; with the aforementioned strategy, there's a 50% chance that the simulated agent is different than you, netting you $1 million. This also narrows down the advantage of the CDT agent - now they only have a 50% chance of winning the money, which is equal to yours.

Comment author: drnickbone 24 May 2012 12:23:59PM *  0 points [-]

That's not too bad, actually. One of my ideas while thrashing about here was that an agent should have a "favourite" number in the set {1, 2} and pick that number with certainty. That way, Omega will definitely put the $1 million in Box 1 or Box 2 and each agent will have 50% chance that their favourite number disagrees with the simulated agent's.

This won't work if Omega describes the source-code of the simulation (or otherwise reveals the simulation's favourite number) - since then any agent with that exact code knows it can't choose deterministically, and its best chance is to pick each box with equal chance, as described in the original analysis.

Comment author: Manfred 24 May 2012 07:31:23PM 3 points [-]

Actually, the way the problem is specified, Omega puts the money in box 3.

Comment author: drnickbone 24 May 2012 07:58:11PM 0 points [-]

The argument is that the simulation is either TDT-A in this case, or TDT-B. Either way, the simulated agent will pick a single favourite box (1 or 2) with certainty, so the money is in either Box 2 or Box 1.

Though I can see an interpretation which leads to Box 3. Omega simulates a "new-born" TDT (which is neither -A nor -B) and watches as it differentiates itself to one variant or the other, each with equal probability. So the new-born picks boxes 1 and 2 with equal frequency over multiple simulations, and Box 3 contains the money. Is that what you were thinking?
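Both readings can be checked with a short sketch. It assumes Problem 2 works the way this thread treats it: ten boxes, with Omega putting the $1 million in the box the simulation is least likely to take and breaking ties in favour of the lowest number. The strategy dictionaries and the names `omega_box` and `win_chance` are made up for illustration.

```python
from fractions import Fraction

BOXES = range(1, 11)  # assumption: ten boxes numbered 1..10


def omega_box(sim_strategy):
    """Box the simulation is least likely to pick; ties go to the lowest number."""
    return min(BOXES, key=lambda b: (sim_strategy.get(b, Fraction(0)), b))


def win_chance(agent_strategy, sim_strategy):
    """Probability that the agent opens the box Omega filled."""
    return agent_strategy.get(omega_box(sim_strategy), Fraction(0))


# Strategies map box number -> probability of picking it.
tdt_a = {1: Fraction(1)}                           # favourite number 1
tdt_b = {2: Fraction(1)}                           # favourite number 2
newborn = {1: Fraction(1, 2), 2: Fraction(1, 2)}   # not yet differentiated
uniform = {b: Fraction(1, 10) for b in BOXES}      # equal chance on every box

print(omega_box(tdt_a), omega_box(tdt_b), omega_box(newborn), omega_box(uniform))

# Agent is TDT-A; the simulation is TDT-A or TDT-B with equal chance.
mixed_population = (win_chance(tdt_a, tdt_a) + win_chance(tdt_a, tdt_b)) / 2
same_source = win_chance(uniform, uniform)
print(mixed_population, same_source)
```

Under these assumptions a differentiated simulation sends the money to Box 2 or Box 1, while a "new-born" simulation sends it to Box 3, where neither variant ever looks.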

Comment author: Manfred 24 May 2012 08:00:53PM *  0 points [-]

Is that what you were thinking?

Yes. I was thinking that Omega would have access to the agent's source code, and be running the "play against yourself, if you pick a different number than yourself you win" game. Omega is a jerk :D

Comment author: lackofcheese 24 May 2012 08:17:25PM *  2 points [-]

If it's your own exact source being simulated, then it's probably impossible to do better than 10%, and the problem isn't interesting anymore.

Comment author: Bill_McGrath 24 May 2012 11:05:06AM 0 points [-]

Any agent who is themselves running TDT will reason as in the standard Newcomb problem.

Will they? Surely it's clear that it's now possible to take $1,001,000, because the circumstances are slightly different.

In the standard Newcomb problem, where Omega predicts your behaviour, it's not possible to trick it or act other than its expectation. Here, it is.

Is there some basic part of decision theory I'm not accounting for here?

Comment author: falenas108 24 May 2012 12:45:28PM *  0 points [-]

Yes. If the TDT agent picked the $1,001,000 here, then the simulated agent would have two-boxed as well, meaning only box A would be filled.

Remember, the simulated agent was presented with the same problem, so the decision TDT makes here is the same one the simulated agent makes.

Comment author: Bill_McGrath 24 May 2012 01:08:23PM 1 point [-]

Right, I understand what you mean. I was thinking of in the context of a person being presented with this situation, not an idealized agent running a specific decision theory.

And Omega's simulated agent would presumably hold all the same information as a person would, and be capable of responding the same way.

Cheers for clarifying that for me.

Comment author: selylindi 24 May 2012 03:25:39PM 9 points [-]

Problem 2 reminds me strongly of playing GOPS.

For those who aren't familiar with it, here's a description of the game. Each player receives a complete suit of standard playing cards, ranked Ace low through King high. Another complete suit, the diamonds, is shuffled (or not, if you want a game of complete information) and put face down on the table; these diamonds have point values Ace=1 through King=13. In each trick, one diamond is flipped face-up. Each player then chooses one card from their own hand to bid for the face-up diamonds, and all bids are revealed simultaneously. Whoever bids highest wins the face-up diamonds, but if there is a tie for the highest bid (even when other players did not tie), then no one wins them and they remain on the table to be won along with the next trick. All bids are discarded after every trick.
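The trick-resolution rule is the part that drives all the second-guessing, so here is a rough sketch of it exactly as described above (other rule sets may handle ties differently). The function name `resolve_trick` and the player names are invented for the example.

```python
def resolve_trick(bids, pot):
    """Resolve one GOPS trick under the tie rule described above.

    bids: dict mapping player name -> rank of the card bid (Ace=1 ... King=13)
    pot:  list of diamond values currently face-up, including carry-overs
    Returns (winner or None, points won, diamonds left for the next trick).
    """
    high = max(bids.values())
    top_bidders = [player for player, rank in bids.items() if rank == high]
    if len(top_bidders) > 1:
        # Tie for the highest bid: nobody wins, even if a lower bid was unique.
        return None, 0, pot
    return top_bidders[0], sum(pot), []


# Two players burn their Kings on the King of diamonds and cancel out...
winner1, won1, carried = resolve_trick({"ann": 13, "bob": 13, "cat": 1}, [13])
# ...so the King rides along with the next diamond (a 4).
winner2, won2, left = resolve_trick({"ann": 5, "bob": 12, "cat": 2}, carried + [4])
print(winner1, won1, carried)
print(winner2, won2, left)
```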

Especially when the King comes up early, you can see everyone looking at each other trying to figure out how many levels deep to evaluate "What will the other players do?".

(1) Play my King to be likely to win. (2) Everyone else is likely to do (1) also, which will waste their Kings. So instead play low while they throw away their Kings. (3) If the players are paying attention, they might all realize they should (2), in which case I should play highest low card - the Queen. (4+) The 4th+ levels could repeat (2) and (3) mutatis mutandis until every card has been the optimal choice at some level. In practice, players immediately recognize the futility of that line of thought and instead shift to the question: How far down the chain of reasoning are the other players likely to go? And that tends to depend on knowing the people involved and the social context of the game.

Maybe playing GOPS should be added to the repertoire of difficult decision theory puzzles alongside the prisoner's dilemma, Newcomb's problem, Pascal's mugging, and the rest of that whole intriguing panoply. We've had a Prisoner's Dilemma competition here before - would anyone like to host a GOPS competition?

Comment author: shokwave 25 May 2012 08:19:52AM 0 points [-]

I'm going to play this game at LW meetups in future. Hopefully some insights will arise out of it.

I also think I might try to generalise this kind of problem, in the vein of trolley problems being a generalisation of some types of decisions and Parfit's Hitchhiker being a generalisation of precommitment-favouring situations.

Comment author: [deleted] 25 May 2012 08:14:16PM *  0 points [-]

For problem 1, in the language of the blackmail posts, because the tactic omega uses to fill box 2,

TDT-sim.box1,box2=(<F,T> <T,T>) -> Omega.box2=(1M, 0)

depends on TDT-sim's decision, because Omega has already decided, and because Omega didn't make its decision known, a TDT agent presented with this problem is at an epistemic disadvantage relative to Omega: TDT can't react to Omega's actual decision, because it won't know Omega's actual decision until it knows its own actual decision, at which point TDT can't further react. This epistemic disadvantage doesn't need to be enforced temporally; even if TDT knows Omega's source code, if TDT has limited simulation resources, it might not practically be able to compute Omega's actual decision any way but via Omega's dependence on TDT's decision.

any other agent who is not running TDT ... will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the $1 million

There aren't other ways for an agent to be at an epistemic disadvantage relative to Omega in this problem than by being TDT? Could you construct an agent which was itself disadvantaged relative to TDT?

Comment author: dlthomas 25 May 2012 08:33:32PM 3 points [-]

Could you construct an agent which was itself disadvantaged relative to TDT?

"Take only the box with $1000."

Which itself is inferior to "Take no box."

Comment author: private_messaging 25 May 2012 10:04:12PM *  -2 points [-]

There was this Rocko thing a while back (which is not supposed to be discussed), where, if I understood that nonsense correctly, the idea was that the decision theories here would do the equivalent of one-boxing on Newcomb with transparent boxes where you could see there is no million, when there's no million (and where the boxes were made and sealed before you were born). It's not easy to one-box rationally.

Also in practice usually being simulated correctly is awesome for getting scammed (agents tend to face adversaries rather than crazed beneficiaries).

Comment author: wedrifid 26 May 2012 02:52:16AM 0 points [-]

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

This is indeed a problem - and one I would describe as the general class "dealing with other agents who are fucking with you." It is not one that can be solved and I believe a "correct" decision theory will, in fact, lose (compared to CDT) in this case.

Note that there seems to be some chance that I am confused in a way analogous to the way that people who believe "Two boxing on Newcomb's is rational" are confused. There could be a deep insight I am missing. This seems comparatively unlikely.

Comment author: [deleted] 28 May 2012 09:08:53AM 6 points [-]

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

If he's always truthful, then he didn't lie to the simulation either and this means that he did infinitely many simulations before that. So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar. If he says different things to you and to your simulation instead, then it's not obvious you'll give the same answer.

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair?

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't. But I haven't thought this through yet, so it might turn out to be irrelevant.

Comment author: DanArmak 28 May 2012 03:01:53PM *  0 points [-]

He can't have done literally infinitely many simulations. If that is really required it would be a way out by saying the thought experiment stipulates an impossible situation. I haven't yet considered whether the problem can be changed to give the same result and not require infinitely many simulations.

ETA: no wait, that can't be right, because it would apply to the original Newcomb's problem too. So there must be a way to formalize this correctly. I'll have to look it up but don't have the time right now.

Comment author: [deleted] 28 May 2012 04:03:14PM 1 point [-]

In the original Newcomb's problem it's not specified that Omega performs simulations -- for all we know, he might use magic, closed timelike curves, or quantum magic whereby Box A is in a superposition of states entangled with your mind whereby if you open Box B, A ends up being empty and if you hand B back to Omega, A ends up being full.

Comment author: DanArmak 28 May 2012 04:26:18PM 0 points [-]

We should take this seriously: a problem that cannot be instantiated in the physical world should not affect our choice of decision theory.

Before I dig myself in deeper, what does existing wisdom say? What is a practical possible way of implementing Newcomb's problem? For instance, simulation is eminently practical as long as Omega knows enough about the agent being simulated. OTOH, macro quantum entanglement of an arbitrary agent's arbitrary physical instantiation with a box prepared by Omega doesn't sound practical to me, but maybe I'm just swayed by incredulity. What do the experts say? (Including you if you're an expert, obviously.)

Comment author: [deleted] 28 May 2012 04:37:15PM *  -1 points [-]

cannot

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Unless your utility function is bounded.

Comment author: wedrifid 28 May 2012 04:58:12PM 1 point [-]

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Even? I'd go as far as to say only. Non-tiny probabilities aren't Pascal's muggings. They are just expected utility calculations. </lighthearted nitpick!>

Comment author: DanArmak 28 May 2012 05:02:37PM 0 points [-]

If a problem statement has an internal logical contradiction, there is still a tiny probability that I and everyone else are getting it wrong, due to corrupted hardware or a common misconception about logic or pure chance, and the problem can still be instantiated. But it's so small that I shouldn't give it preferential consideration over other things I might be wrong about, like the nonexistence of a punishing god or that the food I'm served at the restaurant today is poisoned.

Either of those if true could trump any other (actual) considerations in my actual utility function. The first would make me obey religious strictures to get to heaven. The second threatens death if I eat the food. But I ignore both due to symmetry in the first case (the way to defeat Pascal's wager in general) and to trusting my estimation of the probability of the danger in the second (ordinary expected utility reasoning).

AFAICS both apply to considering an apparently self-contradictory problem statement as really not possible with effective probability zero. I might be misunderstanding things so much that it really is possible, but I might also be misunderstanding things so much that the book I read yesterday about the history of Africa really contained a fascinating new decision theory I must adopt or be doomed by Omega.

All this seems to me to fail due to standard reasoning about Pascal's mugging. What am I missing?

Comment author: [deleted] 28 May 2012 06:16:50PM 0 points [-]

If a problem statement has an internal logical contradiction

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

Comment author: wedrifid 28 May 2012 06:23:57PM *  1 point [-]

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

It certainly doesn't contradict itself, and I would also assert that it doesn't contradict the physical law that causality cannot go backwards in time. Instead I would say that giving the sane answer to Newcomb's problem requires abandoning the assumption that one's decision must be based only on what it affects through forward-in-time causal, physical influence.

Comment author: private_messaging 28 May 2012 07:46:14PM *  0 points [-]

Consider making both boxes transparent to illustrate some related issue.

Comment author: drnickbone 28 May 2012 06:57:02PM 1 point [-]

This question of "Does Omega lie to sims?" was already discussed earlier in the thread. There were several possible answers from cousin_it and myself, any of which will do.

Comment author: private_messaging 28 May 2012 09:10:14PM *  0 points [-]

So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar.

...

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't.

Say you have a CDT agent in the world, affecting the world via a set of robotic hands, a robotic voice, and so on. If you wire up two robot bodies to 1 computer (in parallel, so that all movements are done by both bodies), that is just a somewhat peculiar robotic manipulator. Handling this doesn't require any changes to CDT.

Likewise, when you have two robot bodies controlled by an identical mathematical equation, provided that your world model in the CDT utility calculation accounts for all the known manipulators which are controlled by the chosen action, you get the correct result.

Likewise, you can have CDT control a multitude of robots, either from one computer, or from multiple computers that independently determine optimal, identical actions (but each computer only acts on the robot body assigned to that computer).

CDT is formally defined using mathematics; the mathematics is already 'timeless', and the fact that the chosen action affects the contents of the boxes is part of the world model, not the decision theory (and so are physical time and physical causality: part of the world model, not the decision theory. Even though the decision theory is called causal, that's some other 'causal').

Comment author: loup-vaillant 31 May 2012 08:49:30AM *  0 points [-]

Either problems 1 and 2 are hitting an infinite regress issue, or I don't see why an ordinary TDT agent wouldn't 2-box, and choose the first box, respectively. There's a difference between the following problems:

  • I, Omega, predicted that you would do such and such, and acted accordingly.
  • I, Omega, simulated another agent, and acted accordingly.
  • I, Omega, simulated this very problem, only if you don't run TDT that's not the same problem, but I promise it's the same nonetheless, and acted accordingly

Now, in problems 1 and 2, are the simulated problem and the actual problem actually the same? If they are, I see an infinite regress on Omega's side, and therefore not a problem one would ever encounter. If they aren't, then what I actually understand them to be is:

  1. Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to an agent running TDT. If the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

    Really, you don't have to use anything other than TDT to see that the simulated TDT agent one-boxed. Its problem isn't your problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by 1-boxing. But you should 2-box.

  2. Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "I ran multiple simulations of the following problem, presented to a TDT agent: “You must take exactly one box. I determined which box you are least likely to take, and put $1 million in that box. If there is a tie, I put the money in one of them (the one labelled with the lowest number).” I put the money in the box the simulated TDT agent was least likely to choose. If there was a tie, I put the money in one of them (the one labelled with the lowest number). Now choose your box."

    Same here. You know that the TDT agent put equal probability on every box, to maximize its gains. Again, its problem isn't your problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by choosing at random. But you should take box 1.
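Omega's placement rule in the second reformulation can be sketched in Python. This is only an illustration under assumptions: the function name `omega_box` and the idea of handing Omega the simulated agent's choice probabilities as a list are hypothetical, not something stated in the problem.

```python
from fractions import Fraction

def omega_box(choice_probs):
    """Return the 1-based number of the box Omega fills.

    choice_probs[i] is the (assumed known) probability that the simulated
    agent picks box i+1. Omega fills the least likely box; on a tie,
    the lowest-numbered one.
    """
    least = min(choice_probs)
    # list.index returns the first match, which is the lowest-numbered box.
    return choice_probs.index(least) + 1

# A simulated agent that randomizes uniformly over ten boxes.
uniform = [Fraction(1, 10)] * 10

# A hypothetical agent that favours box 1 and spreads the rest evenly.
skewed = [Fraction(1, 5)] + [Fraction(4, 45)] * 9

print(omega_box(uniform))
print(omega_box(skewed))
```

Under the uniform strategy every box ties, so the tie-break sends the money to box 1, which is why an outside agent who knows this would take box 1.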

Comment author: private_messaging 31 May 2012 11:28:35AM *  2 points [-]

I think we need a 'non-problematic problems for CDT' thread.

For example, it is not problematic for a CDT-based robot controller to have the control values in the action A represent multiple servos in its world model, as if you wired multiple robot arms to 1 controller in parallel. You may want to do this if you want the robot arms to move in unison and pass along the balls in a real-world imitation of http://blueballmachine2.ytmnd.com/

It is likewise not problematic if you ran out of wire and decided to make the '1 controller' be physically 2 controllers running identical code from above, or if you ran out of time machines and decided to control yesterday's servo with 1 controller yesterday, and today's servo with the same controller in the same state today. These are simply low-level, irrelevant details.

A mathematical formalization of CDT (such as robot software) will one-box or two-box in Newcomb depending on the world model within which CDT decides. If the world model has the 'prediction' as a second servo represented by the same variable, then it'll one-box.
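A minimal sketch of that dependence on the world model, in Python. Everything here is an assumption made for illustration: the function names, the payoff amounts taken from the standard Newcomb setup, and the two toy world models. It is not a formal statement of CDT.

```python
BOX_A = 1_000        # always present in Box A
BOX_B = 1_000_000    # in Box B only if one-boxing was predicted

ACTIONS = ["one-box", "two-box"]

def payoff(action, prediction):
    """Money received for a given action and a given prediction."""
    b = BOX_B if prediction == "one-box" else 0
    return b if action == "one-box" else b + BOX_A

def best_action_independent(p_full):
    """World model 1: Box B's contents are fixed and unaffected by the action.

    p_full is the agent's probability that Box B is already full.
    """
    def expected_utility(action):
        return (p_full * payoff(action, "one-box")
                + (1 - p_full) * payoff(action, "two-box"))
    return max(ACTIONS, key=expected_utility)

def best_action_linked():
    """World model 2: the prediction is a second 'servo' driven by the
    same action variable, so it always equals the chosen action."""
    return max(ACTIONS, key=lambda a: payoff(a, a))

print(best_action_independent(0.5))
print(best_action_linked())
```

The same maximization step is used in both functions; only the world model differs, and that alone flips the chosen action.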

Whether philosophical maxims like "act based on consequences of my actions" one-box or two-box depends in turn solely on philosophical questions like "what is self". E.g. if "self" means the physical meat, then two-box; if "self" means the algorithm (a higher-level concept), then one-box, if you assume that the thing in the predictor is "self" too.

edit: another thing. Stuff outside the robot's senses is naturally uncertain. Upon hearing the explanation in Newcomb's paradox, one has to update the estimates of what is outside the senses; it might be that the money is fake, and there's some external logic and wiring and servos that will put a real million into a box if you choose to 1-box. If the money is to pay for, I dunno, your child's education, clearly one has got to 1-box. I'm pretty sure Causal Deciding General Thud can 1-box just fine, if he needs the money to buy the real weapons for the real army, and suspects that outside his senses there may be the predictor spying. General Thud knows that the best option is to 1-box inside the predictor and 2-box outside. The goal is never to two-box outside the predictor.

Comment author: Stuart_Armstrong 01 June 2012 11:36:34AM 3 points [-]

Intuitively this doesn't feel like a 'fair' problem. A UDT agent would ace the TDT formulation and vice versa. Any TDT agent that found a way of distinguishing between 'themselves' and Omega's TDT agent would also ace the problem. It feels like an acausal version of something like:

"I get agents A and B to choose one or two boxes. I then determine the contents of the boxes based on my best guess of A's choice. Surprisingly, B succeeds much better than A at this."

Still an intriguing problem, though.