# Problematic Problems for TDT

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents which two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave the way a CDT agent would wish it had pre-committed itself to behaving before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

**Discrimination**: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.

So what are some *fair* "problematic problems"?

**Problem 1**: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

**Analysis**: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.

However, any CDT agent can just take both boxes and win $1001000. In fact, any other agent who is *not* running TDT (e.g. an EDT agent) will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the $1 million. So any other agent can safely two-box as well.

Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
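The payoffs in Problem 1 can be tabulated with a minimal sketch. The function and constant names below are hypothetical, and the "logical link" between a TDT agent and its simulation is modelled crudely by assuming both make the same choice; this is an illustration of the analysis above, not an implementation of TDT.

```python
# Hypothetical model of Problem 1; all names are illustrative assumptions.
BOX_A = 1_000
BOX_B_PRIZE = 1_000_000
CHOICES = ["one-box", "two-box"]

def payoff(choice: str, sim_choice: str) -> int:
    """Payout to the real agent, given what the simulated TDT agent chose."""
    box_b = BOX_B_PRIZE if sim_choice == "one-box" else 0
    return box_b if choice == "one-box" else BOX_A + box_b

# Assumption: a TDT agent's choice equals the simulation's choice,
# so it only compares the two "diagonal" outcomes.
tdt_choice = max(CHOICES, key=lambda c: payoff(c, c))

# Any other agent treats the simulation's choice as fixed and best-responds.
other_choice = max(CHOICES, key=lambda c: payoff(c, tdt_choice))

print(tdt_choice, payoff(tdt_choice, tdt_choice))      # one-box 1000000
print(other_choice, payoff(other_choice, tdt_choice))  # two-box 1001000
```

Raising `BOX_A` towards `BOX_B_PRIZE` widens the gap, which is the sense in which the non-TDT agent can win up to twice as much.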

**Problem 2**: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."

**Analysis**: A TDT agent will reason that whatever it does, it cannot have more than 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million under box number 1 and the TDT agent has exactly 10% chance of winning it.

But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning $1 million. By increasing the number of boxes, we can ensure that TDT has arbitrarily low chance of winning, compared to CDT which always wins.
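As a rough check on the 10% bound, here is a small sketch that assumes Omega knows the simulated agent's exact choice probabilities. The helper names `omega_box` and `win_probability` are made up for illustration. The agent's chance of winning is simply the smallest probability in its mixed strategy, which can never exceed 1/n.

```python
from fractions import Fraction

def omega_box(probs):
    """1-based index of the least likely box; ties go to the lowest number."""
    return probs.index(min(probs)) + 1

def win_probability(probs):
    """Chance that an agent mixing with `probs` picks the box Omega filled."""
    return probs[omega_box(probs) - 1]

n = 10
uniform = [Fraction(1, n)] * n
# A strategy that never takes box 1 and spreads evenly over the rest.
skewed = [Fraction(0)] + [Fraction(1, n - 1)] * (n - 1)

print(omega_box(uniform), win_probability(uniform))  # 1 1/10
print(omega_box(skewed), win_probability(skewed))    # 1 0
```

An agent that is not the one being simulated can call the equivalent of `omega_box` on the TDT strategy and take that box with certainty.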

**Some questions:**

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

**Edit:** I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).

## Comments (297)

I think we could generalise problem 2 to be problematic for any decision theory XDT:

There are 10 boxes, numbered 1 to 10. You may only take one. Omega has (several times) run a simulated XDT agent on this problem. It then put a prize in the box which it determined was least likely to be taken by such an agent - or, in the case of a tie, in the box with the lowest index.

If agent X follows XDT, it has at best a 10% chance of winning. Any sufficiently resourceful YDT agent, however, could run a simulated XDT agent themselves, and figure out what Omega's choice was without getting into an infinite loop.

Therefore, YDT performs better than XDT on this problem.

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

You're right about problem 2 being a fully general counterargument, but your philosophical conclusion seems to be stopping too early. For example, can we define a class of "fair" problems that excludes problem 2?

One possible place to look is that we're allowing Omega access not just to a particular simulated decision of TDT, but to the probabilities with which it makes these decisions. If we force it to simulate TDT many times and sample to learn what the probabilities are, it can't detect the exact balance for which it does deterministic symmetry breaking, and the problem goes away.

This solution occurred to me because this forces Omega to have something like a continuous behaviour response to changes in the probabilities of different TDT outputs, and it seems possible given that to imagine a proof that a fixed point must exist.

Fair point - how does Omega tell when the sim's choosing probabilities are exactly equal? Well I was thinking that Omega could prove they are equal (by analysing the simulation's behaviour, and checking where it calls on random bits). Or if it can't do that, then it can just check that the choice frequencies are "statistically equal" (i.e. no significant differences after a billion runs, say) and treat them as equal for the tie-breaker rule. The "statistically equal" approach might give the TDT agent a very slightly higher than 10% chance of winning the money, though I haven't analysed this in any detail.
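The "statistically equal" fallback might look something like the sketch below. Everything here is an assumption made for illustration: the function names, the two toy agents, the run count, and especially the five-standard-deviation tolerance, which is an arbitrary stand-in for "no significant differences".

```python
import random
from collections import Counter

def omega_box_by_sampling(agent, n_boxes, runs, rng, sigmas=5.0):
    """Estimate choice frequencies by sampling, then apply the tie-break rule."""
    counts = Counter(agent(rng) for _ in range(runs))
    freqs = [counts.get(box, 0) for box in range(1, n_boxes + 1)]
    expected = runs / n_boxes
    # Standard deviation of one box's count if the agent were exactly uniform.
    sd = (runs * (1 / n_boxes) * (1 - 1 / n_boxes)) ** 0.5
    if all(abs(f - expected) <= sigmas * sd for f in freqs):
        return 1  # treated as an exact tie: lowest-numbered box
    return freqs.index(min(freqs)) + 1

def uniform_agent(rng):
    return rng.randint(1, 10)

def biased_agent(rng):
    # Takes box 7 about half as often as a uniform agent would.
    weights = [0.95 / 9] * 10
    weights[6] = 0.05
    return rng.choices(range(1, 11), weights=weights)[0]

rng = random.Random(0)
print(omega_box_by_sampling(uniform_agent, 10, 100_000, rng))
print(omega_box_by_sampling(biased_agent, 10, 100_000, rng))
```

With a tolerance band like this, small deviations from uniform go undetected, which is one way to see where the "very slightly higher than 10%" chance could come from.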

If the subject can know the exact code of TDT, Omega can know the exact code of TDT, and analyse it however it likes. That means it can know exactly where randomness is invoked - why would it have to sample?

This was my first thought: Omega can just prove the choosing probabilities are equal. However, it's not totally straightforward, because the sim could sample more random bits depending on the results of its first random bits, and so on, leading to an exponentially growing outcome tree of possibilities, with no upper size bound to the length of the tree. There might not be an easy proof of equality in that case. Sampling and statistical equality is the next best approach...

It looks like the issue here is that while Omega is ostensibly not taking into account your decision theory, it implicitly is by simulating an XDT agent. So a first patch would be to define simulations of a specific decision theory (as opposed to simulations of a given agent) as "unfair".

On the other hand, we can't necessarily know if a given computation is effectively equivalent to simulating a given decision theory. Even if the string "TDT" is never encoded anywhere in Omega's super-neurons, it might still be simulating a TDT agent, for example.

On the first hand again, it might be easy for most problems to figure out whether anyone is implicitly favouring one DT over another, and thus whether they're "fair".

I would say that any such problem doesn't show that there is no best decision theory, it shows that that class of problem cannot be used in the ranking.

Edited to add: Unless, perhaps, one can show that an instantiation of the problem with particular choice of (in this case decision theory, but whatever is varied) is particularly likely to be encountered.

To draw out the analogy to Godelian incompleteness, any computable decision theory is subject to the suggested attack of being given a "Godel problem" like problem 1, just as any computable set of axioms for arithmetic has a Godel sentence. You can always make a new decision theory TDT' that is TDT plus "do the right thing for the Godel problem". But TDT' has its own Godel problem, of course. You can't make a computable theory that says "do the right thing for all Godel problems"; if you tried to do that, it would not give you something computable. I'm sure this is all just restating what you had in mind, but I think it's worth spelling out.

If you have some sort of oracle for the halting problem (i.e. a hypercomputer) and Omega doesn't, he couldn't simulate you, so you would presumably be able to always win fair problems. Otherwise the best thing you could hope for is to get the right answer whenever your computation halts, but fail to halt in your computation for some problems, such as your Godel problem. (A decision theory like this can still be given a Godel problem if Omega can solve the halting problem, "I simulated you and if you fail to halt on this problem..."). I wonder if TDT fails to halt for its Godel problem, or if some natural modification of it might have this property, but I don't understand it well enough to guess.

I am less optimistic about revising "fair" to exclude Godel problems. The analogy would be proving Peano arithmetic is complete "except for things that are like Godel sentences." I don't know of any formalizations of the idea of "being a Godel sentence".

Consider

Problem 3: Omega presents you with two boxes, one of which contains $100, and says that it just ran a simulation of *you* in the present situation and put the money in the box the simulation didn't choose.

This is a standard diagonal construction, where the environment is set up so that you are punished for the actions you choose, and rewarded for those you don't choose, irrespective of the actions. This doesn't depend on the decision algorithm you're implementing. A possible escape strategy is to make yourself unpredictable to the environment. The difficulty would also go away if the thing being predicted wasn't you, but something else you could predict as well (like a different agent that doesn't simulate you).

The correct solution to this problem is to choose each box with equal probability; this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money. Or I try to guess what escape route you'll use and post all the guards there.
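A quick sketch of why equal probabilities are the best reply in Problem 3. It assumes (this is not stated in the problem) that the simulation draws its random bits independently of the real agent, so that both choose box 1 with the same probability `p` but not necessarily the same box; `win_probability` is a hypothetical helper name.

```python
def win_probability(p: float) -> float:
    """Chance of taking the box with the money, when the real agent and an
    independently randomised simulation each choose box 1 with probability p."""
    # The money is in the box the simulation did NOT choose, so the agent
    # wins exactly when its own choice differs from the simulation's.
    return p * (1 - p) + (1 - p) * p

# Scan p = 0.00, 0.01, ..., 1.00 for the best mixing probability.
best_p = max((i / 100 for i in range(101)), key=win_probability)

print(best_p, win_probability(best_p))  # 0.5 0.5
print(win_probability(1.0))             # 0.0
```

A deterministic agent (`p` equal to 0 or 1) never wins, while a fair coin flip wins half the time, for an expected $50.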

What's interesting about Problem 2 is that it makes what would be the normal game-theoretic strategy unstable by choosing deterministically where the probabilities are exactly equal.

Of course, you can just set up the thought experiment with the proviso that "be unpredictable" is not a possible move - in fact that's the whole point of Omega in these sorts of problems. If Omega's trying to break into your safe, he takes your money. In Nesov's problem, if you can't make yourself unpredictable, then you win *nothing* - it's not even worth your time to open the box. In both cases, a TDT agent does strictly as well as it possibly could - the fact that there's $100 somewhere in the vicinity doesn't change that.

I think it's right to say that these aren't really "fair" problems, but they are unfair in a very interesting new way that Eliezer's definition of fairness doesn't cover, and it's not at all clear that it's possible to come up with a nice new definition that avoids this class of problem. They remind me of "Lucas cannot consistently assert this sentence".

Problem 2 reminds me strongly of playing GOPS.

For those who aren't familiar with it, here's a description of the game. Each player receives a complete suit of standard playing cards, ranked Ace low through King high. Another complete suit, the diamonds, is shuffled (or not, if you want a game of complete information) and put face down on the table; these diamonds have point values Ace=1 through King=13. In each trick, one diamond is flipped face-up. Each player then chooses one card from their own hand to bid for the face-up diamonds, and all bids are revealed simultaneously. Whoever bids highest wins the face-up diamonds, but if there is a tie for the highest bid (even when other players did not tie), then no one wins them and they remain on the table to be won along with the next trick. All bids are discarded after every trick.

Especially when the King comes up early, you can see everyone looking at each other trying to figure out how many levels deep to evaluate "What will the other players do?".

(1) Play my King to be likely to win.

(2) Everyone else is likely to do (1) also, which will waste their Kings. So instead play low while they throw away their Kings.

(3) If the players are paying attention, they might all realize they should (2), in which case I should play highest low card - the Queen.

(4+) The 4th+ levels could repeat (2) and (3) mutatis mutandis until every card has been the optimal choice at some level.

In practice, players immediately recognize the futility of that line of thought and instead shift to the question: How far down the chain of reasoning are the other players likely to go? And that tends to depend on knowing the people involved and the social context of the game.

Maybe playing GOPS should be added to the repertoire of difficult decision theory puzzles alongside the prisoner's dilemma, Newcomb's problem, Pascal's mugging, and the rest of that whole intriguing panoply. We've had a Prisoner's Dilemma competition here before - would anyone like to host a GOPS competition?

I'm going to play this game at LW meetups in future. Hopefully some insights will arise out of it.

I also think I might try to generalise this kind of problem, in the vein of trolley problems being a generalisation of some types of decisions and Parfit's Hitchhiker being a generalisation of precommitment-favouring situations.

The problems look like a kind of an anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you $1000 if you are, and tortures you for a year if you aren't.

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards eg ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about this is that this is a "fair" test by that definition, yet it acts like an unfair test.

Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.

Two questions: First, how is this distinction justified? What a decision theory *is* is a strategy for responding to decision tasks, and simulating agents performing the right decision tasks tells you what kind of decision theory they're using. Why does it matter if it's done implicitly (as in Newcomb's discrimination against CDT) or explicitly? And second, why should we care about it? Why is it important for a decision theory to pass fair tests but not unfair tests?

Well, on unfair tests a decision theory still needs to do as well as possible. If we had a version of the original Newcomb's problem, with the one difference that a CDT agent gets $1 billion just for showing up, it's still incumbent upon a TDT agent to walk away with $1000000 rather than $1000. The "unfair" class of problems is that class where "winning as much as possible" is distinct from "winning the most out of all possible agents".

Real-world unfair tests could matter, though it's not clear if there are any. However, hypothetical unfair tests aren't very informative about what is a good decision theory, because it's trivial to cook one up that favours one theory and disfavours another. I think the hope was to invent a decision theory that does well on all fair tests; the example above seems to show that may not be possible.

No, not even by Eliezer's standard, because TDT is not given the same problem as other decision theories.

As stated in comments below, everyone but TDT has the information "I'm not in the simulation" (or more precisely, in one of the simulations of the infinite regress that is implied by Omega's formulation). The reason TDT does not have this extra piece of information comes from the fact that it is TDT, not from any decision it may make.

Right, and this is an unfairness that Eliezer's definition fails to capture.

At this point, I need the text of that definition.

The definition is in Eliezer's TDT paper although a quick grep for "fair" didn't immediately find the definition.

This variation of the problem was invented in the follow-up post (I think it was called "Sneaky strategies for TDT" or something like that):

Omega tells you that earlier it flipped a coin. If the coin came down heads, it simulated a CDT agent facing this problem. If the coin came down tails, it simulated a TDT agent facing this problem. In either case, if the simulated agent one-boxed, there is $1000000 in Box-B; if it two-boxed Box-B is empty.

In this case TDT still one-boxes (50% chance of $1000000 dominates a 100% chance of $1000), and CDT still two-boxes (because that's what CDT does). In this case, even though both agents have an equal chance of being simulated, CDT out-performs TDT (average payoffs of 501000 vs. 500000) - CDT takes advantage of TDT's prudence and TDT suffers for CDT's lack of it. Notice also that TDT cannot do better by behaving like CDT (both would get payoffs of 1000). This shows that the class of problems we're concerned with is not so much "fair" vs. "unfair", but more like "those problems on which the best *I* can do is not necessarily the best anyone can do". We can call it "fairness" if we want, but it's not like Omega is discriminating against TDT in this case.

This is not a zero-sum game. CDT does not outperform TDT here. It just makes a stupid mistake, and happens to pay for it less dearly than TDT.

Let's say Omega submits the same problem to 2 arbitrary decision theories. Each will either 1-box or 2-box. Here is the average payoff matrix:

Clearly, 1-boxing still dominates 2-boxing. Whatever the other does, you personally get about half a million more by 1-boxing. TDT may have less utility than CDT for 1-boxing, but CDT is still stupid here, while TDT is not.
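The average payoffs for the coin-flip variant can be recomputed with a short sketch. This is a reconstruction under stated assumptions, not the commenter's own table: Omega's coin is fair, Box A holds $1000, Box B holds $1000000 when the simulated agent one-boxes, and a simulation of an agent makes the same choice as that agent. The name `average_payoff` is hypothetical.

```python
from fractions import Fraction

BOX_A, BOX_B_PRIZE = 1_000, 1_000_000

def average_payoff(mine: int, other: int) -> Fraction:
    """Expected payoff when I take `mine` boxes (1 or 2), the other decision
    theory would take `other`, and a fair coin decides which is simulated."""
    total = Fraction(0)
    for simulated in (mine, other):  # each is simulated with probability 1/2
        box_b = BOX_B_PRIZE if simulated == 1 else 0
        total += Fraction(1, 2) * (box_b + (BOX_A if mine == 2 else 0))
    return total

# Rows: my choice; columns: the other theory 1-boxes, then 2-boxes.
for mine in (1, 2):
    print(mine, [int(average_payoff(mine, other)) for other in (1, 2)])
# 1 [1000000, 500000]
# 2 [501000, 1000]
```

Reading down each column, 1-boxing beats 2-boxing by $499000 whatever the other theory does, which matches the "about half a million more" figure.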

Not exactly. Because the problem statement says that it simulates "TDT", if you were to expand the problem statement out into code it would have to contain source code to a complete instantiation of TDT. When the problem statement is run, TDT or TDT-prime can look at that instantiation and compare it to its own source code. TDT will see that they're the same, but TDT-prime will notice that they are different, and thereby infer that it is not the simulated copy. (Any difference whatsoever is proof of this.)

Consider an alternative problem. Omega flips a coin, and asks you to guess what it was, with a prize if you guess correctly. If the coin was heads, he shows you a piece of paper with TDT's source code. If the coin was tails, he shows you a piece of paper with your source code, whatever that is.

I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.

The key thing is the question as to whether it could have been you that has been simulated. If all you know is that you're a TDT agent and what Omega simulated is a TDT agent, then it could have been you. Therefore you have to act as if your decision now may be either real or simulated. If you know you are not what Omega simulated (for any reason), then you know that you only have to worry about the 'real' decision.

Suppose that Omega doesn't reveal the full source code of the simulated TDT agent, but just reveals enough logical facts about the simulated TDT agent to imply that it uses TDT. Then the "real" TDT Prime agent cannot deduce that it is different.

Yes. I think that as long as there is any chance of you being the simulated agent, then you need to one box. So you one box if Omega tells you 'I simulated some agent', and one box if Omega tells you 'I simulated an agent that uses the same decision procedure as you', but two box if Omega tells you 'I simulated an agent that had a different copyright comment in its source code to the comment in your source code'.

This is just a variant of the 'detect if I'm in a simulation' function that others mention. i.e. if Omega gives you access to that information in any way, you can two box. Of course, I'm a bit stuck on what Omega has told the simulation in that case. Has Omega done an infinite regress?

That's an interesting way to look at the problem. Thanks!

Indeed. These are all scenarios of the form "Omega looks at the source code for your decision theory, and intentionally creates a scenario that breaks it." Omega could do this with any possible decision theory (or at least, anything that could be implemented with finite resources), so what exactly are we supposed to learn by contemplating specific examples?

It seems to me that the valuable Omega thought experiments are the ones where Omega's omnipotence is simply used to force the player to stick to the rules of the given scenario. When you start postulating that an impossible, acausal superintelligence is actively working against you, it's time to hang up your hat and go home, because no strategy you could possibly come up with is going to do you any good.

The trouble is when another agent wins in this situation *and* in the situations you usually encounter. For example, an anti-traditional-rationalist, that always makes the opposite choice to a traditional rationalist, will one-box; it just fails spectacularly when asked to choose between different amounts of cake.

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000. Omega doesn't check what decision theory you're using at all - it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT-agent or not.

No, if TDT-prime two boxes, the TDT simulation still one-boxes.

Hmm, so TDT-prime would reason something like, "The TDT simulation will one-box because, not knowing that it's the simulation, but also knowing that the simulation will use exactly the same decision theory as itself, it will conclude that the simulation will do the same thing as itself and so one-boxing is the best option. However, I'm *different* to the TDT-simulation, and therefore I can safely two-box without affecting its decision." In which case, does it matter how inconsequential the difference is? Yep, I'm confused.

I also had thoughts along these lines - variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

The second difficulty is that for each specific TDT variant, one with algorithm T' say, there will be a specific problematic problem on which T' will do worse than CDT (and indeed worse than all the other variants of TDT) - this is the problem with T' being the exact algorithm running in the sim. So we still don't get the - desirable - property that there is some sensible decision theory called TDT that is optimal across fair problems.

The best suggestion I've heard so far is that we try to adjust the definition of "fairness", so that these problematic problems also count as "unfair". I'm open to proposals on that one...

I think this is avoidable. Let's say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice's source code contains a comment identifying it as Alice, whereas Bob's source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.
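A toy sketch of the Alice/Bob idea, with plain string comparison standing in for whatever reasoning a real TDT agent would do. All names here (`strip_comments`, `newcomb_choice`, `dilemma_choice`, and the placeholder `tdt` call inside the source strings) are hypothetical, and the comment stripping is deliberately crude.

```python
def strip_comments(source: str) -> str:
    """Crude normalisation: drop whole-line '#' comments and blank lines."""
    kept = [line for line in source.splitlines()
            if line.strip() and not line.strip().startswith("#")]
    return "\n".join(kept)

def newcomb_choice(my_source: str, simulated_source: str) -> str:
    # Any difference at all shows that I am not the simulated copy.
    return "one-box" if my_source == simulated_source else "two-box"

def dilemma_choice(my_source: str, opponent_source: str) -> str:
    # Treat an opponent that differs only in comments as identical to me.
    same = strip_comments(my_source) == strip_comments(opponent_source)
    return "cooperate" if same else "defect"

# Source texts are only compared, never executed.
BODY = "def decide(problem):\n    return tdt(problem)\n"
ALICE = "# I am Alice\n" + BODY
BOB = "# I am Bob\n" + BODY

print(newcomb_choice(ALICE, ALICE))  # one-box
print(newcomb_choice(BOB, ALICE))    # two-box
print(dilemma_choice(ALICE, BOB))    # cooperate
print(dilemma_choice(BOB, ALICE))    # cooperate
```

The asymmetry is the point: the exact comparison is used where only one agent's choice affects the other's payoff, and the comment-blind comparison where both do.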

You might want to look at my follow-up article which discusses a strategy like this (among others). It's worth noting that slight variations of the problem remove the opportunity for such "sneaky" strategies.

Ah, thanks. I had missed that, somehow.

In a prisoner's dilemma Alice and Bob affect each other's outcomes. In the Newcomb problem, Alice affects Bob's outcome, but Bob doesn't affect Alice's outcome. That's why it's OK for Bob to consider himself different in the second case as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation) but not OK for him to consider himself different in the prisoner's dilemma.

Well, I've had a think about it, and I've concluded that it *would* matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega's simulation, and therefore will not be able to take advantage of "logical separation".

But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to "separate" them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It's as if you need the ability to prove that two agents necessarily give the same output for the particular problem you're faced with, without proving what output those agents actually give, and *that* sure looks crazy-hard.

EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.

EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process - i.e. Omega can predict whether it's going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).

EDIT 3: Sorry to go on like this, but I've just realised that won't work in situations where some other agent bases their decision on whether you're predicting what their decision will be, i.e. Prisoner's Dilemma.

The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime's output and TDT-prime's decision. If its output is a *strategy*, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.

Thanks for the post! Your problems look a little similar to Wei's 2TDT-1CDT, but much simpler. Not sure about the other decision theory folks, but I'm quite puzzled by these problems and don't see any good answer yet.

I've looked a bit at that thread, and the related follow-ups, and my head is now really spinning. You are correct that my problems were simpler!

My immediate best guess on 2TDT-1CDT is that the human player would do better to submit a simple defect-bot (rather than either CDT or TDT), and this is irrespective of whether the player themselves is running TDT or CDT. If the player has to submit his/her own decision algorithm (source-code) instead of a bot, then we get into a colossal tangle about "who defects first", "whose decision is logically prior to whose" and whether the TDT agents will threaten to defect if they detect that the submitted agent may defect, or has already self-modified into unconditionally defecting, or if the TDT agents will just defect unconditionally anyway to even the score (e.g. through some form of utility trading / long term consequentialism principle that TDT has to beat CDT in the long run, therefore it had better just get on and beat CDT wherever possible...)

In short, I observe I am confused.

With all this logical priority vs temporal priority, and long term consequences feeding into short-term utilities, I'm reminded of the following from HPMOR Chapter 61:

Thanks for this, and for the reference. I'll have a look at 2TDT-1CDT to see if there are any insights there which could resolve these problems. I've got a couple of ideas myself, but will check up on the other work.

Here's another similar problem; see also the solution.

My sense is that question 6 is a better question to ask than 5. That is, what's important isn't drawing some theoretical distinction between fair and unfair problems, but finding out what problems we and/or our agents will actually face. To the extent that we are ignorant of this now but may know more in the future when we are smarter and more powerful, it argues for not fixing a formal decision theory to determine our future decisions, but instead making sure that we and/or our agents can continue to reason about decision theory the same way we currently can (i.e., via philosophy).

If he's always truthful, then he didn't lie to the simulation either and this means that he did infinitely many simulations before that. So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar. If he says different things to you and to your simulation instead, then it's not obvious you'll give the same answer.

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't. But I haven't thought this through yet, so it might turn out to be irrelevant.

This question of "Does Omega lie to sims?" was already discussed earlier in the thread. There were several possible answers from cousin_it and myself, any of which will do.

He can't have done literally infinitely many simulations. If that is really required it would be a way out by saying the thought experiment stipulates an impossible situation. I haven't yet considered whether the problem can be changed to give the same result and not require infinitely many simulations.

ETA: no wait, that can't be right, because it would apply to the original Newcomb's problem too. So there must be a way to formalize this correctly. I'll have to look it up but don't have the time right now.

In the original Newcomb's problem it's not specified that Omega performs simulations -- for all we know, he might use magic, closed timelike curves, or quantum magic whereby Box A is in a superposition of states entangled with your mind whereby if you open Box B, A ends up being empty and if you hand B back to Omega, A ends up being full.

We should take this seriously: a problem that cannot be instantiated in the physical world should not affect our choice of decision theory.

Before I dig myself in deeper, what does existing wisdom say? What is a practical possible way of implementing Newcomb's problem? For instance, simulation is eminently practical as long as Omega knows enough about the agent being simulated. OTOH, macro quantum entanglement of an arbitrary agent's arbitrary physical instantiation with a box prepared by Omega doesn't sound practical to me, but maybe I'm just swayed by incredulity. What do the experts say? (Including you if you're an expert, obviously.)

...

Say you have a CDT agent in the world, affecting the world via a set of robotic hands, a robotic voice, and so on. If you wire up two robot bodies to one computer (in parallel, so that all movements are done by both bodies), that is just a somewhat peculiar robotic manipulator. Handling this doesn't require any changes to CDT.

Likewise, when you have two robot bodies controlled by an identical mathematical equation, you get the correct result, provided that the world model in the CDT utility calculation accounts for all the known manipulators which are controlled by the chosen action.

Likewise, you can have CDT control a multitude of robots, either from one computer, or from multiple computers that independently determine optimal, identical actions (but each computer only acts on the robot body assigned to that computer).

CDT is formally defined using mathematics; the mathematics is already 'timeless', and the fact that the chosen action affects the contents of the boxes is part of the world model, not the decision theory (and so are physical time and physical causality part of the world model, not the decision theory. Even though the decision theory is called causal, that's some other 'causal').

Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?

It will defect on all Prisoner's Dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommitment - CDT isn't "reflectively consistent." And so if you want to predict an AI's behavior, if you predict based on CDT with no self-modification you'll get it wrong, since it doesn't stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

A more correct analysis is that CDT defects against *itself* in iterated Prisoner's Dilemma, provided there is any finite bound to the number of iterations. So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

A CDT playing against a "RevengeBot" - if you nuke it, it nukes back with an all out strike - would never fire its weapons. But then the RevengeBot could just take out one city at a time, without fear of retaliation.

Since CDT was the "gold standard" of rationality developed during the time of the Cold War, I am somewhat puzzled why we're still here.

Well, it's good that you're puzzled, because it wasn't - see Schelling's "The Strategy of Conflict."

I get the point that a CDT would pre-commit to retaliation if it had time (i.e. self-modify into a RevengeBot).

The more interesting question is why it bothers to do that re-wiring when it is expecting the nukes from the other side any second now...

This assumes that the mutual possession of nuclear weapons constitutes a Prisoner's Dilemma. There isn't necessarily a positive payoff to nuking folks. (You know, unless they are really jerks!)

Well nuking the other side eliminates the chance that they'll ever nuke you (or will attack with conventional weapons), so there is arguably a slight positive for nuking first as opposed to keeping the peace.

There were some very serious thinkers arguing for a first strike against the Soviet Union immediately after WW2, including (on some readings) Bertrand Russell, who later became a leader of CND. And a pure CDT (with selfish utility) would have done so. I don't see how Schelling theory could have modified that... just push the other guy over the cliff before the ankle-chains get fastened.

Probably the reason it didn't happen was the rather obvious "we don't want to go down in history as even worse than the Nazis" - also there was complacency about how far behind the Soviets actually were. If it had been known that they would explode an A-bomb as little as 4 years after the war, then the calculation would have been different. (Last ditch talks to ban nuclear weapons completely and verifiably - by thorough spying on each other - or bombs away. More likely bombs away I think.)

I don't think MAD is a Prisoner's Dilemma: in the Prisoner's Dilemma, if I know you're going to cooperate no matter what, I'm better off defecting, and if I know you're going to defect no matter what, I'm better off defecting. This doesn't seem to be the case here: bombing you *doesn't* make me better off all things being equal, it just makes you worse off. If anything, it's a game of Chicken where bombing the opponent corresponds to going straight and not bombing them corresponds to swerving. And CDTists don't always go straight in Chicken, do they?

Hm, I disagree - if nuking the Great Enemy never made you any better off, why was anyone ever afraid of anyone getting nuked in the first place? It might not grow your crops for you or buy you a TV, but gains in security and world power are probably enough incentive to at least make people worry.

Still better modelled by Chicken (where the utility of winning is assumed to be much smaller than the negative of the utility of dying, but still non-zero) than by PD.

(edited to add a link)

I don't understand what you mean by "modeled better by chicken" here.

I expect army1987's talking about Chicken, the game of machismo in which participants rush headlong at each other in cars or other fast-moving dangerous objects and whoever swerves first loses. The payoff matrix doesn't resemble the Prisoner's Dilemma all that much: there's more than one Nash equilibrium, and by far the worst outcome from either player's perspective occurs when both players play the move analogous to defection (i.e. don't swerve). It's probably most interesting as a vehicle for examining precommitment tactics.
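To make the contrast concrete, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of both games. The payoff numbers and the names `PRISONERS_DILEMMA`, `CHICKEN` and `pure_nash_equilibria` are illustrative assumptions, not taken from anyone's comment; only the ordering of the payoffs matters.

```python
from itertools import product

# Illustrative payoffs as (row player, column player); the numbers are assumptions.
# Move 0 = cooperate / swerve, move 1 = defect / go straight.
PRISONERS_DILEMMA = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}
CHICKEN = {
    (0, 0): (0, 0), (0, 1): (-1, 1),
    (1, 0): (1, -1), (1, 1): (-100, -100),
}


def pure_nash_equilibria(game):
    """Return the move pairs where neither player gains by deviating alone."""
    equilibria = []
    for row, col in product((0, 1), repeat=2):
        row_payoff, col_payoff = game[(row, col)]
        # The row player compares against its alternatives with col held fixed,
        # and the column player does the same with row held fixed.
        row_ok = all(game[(alt, col)][0] <= row_payoff for alt in (0, 1))
        col_ok = all(game[(row, alt)][1] <= col_payoff for alt in (0, 1))
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria
```

With these numbers the Prisoner's Dilemma has the single equilibrium (defect, defect), while Chicken has two, in each of which exactly one player swerves. Mutual "defection" is the worst cell in Chicken and is not an equilibrium.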

The game-theoretic version of Chicken *has* often been applied to MAD, as the Wikipedia page mentions.

I was. I should have linked to it, and I have now.

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

It won't self-modify to TDT. It will self-modify to something similar, but using its beliefs at the time of modification as the priors. For example, it will use the doomsday argument immediately to find out how long the world is likely to last, and it will use that information from then on, rather than redoing it as its future self (getting a different answer).

Fair enough. I guess I had some special case stuff in mind - there are certainly ways to get a CDT agent to cooperate on prisoner's dilemma ish problems.

Reason backwards from the inevitable end of the iteration. Defecting makes sense there, so defecting one turn earlier makes sense, so one turn earlier...
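A minimal sketch of that backward-induction argument, assuming a known, finite number of rounds and illustrative Prisoner's Dilemma payoffs. The names `PAYOFF`, `dominant_move` and `backward_induction_plan` are hypothetical, and this models only the textbook argument, not any particular CDT implementation.

```python
# Illustrative one-shot payoffs for "me" given (my move, opponent's move);
# the numbers are assumptions chosen to have the Prisoner's Dilemma ordering.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def dominant_move():
    """Return the move that is strictly better whatever the opponent does, if any."""
    for move in "CD":
        other = "D" if move == "C" else "C"
        if all(PAYOFF[(move, opp)] > PAYOFF[(other, opp)] for opp in "CD"):
            return move
    return None


def backward_induction_plan(rounds):
    """Build the plan from the last round back to the first."""
    plan = []
    for _ in range(rounds):
        # Play in all later rounds is already fixed, so today's move cannot be
        # rewarded or punished later: only the one-shot payoff matters.
        plan.insert(0, dominant_move())
    return plan
```

Under these assumptions `backward_induction_plan(n)` is all defections for any finite `n`. The argument relies on the final round being common knowledge, which is exactly what the replies below question.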

That depends on if it's known what the last iteration will be.

Also, I think any deviation from CDT in common knowledge (such as if you're not sure that they're sure that you're sure that they're a perfect CDT) would result in defecting a finite, and small, number of iterations from the end.

Ah, that second paragraph makes perfect sense. Thanks.

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around. (You also mustn't have any dynamical inconsistency such as akrasia, otherwise your future and past selves count as ‘other’ as well.) So I don't think it'd make much of a difference for a singleton -- but I'd rather use an RDT just in case.

It isn't the absolute level of intelligence that is required, but rather that the other agent is capable of making a specific kind of reasoning. Even this can be relaxed to things that can only dubiously be said to qualify as being classed "agent". The requirement is that some aspect of the environment has (utility-relevant) behavior that is entangled with the output of the decision to be made in a way that is other than a forward in time causal influence. This almost always implies that some agent is involved but that need not necessarily be the case.

Caveat: Maybe TDT is dumber than I remember and artificially limits itself in a way that is relevant here. I'm more comfortable making assertions about what a correct decision theory would do than about what some specific attempt to specify a decision theory would do.

You make me happy! RDT!

There's a different version of these problems for each decision theory, depending on what Omega simulates. For CDT, all agents two-box and all agents get $1000. However, on problem 2, it seems like CDT doesn't have a well-defined decision at all; the effort to work out what Omega's simulator will say won't terminate.

(I'm spamming this post with comments - sorry!)

You raise an interesting question here - what would CDT do if a CDT agent were in the simulation?

It looks to me that CDT just doesn't have the conceptual machinery to handle this problem properly, so I don't really know. One thing that could happen is that the simulated CDT agent tries to simulate itself and gets stuck in an infinite loop. I didn't specify exactly what would happen in that case, but if Omega can prove that the simulated agent is caught in a loop, then it knows the sim will choose each box with probability zero, and so (since these are all equal), it will fill box 1. But now can a real-life CDT agent also work this out, and beat the game by selecting box 1? But if so, why won't the sim do that, and so on? Aargh!!!

Another thought I had is that CDT could try tossing a logical coin, like computing the googolth digit of pi, and if it is even choose box 1, whereas if it is odd, choose box 2. If it runs out of time before computing (which the real-life agent will do), then it just picks box 1 or 2 with equal probability. The simulated CDT agent will however get to the end of the computation (Omega has arbitrary computational resources) and definitely pick 1 or 2 with certainty, so the money is definitely in one of those two boxes, which looks like the probability of the actual agent winning is raised to 50%. TDT might do the same.

However this looks like cheating to me, for both CDT and TDT.

EDIT: On reflection, it seems clear that CDT would never do anything "creatively sneaky" like tossing a logical coin; but it is the sort of approach that TDT (or some variant thereof) might come up with. Though I still think it's cheating.

I don't think your "detect infinite resources and cheat" strategy is really worth thinking about. Instead of strategies like CDT and TDT whose applicability to limited compute resources is unclear, suppose you have an anytime strategy X, which you can halt at any time and get a decision. Then there's really a family of algorithms X-t, where t is the time you're going to give it to run. In this case, if you are X-t, we can consider the situation where Omega fields X-t against you.

The version of CDT that I described explicitly should arrive at the uniformly random solution. You don't have to be able to simulate a program all the way through, just able to prove things about its output.

EDIT: Wait, this is wrong. It won't be able to consistently derive an answer, because of the way it acts given such an answer, and so it will go with whatever its default Nash equilibrium is.

Re: your EDIT. Yes, I've had that sort of reaction a couple of times today!

I'm shifting around between "CDT should pick at random, no CDT should pick Box 1, no CDT should use a logical coin, no CDT should pick its favourite number in the set {1, 2} with probability 1, and hope that the version in the sim has a different favourite number, no, CDT will just go into a loop or collapse in a heap."

I'm also quite clueless how a TDT is supposed to decide if it's told there's a CDT in the sim... This looks like a pretty evil decision problem in its own right.

Well, the thing is that CDT doesn't *completely* specify a decision theory. I'm confident now that the specific version of CDT that I described would fail to deduce anything and go with its default, but it's hard to speak for CDTs in general on such a self-referential problem.

BTW, general question about decision theory. There appears to have been an academic study of decision theory for over a century, and causal and evidential decision theory were set out in 1981. Newcomb's paradox was set out in 1969. Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT, and it seems as if there is a 100% disconnect between the community exploring new theories (which is centered around LW) and the academic decision theory community. This seems really, really odd - what's going on?

This is simply not true. Robert Nozick (who introduced Newcomb's problem to philosophers) compared/contrasted EDT and CDT at least as far back as 1993. Even back then, he noted their inadequacy on several decision-theoretic problems and proposed some alternatives.

Me being ignorant of something seemed like a likely part of the explanation - thanks :) I take it you're referencing "The Nature of Rationality"? Not read that I'm afraid. If you can spare the time I'd be interested to know what he proposes -thanks!

I haven't read *The Nature of Rationality* in quite a long time, so I won't be of much help. For a very simple and short introduction to Nozick's work on decision theory, you should read this (PDF).

There were plenty of previous theories trying to go beyond CDT or EDT, they just weren't satisfactory.

This paper talks about reflexive decision models and claims to develop a form of CDT which one boxes.

It's in my to-read list but I haven't got to it yet so I'm not sure whether it's of interest but I'm posting it just in case (it could be a while until I have time to read it so I won't be able to post a more informed comment any time soon).

Though this theory post-dates TDT and so isn't interesting from *that* perspective.

Dispositional decision theory :P

... which I cannot find a link to the paper for, now. Hm. But basically it was just TDT, with less awareness of why.

EDIT: Ah, here it was. Credit to Tim Tyler.

I checked it. Not the same thing.

It should be noted that Newcomb's problem was considered interesting in Philosophy in 1969, but decision theories were studied more in other fields - so there's a disconnect between the sorts of people who usually study formal decision theories and that sort of problem.

(Deleting comments seems not to be working. Consider this a manual delete.)

Decision Theory is and can be applied to a variety of problems here. It's just that AI may face Newcomb-like problems and in particular we want to ensure a 1-boxing-like behavior on the part of AI.

The rationale for TDT-like decision theories is even more general, I think. There's no guarantee that our world contains only one copy of something. We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

Constructing a rigorous mathematical foundation for decision theory, one that explains what a decision problem, a decision, or a goal is, is potentially more useful than resolving any given informally specified class of decision problems.

What is an example of such a real-world problem?

Negotiations with entities who can read the AI's source code.

Given the week+ delay in this response, it's probably not going to see much traffic, but I'm not convinced "reading" source code is all that helpful. Omega is posited to have nearly god-like abilities in this regard, but since this is a rationalist discussion, we probably have to rule out actual omnipotence.

If Omega intends to simply run the AI on spare hardware it has, then it has to be prepared to validate (in finite time and memory) that the AI hasn't so obfuscated its source as to be unintelligible to rational minds. It's also possible that the source to an AI is rather simple but it is dependent on a large amount of input data in the form of a vast sea of numbers. I.e., the AI in question could be encoded as an ODE system integrator that's reliant on a massive array of parameters to get from one state to the next. I don't see why we should expect Omega to be better at picking out the relevant, predictive parts of these numbers than we are.

If the AI can hide things in its code or data, then it can hide functionality that tests to determine if it is being run by Omega or on its own protected hardware. In such a case it can lie to Omega just as easily as Omega can lie to the "simulated" version of the AI.

I think it's time we stopped positing an omniscient Omega in these complications to Newcomb's problem. They're like epicycles on Ptolemaic orbital theory in that they continue a dead end line of reasoning. It's better to recognize that Newcomb's problem is a red herring. Newcomb's problem doesn't demonstrate problems that we should expect AIs to solve in the real world. It doesn't tease out meaningful differences between decision theories.

That is, what decisions on real-world problems do we expect to be different between two AIs that come to different conclusions about Newcomb-like problems?

You should note that every problem you list is a special case. Obviously, there are ways of cheating at Newcomb's problem if you're aware of salient details beforehand. You could simply allow a piece of plutonium to decay, and do whatever the resulting Geiger counter noise tells you to. That does not, however, support your thesis that Newcomb's problem is a totally artificial problem with no logical intrusions into reality.

As a real-world example, imagine an off-the-shelf stock market optimizing AI. Not sapient, to make things simpler, but smart. When any given copy begins running, there are already hundreds or thousands of near-identical copies running elsewhere in the market. If it fails to predict their actions from its own, it will do objectively worse than it might otherwise do.

I don't see how your example is apt or salient. My thesis is that Newcomb-like problems are the wrong place to be testing decision theories because they do not represent realistic or relevant problems. We should focus on formalizing and implementing decision theories and throw real-world problems at them rather than testing them on arcane logic puzzles.

Well... no, actually. A good decision theory ought to be universal. It ought to be correct, and it ought to work. Newcomb's problem is important, not because it's ever likely to happen, but because it shows a case in which the normal, commonly accepted approach to decision theory (CDT) failed miserably. This 'arcane logic puzzle' is illustrative of a deeper underlying flaw in the model, which needs to be addressed. It's also a flaw that'd be much harder to pick out by throwing 'real world' problems at it over and over again.

Seems unlikely to work out to me. Humans evolved intelligence without Newcomb-like problems. As the only example of intelligence that we know of, it's clearly possible to develop intelligence without Newcomb-like problems. Furthermore, the general theory seems to be that AIs will start dumber than humans and iteratively improve until they're smarter. Given that, why are we so interested in problems like these (which humans don't universally agree about the answers to)?

I'd rather AIs be able to help us with problems like "what should we do about the economy?" or even "what should I have for dinner?" instead of worrying about what we should do in the face of something godlike.

Additionally, human minds aren't universal (assuming that universal means that they give the "right" solutions to all problems), so why should we expect AIs to be? We certainly shouldn't expect this if we plan on iteratively improving our AIs.

The more I think about it, the more interesting these problems get! Problem 1 seems to re-introduce all the issues that CDT has on Newcomb's Problem, but for TDT. I first thought to introduce the ability to 'break' with past selves, but that doesn't actually help with the simulation problem.

It did lead to a cute observation, though. Given that TDT cares about all sufficiently accurate simulations of itself, *it's actually winning.*

It doesn't seem very relevant, but I think if we explored Richard's point that we need to actually formalise this, we'd find that any simulation high-fidelity enough to actually bind a TDT agent to its previous actions would necessarily give the agent the utility from the simulations, and vice versa, any simulation not accurate enough to give utility would be sufficiently different from TDT to allow our agent to two-box when that agent one-boxed.

Omega doesn't need to simulate the agent actually getting the reward. After the agent has made its choice, the simulation can just end.

Omega is supposed to be always truthful, so either he rewards the sims as well, or you know something the sims don't and hence it's not obvious you'll do the same as them.

I thought Omega was allowed to lie to sims.

Even if he's not, after he's given a $1m simulated reward, does he then have to keep up a simulated environment for the sim to actually spend the money?

If he can lie to sims, then you can't know he's not lying to you unless you know you're not a sim. If you do, it's not obvious you'd choose the same way as if you didn't.

For instance, if you think Omega is lying and completely ignore everything he says, you obviously two-box.

Why not zero-box in this case? I mean, what reason would I have to expect any money at all?

Well, as long as you believe Omega enough to think no box contains sudden death or otherwise negative utility, you'd open them to see what was inside. But yes, you might not believe Omega at all.

General question: suppose we encounter an alien. We have no idea what its motivations, values, goals, or abilities are. On the other hand, it may have observed any amount of human comm traffic from wireless EM signals since the invention of radio, and from actual spy-probes before the human invention of high tech that would detect them.

It signals us in Morse code from its remote starship, offering mutually beneficial trade.

What prior should we have about the alien's intention? Should we use a naive uniform prior that would tell us it's as likely to mean us good as harm, and so never reply because we don't know how it will try to influence our actions via communications? Should it tell us different agents who don't explicitly value one another will conflict to the extent their values differ, and so since value-space is vast and a randomly selected alien is unlikely to share many values with us, we should prepare for war? Should it tell us we can make some assumptions (which?) about naturally evolved agents or their Friendly-to-themselves creations? How safe are we if we try to "just read" English text written by an unknown, possibly-superintelligence which may have observed all our broadcast traffic since the age of radio? What does our non-detection of this alien civ until they chose to initiate contact tell us? Etc.

A 50% chance of meaning us good vs harm isn't a prior I find terribly compelling.

There's a lot to say here, but my short answer is that this is both an incredibly dangerous and incredibly valuable situation, in which both the potential opportunity costs and the potential actual costs are literally astronomical, and in which there are very few things I can legitimately be confident of.

The best I can do in such a situation is to accept that my best guess is overwhelmingly likely to be wrong, but that it's slightly *less* likely to be wrong than my second-best guess, so I should operate on the basis of my best guess *despite* expecting it to be wrong. Where "best guess" here is the thing I consider most likely to be true, *not* the thing with the highest expected value.

I should also note that my priors about aliens in general -- that is, what I consider likely about a randomly selected alien intelligence -- are less relevant to this scenario than what I consider likely about *this particular* intelligence, given that it has observed us for long enough to learn our language, revealed itself to us, communicated with us in Morse code, offered mutually beneficial trade, etc.

The most tempting belief for me is that the alien's intentions are essentially similar to ours. I can even construct a plausible sounding argument for that as my best guess... we're the only *other* species I know capable of communicating the desire for mutually beneficial trade in an artificial signalling system, so our behavior constitutes strong evidence for their behavior. OTOH, it's pretty clear to me that the *reason* I'm tempted to believe that is because I can *do* something with that belief; it gives me a lot of traction for thinking about what to do next. (In a nutshell, I would conclude from that assumption that it means to exploit us for its long-term benefit, and whether that's good or bad for us depends entirely on what our most valuable-to-it resources are and how it can most easily obtain them and whether we benefit from that process.) Since that has almost nothing to do with the *likelihood* of it being true, I should distrust my desire to believe that.

Ultimately, I think what I do is reply that I value mutually beneficial trade with them, but that I don't actually trust them and must therefore treat them as a potential threat until I have gathered more information about them, while at the same time refraining from doing anything that would significantly reduce our chances of engaging in mutually beneficial trade in the future, and what do they think about all that?

He can certainly give them counterfactual 'realities'. It would seem that he should be assumed to at least provide counterfactual realities wherein information provided by the simulation's representation of Omega indicates that he is perfectly trustworthy.

No. But if for whatever reason the simulated environment persists it should be one that is consistent with Omega keeping his word. Or, if part of the specification of the problem or the declarations made by Omega directly pertain to claims about what He will do regarding simulation then he will implement that policy.

If we are assuming that Omega is trustworthy, then Omega needs to be assumed to be trustworthy in the simulation too. If they didn't allow the simulated version of the agent to enjoy the fruits of their choice, then they would not be trustworthy.

Actually, I'm not sure this matters. If the simulated agent knows he's not getting a reward, he'd still want to choose so that the nonsimulated version of himself gets the best reward.

So the problem is that the best answer is unavailable to the simulated agent: in the simulation you should one box and in the 'real' problem you'd like to two box, but you have no way of knowing whether you're in the simulation or the real problem.

Agents that Omega didn't simulate don't have the problem of worrying whether they're making the decision in a simulation or not, so two boxing is the correct answer for them.

The decisions being made are very different between an agent that has to make the decision twice and the first decision will affect the payoff of the second versus an agent that has to make the decision only once, so I think that in reality perhaps the problem does collapse down to an 'unfair' one because the TDT agent is presented with an essentially different problem to a nonTDT agent.

Corollary: Omega can statically analyse the TDT agent's decision algorithm.

This needs some serious mathematics underneath it. Omega is supposed to run a simulation of how an agent of a certain sort handled a certain problem, the result of that simulation being a part of the problem itself. I don't think it's possible to tell, just from these English words, that there is a solution to this fixed-point formulation. And TDT itself hasn't been formalised, although I assume there are people (Eliezer? Marcello? Wei Dai?) working on that.

Cf. the construction of Gödel sentences: you can't just assume that a proof-system can talk about itself, you have to explicitly construct a way for it to talk about itself and show precisely what "talking about itself" means, before you can do all the cool stuff about undecidable sentences, Löb's theorem, and so on.

This seems well-specified to me: Since the agent is not told its own output in advance, it is possible to run the "simulation" and the "real version" in finite time. If you hand me a computer program that is the agent, I will hand you a computer program that is Omega and the environment.
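To make the "hand me the agent, I hand you Omega and the environment" claim concrete, here is a minimal sketch of Problem 1 as a program. It is only an illustration under stated assumptions: an agent is modelled as a zero-argument function returning `"one-box"` or `"two-box"`, and the names `Agent`, `omega_problem_1`, `always_one_box` and `always_two_box` are hypothetical, not anything defined in the post. The two stand-in agents are not TDT or CDT; they only show that the environment terminates.

```python
from typing import Callable

# Assumption for illustration: an agent is a function with no inputs that
# returns "one-box" or "two-box". The thread does not fix an interface.
Agent = Callable[[], str]

BOX_A = 1_000
BOX_B_FULL = 1_000_000


def omega_problem_1(simulated_agent: Agent, real_agent: Agent) -> int:
    """Fill the boxes from the simulated agent's choice, then pay the real agent."""
    # Step 1: the "simulation" is just a call to the agent program. It halts
    # because the agent is never told its own output in advance.
    box_b = BOX_B_FULL if simulated_agent() == "one-box" else 0
    # Step 2: the real agent chooses against boxes that are already filled.
    choice = real_agent()
    return box_b if choice == "one-box" else BOX_A + box_b


def always_one_box() -> str:
    """Hypothetical stand-in agent that always takes only Box B."""
    return "one-box"


def always_two_box() -> str:
    """Hypothetical stand-in agent that always takes both boxes."""
    return "two-box"
```

Calling `omega_problem_1(always_one_box, always_two_box)` pays out both boxes, which is the asymmetry the post is about: the payoff depends on the simulated program, while the real agent only chooses among boxes that are already filled.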

Here is a useful program:

    if (being_simulated())
        pick_box_b();
    else
        pick_both_boxes();

being_simulated() is an interesting function; it would presumably consist of things like comparing star maps across as broad a range of EM frequencies as possible with historical knowledge, the positions of planetary bodies, Earthly satellites, quasar signatures, and inputs of other individuals and networks. It could also be as simple as "if (get_current_average_cosmic_ray_bit_flips() != historical_bit_flips_average) return true;" Since my brain tries to play Omega with me every night (well, on the nights I get good sleep) I think we all have some practical experience with being_simulated(). While our brains are nowhere near as capable as Omega, they are fairly good at suspending our own disbelief during dreams. What generally breaks disbelief is trying to behave rationally, including observing/testing the environment, which in reality would be the outside world but in a dream is simulated. Ultimately Omega is forced to simulate a very large subset of the Universe in order to cause being_simulated() to return false, or to alter the program itself so that being_simulated() vacuously returns true.

To prevent the latter attack the program should execute a fully homomorphically encrypted sub-function and base all its decisions on the outputs of that function. In other words its true decision theory should be hidden even from the hardware running it. This pushes Omega's capabilities into the cryptographic realm and the ability to solve arbitrarily hard cryptography problems. Once this is done, Omega becomes an oracle for solving any hard problem, e.g. by running this function as an input to the final decision theory:

    if (strange_paradoxical_decision_is_required()) {
        bank_secret = factor_omegas_bank_account_public_key();
        return sign_and_cash_check_for_one_billion_dollars(bank_secret);
    }

This restricts Omega to putting a time limit (or computation limit) on its simulations to avoid being suckered by its own computing power. Presumably Omega also puts a time limit on the real time a being can spend thinking before choosing which boxes it wants, but presumably that time is long enough for some fairly serious philosophizing and likely more than enough time for a reliable being_simulated() call.

Another approach is to consider how much information an intelligent being can gather about its environment. A human would have trouble determining whether two boxes are empty or full, but additional sensors could detect the presence of money in the boxes fairly reliably. What matters for Omega is making sure that a being cannot determine the contents of the boxes before picking them. From the perspective of a rational being this is equivalent to the boxes being filled with cash after making a decision. If Omega has the capability to obscure the contents of boxes then Omega certainly has the ability to obscure the placement of money into the boxes as they are chosen (just a glorified magic trick). Given that interpretation, CDT will one-box.

EDIT: I apologize for the formatting, I am not very good at escaping/formatting apparently.

This strategy is discussed in the follow-up article.

In general it's difficult, because by assumption Omega has the computational power to simulate more or less anything (including an environment matching the world as you remember it; this might be like the real world, or you might have spent your whole life so far as a sim). And the usual environment for these problems is a sealed room, so that you can't look at the stars etc.

But TDT already has this problem - TDT is all about finding a fixed point decision.

Intuitively this doesn't feel like a 'fair' problem. A UDT agent would ace the TDT formulation and vice versa. Any TDT agent that found a way of distinguishing between 'themselves' and Omega's TDT agent would also ace the problem. It feels like an acausal version of something like:

"I get agents A and B to choose one or two boxes. I then determine the contents of the boxes based on my best guess of A's choice. Surprisingly, B succeeds much better than A at this."

Still an intriguing problem, though.

I think we need a 'non-problematic problems for CDT' thread.

For example, it is not problematic for a CDT-based robot controller to have the control values in the action A represent multiple servos in its world model, as if you wired multiple robot arms to 1 controller in parallel. You may want to do this if you want the robot arms to move in unison and pass along the balls in a real-world imitation of http://blueballmachine2.ytmnd.com/

It is likewise not problematic if you ran out of wire and decided to make the '1 controller' be physically 2 controllers running identical code from above, or if you ran out of time machines and decided to control yesterday's servo with 1 controller yesterday, and today's servo with the same controller in the same state today. These are simply low-level, irrelevant details.

A mathematical formalization of CDT (such as robot software) will one-box or two-box in Newcomb's problem depending on the world model within which CDT decides. If the world model has the 'prediction' as a second servo represented by the same variable, then it'll one-box.

Philosophical maxims like "act based on consequences of my actions", whether they one-box or two-box, depend in turn solely on philosophical questions like "what is self". E.g. if "self" means the physical meat, then two-box; if "self" means the algorithm (a higher-level concept), then one-box if you assume that the thing in the predictor is "self" too.

edit: another thing. Stuff outside the robot's senses is naturally uncertain. Upon hearing the explanation in Newcomb's paradox, one has to update the estimates of what is outside the senses; it might be that the money is fake, and there's some external logic and wiring and servos that will put a real million into a box if you choose to 1-box. If the money is to pay for, I dunno, your child's education, clearly one has got to 1-box. I'm pretty sure Causal Deciding General Thud can 1-box just fine, if he needs the money to buy the real weapons for the real army, and suspects that outside his senses there may be the predictor spying. General Thud knows that the best option is to 1-box inside the predictor and 2-box outside. The goal is never to two-box outside the predictor.

Let's say that TDT agents can be divided into two categories, TDT-A and TDT-B, based on a single random bit added to their source code in advance. Then TDT-A can take the strategy of always picking the first box in Problem 2, and TDT-B can always pick the second box.

Now, if you're a TDT agent being offered the problem, then with the aforementioned strategy there's a 50% chance that the simulated agent is different from you, netting you $1 million. This also narrows down the advantage of the CDT agent - now they only have a 50% chance of winning the money, which is equal to yours.

Actually, the way the problem is specified, Omega puts the money in box 3.

The argument is that the simulation is either TDT-A in this case, or TDT-B. Either way, the simulated agent will pick a single favourite box (1 or 2) with certainty, so the money is in either Box 2 or Box 1.

Though I can see an interpretation which leads to Box 3. Omega simulates a "new-born" TDT (which is neither -A nor -B) and watches as it differentiates itself to one variant or the other, each with equal probability. So the new-born picks boxes 1 and 2 with equal frequency over multiple simulations, and Box 3 contains the money. Is that what you were thinking?
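The disagreement about Box 2, Box 1 or Box 3 comes down to which distribution Omega observes in its simulations. A minimal sketch of the placement rule from Problem 2 (least likely box, lowest label on ties) makes the three readings easy to compare. The function `money_box` and the distributions `uniform`, `tdt_a`, `tdt_b` and `newborn` are hypothetical names introduced here for illustration, and the rule is applied to idealised probabilities rather than to finite simulation counts.

```python
from fractions import Fraction
from typing import Dict


def money_box(choice_probs: Dict[int, Fraction], n_boxes: int = 10) -> int:
    """Return the box Omega fills: the least likely choice, lowest label on ties."""
    # Boxes the simulated agent never picks count as probability 0.
    probs = {box: choice_probs.get(box, Fraction(0)) for box in range(1, n_boxes + 1)}
    lowest = min(probs.values())
    return min(box for box, p in probs.items() if p == lowest)


# Idealised choice distributions for the simulated agent (assumptions).
uniform = {box: Fraction(1, 10) for box in range(1, 11)}  # the original analysis
tdt_a = {1: Fraction(1)}                                  # always picks Box 1
tdt_b = {2: Fraction(1)}                                  # always picks Box 2
newborn = {1: Fraction(1, 2), 2: Fraction(1, 2)}          # differentiates 50/50
```

Under these assumptions `money_box(tdt_a)` is Box 2 and `money_box(tdt_b)` is Box 1, while `money_box(newborn)` is Box 3, matching the two interpretations discussed above.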

Yes. I was thinking that Omega would have access to the agent's source code, and be running the "play against yourself, if you pick a different number than yourself you win" game. Omega is a jerk :D

If it's your own exact source being simulated, then it's probably impossible to do better than 10%, and the problem isn't interesting anymore.

That's not too bad, actually. One of my ideas while thrashing about here was that an agent should have a "favourite" number in the set {1, 2} and pick that number with certainty. That way, Omega will definitely put the $1 million in Box 1 or Box 2 and each agent will have 50% chance that their favourite number disagrees with the simulated agent's.

This won't work if Omega describes the source-code of the simulation (or otherwise reveals the simulation's favourite number) - since then any agent with that exact code knows it can't choose deterministically, and its best chance is to pick each box with equal chance, as described in the original analysis.
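The 50% and 10% figures in this sub-thread can be checked by enumeration. The sketch below assumes the simulated and real agents draw their favourite numbers independently and uniformly from {1, 2}, which is one reading of the "random bit" idea and not something the problem statement guarantees. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def money_box_for_favourite(favourite: int) -> int:
    """Box Omega fills when the simulated agent always picks `favourite` (1 or 2)."""
    # Every other box ties at probability 0, so the lowest-numbered other box wins.
    return 2 if favourite == 1 else 1


def win_probability_favourites() -> Fraction:
    """Chance of winning when both favourites are independent fair coin flips."""
    pairs = list(product((1, 2), repeat=2))  # (simulated favourite, real favourite)
    wins = sum(1 for sim_fav, real_fav in pairs
               if real_fav == money_box_for_favourite(sim_fav))
    return Fraction(wins, len(pairs))


def win_probability_uniform(n_boxes: int = 10) -> Fraction:
    """Chance of winning when identical code randomises uniformly over all boxes."""
    # A uniform simulated agent leaves every box tied, so the money is in Box 1,
    # and the real agent lands on Box 1 with probability 1/n_boxes.
    return Fraction(1, n_boxes)
```

With independent favourites the agent wins exactly when its favourite differs from the simulation's, giving 1/2. If the code is identical and known, the uniform fallback gives 1/10.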

Someone may already have mentioned this, but doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow? I.e., as soon as we let decision theories become self-referencing, it is impossible for a "best" decision theory to exist at all.

There was some discussion of much the same point in this comment thread.

One important thing to consider is that there may be a sensible way to define "best" that is not susceptible to this type of problem. Most notably, there may be a suitable, solvable, and realistic subclass of problems over which to evaluate performance. Also, even if there is no "best", there can still be better and worse.

Self-reference and the like is necessary for Goedel sentences but not sufficient. It's certainly plausible that this scenario could have a Goedel sentence, but whether the current problem is isomorphic to a Goedel sentence is not obvious, and seems unlikely.

Perhaps referring directly to Goedel was not apt. What Goedel showed was that Hilbert/Russell's efforts were futile. And what Hilbert and Russell were trying to do was create a formal system where actual self-reference was impossible. And the reason they were trying to do that, finally, was that self-reference creates paradoxes which reduce to either incompleteness or inconsistency. And the same is true of these more advanced decision theories. Because they are self-referencing, they create an infinite regress that precludes the existence of a "best" decision theory at all.

So, finding a best decision theory is impossible once self-reference is allowed, because of the nature of self-reference, but not quite because of Goedel's theorems, which are the stronger declaration that any formal system by necessity contains self-referential aspects that make it incomplete or inconsistent.

Problems 1 and 2 both look - to me - like fancy versions of the Discrimination problem.

edit: I am much less sure of this.

That is, Omega changes the world based on whether the agent implements TDT.

This bit I am still sure of, but it might be the case that TDT can overcome this anyway.

Discrimination problem: Money Omega puts in room if you're TDT = $1,000. Money Omega puts in room if you're not = $1,001,000.

Problem 1: Money Omega puts in room if you're TDT = $1,000 or $1,001,000. Edit: made a mistake. The error in this problem may be subtler than I first claimed. Money Omega puts in room if you're not = $1,001,000.

Problem 2: $1,000,000 either way. This problem is different but also uninteresting. Due to Omega caring about TDT again, it is just the smallest interesting number paradox for TDT agents only. Other decision theories get a free ride because you're just asking them to reason about an algorithm (easy to show it produces a uniform distribution) and then a maths question (which box has the smallest number on it?).

You claim the rewards are "independent of the method that the agent uses to choose", but they're not. They depend on whether the agent uses TDT to choose or not.

I've edited the problem statement to clarify Box A slightly. Basically, Omega will put $1001000 in the room ($1000 for box A and $1 million for Box B) regardless of the algorithm run by the actual deciding agent. The contents of the boxes depend only on what the simulated agent decides.

Agree. You use process X to determine the setup and agents instantiating X are going to be constrained. Any decision theory would be at a disadvantage when singled out like this.

Sorry, shouldn't it be "$1,000 or $1,001,000"?

Right, but $1,001,000 only in the case where you restrict yourself to picking $1,000,000. I oversimplified and it might not actually be accurate.

I wonder if there is a mathematician in this forum willing to present the issue in the form of a theorem and a proof for it, in a reasonable mathematical framework. So far all I can see is a bunch of ostensibly plausible informal arguments from different points of view.

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Or it's hard to formalize.

It's pointless to argue about a decision theory problem until it is formalized, since there is no way to check the validity of any argument.

So, what *ought* one do when interested in a problem (decision theory or otherwise) that one does not yet understand well enough to formalize?

I suspect "go do something else until a proper formalization presents itself" is not the best possible answer for all problems, nor is "work silently on formalizing the problem and don't express or defend a position on it until I've succeeded."

How about "work on formalizing the problem (silently or collaboratively, whatever your style is) and do not defend a position that cannot be successfully defended or refuted"?

Fair enough.

Is there a clear way to distinguish positions worth arguing without formality (e.g., the one you are arguing here) from those that aren't (e.g., the one you are arguing ought not be argued here)?

It's a good question. There ought to be, but I am not sure where the dividing line is.

You check the arguments using mathematical intuition, and you use them to find better definitions. For example, problems involving continuity or real numbers were fruitfully studied for a very long time before rigorous definitions were found.

Indeed, you use them to find better definitions, which is the first step in formalizing the problem. If you argue whose answer is right before doing so (as opposed, say, to which answer ought to be right once a proper formalization is found), you succumb to lost purposes.

For example, "TDT ought to always make the best decision in a certain class of problems" is a valid purpose, while "TDT fails on a Newcomb's problem with a TDT-aware predictor" is not a well-defined statement until every part of it is formalized.

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

If I had to guess, I'd say that the downvoters interpret those pleas, especially in the context of some of your other comments, as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all.

Admittedly, I interpret them that way myself, so I may just be projecting my beliefs onto others.

Wha...? Thank you for letting me know, though I still have no idea what you might mean. I'd greatly appreciate it if you elaborated on that!

I'm not sure I can add much by elaboration.

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

Within that context, responding to a comment with a request to formalize it is easy to read as a polite way of expressing "what you just said is uselessly vague. If you are capable of saying something useful, do so, otherwise shut up and leave this subject to the grownups."

And since you aren't consistent about wanting *everything* to be expressed as a formalism, I assume this is a function of the topic of discussion, because that's the most charitable assumption I can think of.

That said, I reiterate that I have no special knowledge of why you're being downvoted; please don't take me as definitive.

(1) This might be an unfair impression, as I no longer remember what it was that led me to form it.

Thank you! I always appreciate candid feedback.

It's too easy for this to turn into a general counterargument against anything the person says. It may be of benefit to play the ball and not the man.

*Anything* the person says? In respect to *most* things it would be a total non-sequitur.

Yes, I agree. Perhaps I shouldn't have said anything at all, but, well, he asked.

Which issue/problem? Fairness?

The fairness concept should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT on the [b]DT-aware Omega Newcomb's problems described in the OP, where 'a' and 'b' are particular DTs, e.g. a=b=T.

There was this Roko thing a while back (which is not supposed to be discussed), where, if I understood that nonsense correctly, the idea was that the decision theories here would do the equivalent of one-boxing on Newcomb with transparent boxes where you could see there is no million, when there's no million (and where the boxes were made and sealed before you were born). It's not easy to one-box rationally.

Also in practice usually being simulated correctly is awesome for getting scammed (agents tend to face adversaries rather than crazed beneficiaries).

These questions seem decidedly UNfair to me.

No, they don't depend on the agent's decision-making algorithm; just on another agent's specific decision-making algorithm skewing results against an agent with an identical algorithm and letting all others reap the benefits of an otherwise non-advantageous situation.

So, a couple of things:

While I have not mathematically formulated this, I suspect that absolutely any decision theory can have a similar scenario constructed for it, using another agent / simulation with that specific decision theory as the basis for payoff. Go ahead and prove me wrong by supplying one where that's not the case...

It would be far more interesting to see a TDT-defeating question that doesn't have "TDT" (or taboo versions) as part of its phrasing. In general, questions of how a decision theory fares when agents can scan your algorithm *and* decide to discriminate against that algorithm specifically, are not interesting - because they are losing propositions in any case. When another agent has such profound understanding of how you tick and malice towards that algorithm, you have already lost.

The interaction between this simulated TDT and you is so complicated that I don't think many of the commenters here actually did the math to see how they should expect the simulated TDT agent to react in these situations. I know I didn't. I tried, and failed.

Maybe I'm missing something, but the formalization looks easy enough to me...

The functions tdt() and you() accept the source code of a function as an argument, and try to maximize its return value. The implementation of tdt() could be any of our formalizations that enumerate proofs successively, which all return 1 if given the source code to tdt_utility. The implementation of you() could be simply "return 2".
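The snippet this comment refers to is not preserved in the thread as it appears here, so the following is only a guess at its general shape, written as a runnable sketch for Problem 1. Everything in it is an assumption: the functions take a Python function object rather than source code, `tdt()` is a stub that simply returns 1 (one-box), standing in for the proof-enumerating formalizations the comment mentions, and `your_utility` is a hypothetical name for the payoff function handed to `you()`.

```python
BOX_A = 1_000
BOX_B = 1_000_000


def tdt(utility_function) -> int:
    """Stub for a TDT implementation: 1 means one-box, 2 means two-box."""
    # Assumption: per the comment, proof-enumerating formalizations return 1
    # on tdt_utility. A real implementation would inspect `utility_function`.
    return 1


def you(utility_function) -> int:
    """The simple non-TDT agent from the comment: always two-box."""
    return 2


def tdt_utility() -> int:
    """Payoff to a TDT agent in Problem 1 (simulation and agent share tdt())."""
    box_b = BOX_B if tdt(tdt_utility) == 1 else 0  # Omega's simulation
    return box_b if tdt(tdt_utility) == 1 else BOX_A + box_b


def your_utility() -> int:
    """Payoff to a different agent facing boxes filled from the TDT simulation."""
    box_b = BOX_B if tdt(tdt_utility) == 1 else 0  # Omega's simulation
    return box_b if you(your_utility) == 1 else BOX_A + box_b
```

With these stubs, `tdt_utility()` evaluates to $1,000,000 and `your_utility()` to $1,001,000, which is the gap the post describes. The interesting work, as noted in the replies, is in what a non-stub `tdt()` would actually do.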

Thanks for this. I hadn't seen someone pseudocode this out before. This helps illustrate that interesting problems lie in the scope above (callers of tdt_utility() etc) and below (implementation of tdt() etc).

I wonder if there is a rationality exercise in 'write pseudocode for problem descriptions, explore the callers and implementations'.

1) Not to my knowledge.

2) No, you reasoned TDT's decisions correctly.

3) A TDT agent would not self-modify to CDT, because if it did, its simulation would also self-modify to CDT and then two-box, yielding only $1000 for the real TDT agent.

4) TDT does seem to be a single algorithm, albeit a recursive one in the presence of other TDT agents or simulations. TDT doesn't have to look into its own code, nor does it change its mind upon seeing it, for it decides as if deciding what the code outputs.

5) This is a bit of a tricky one. You could say it's fair if you judge by whether each agent did the best it could have done, rather than getting the most, but a CDT agent could say the same when it two-boxes and reasons it would have gotten $0 if it had one-boxed. I guess in a timeless sense, TDT does the best it could have done in these problems, while CDT doesn't do the best it could have done in Newcomb's problem.

6) That's a tough one. If you're asking what Omega's intentions are (or would be in the real world), I have no idea. If you're asking who succeeds at the majority of problems in the problem space of anything Omega can ask, I strongly believe TDT would outperform CDT on it.

Generalization of Newcomb's Problem: Omega predicts your behavior with accuracy p.

This one could actually be experimentally tested, at least for certain values of p; so for instance we could run undergrads (with $10 and $100 instead of $1,000 and $1,000,000; don't bankrupt the university) and use their behavior from the pilot experiment to predict their behavior in later experiments.
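For choosing which values of p to test, it helps to know where one-boxing and two-boxing break even. The sketch below assumes the evidential reading in which Omega's prediction matches your actual choice with probability p whichever way you choose; a causal analysis would not compute the values this way. The function names are hypothetical.

```python
from fractions import Fraction


def expected_payoffs(p, small=1_000, large=1_000_000):
    """Return (EV of one-boxing, EV of two-boxing) when Omega is right with probability p."""
    one_box = p * large                # Box B is full only if Omega predicted one-boxing
    two_box = small + (1 - p) * large  # Box B is full only if Omega guessed wrong
    return one_box, two_box


def break_even_accuracy(small=1_000, large=1_000_000) -> Fraction:
    """Accuracy p at which both choices have the same expected payoff."""
    # Solve p * large = small + (1 - p) * large for p.
    return Fraction(small + large, 2 * large)
```

With the standard stakes the break-even accuracy is 1001/2000, so a predictor only slightly better than chance already favours one-boxing on this reading. With the $10 and $100 stakes suggested for undergrads it rises to 11/20, so the pilot-based predictor would need to be right more than 55% of the time.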

Why is the discrimination problem "unfair"? It seems like in any situation where decision theories are actually put into practice, that type of reasoning is likely to be popular. In fact I thought the whole point of advanced decision theories was to deal with that sort of self-referencing reasoning. Am I misunderstanding something?

If you are a TDT agent, you don't know whether you're the simulation or the "outside decision", since they're effectively the same. Or rather, the simulation will have made the same choice that you will make.

If you're not a TDT agent, you gain more information: You're not a TDT agent, and the problem states TDT was simulated.

So the discrimination problem functionally resolves to:

If you are a TDT agent, have some dirt. End of story.

If you are not a TDT agent, I have done some mumbo-jumbo, and now you can either take one box for $1000 or $1m, or both of them for $1001000. Have fun! (the mumbo-jumbo has nothing to do with you anyway!)

Is the trick with problem 1 that what you are really doing, by using a simulation, is having an agent use timeless decision theory in a context where they can't use timeless decision theory? The simulated agent doesn't know about the external agent. Or, you could say, it's impossible for it to be timeless; the directionality of time (simulation first, external agent moves second) is enforced in a way that makes it impossible for the simulated agent to reason across that time barrier. Therefore it's not fair to call what it decides "timeless decision theory".

Either problem 1 and 2 are hitting an infinite regress issue, or I don't see why an ordinary TDT agent wouldn't 2-box, and choose the first box, respectively. There's a difference between the following problems:

- [...] *you* would do such and such, and acted accordingly.
- [...] *another* agent, and acted accordingly.
- [...] *this very problem*, only if you don't run TDT that's not the same problem, but I promise it's the same nonetheless, and acted accordingly.

Now, in problem 1 and 2, are the simulated problem and the actual problem *actually* the same? If they are, I see an infinite regress at Omega's side, and therefore not a problem one would ever encounter. If they aren't, then what I actually understand them to be is:

Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of *Newcomb's* problem as presented to an agent running TDT. If the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

Really, you don't have to use something other than TDT to see that the simulated TDT agent one-boxed. *Its* problem isn't *your* problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by 1-boxing. But *you* should 2-box.

Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "I ran multiple simulations of the following problem, presented to a TDT agent: “You must take exactly one box. I determined which box you are least likely to take, and put $1 million in that box. If there is a tie, I put the money in one of them (the one labelled with the lowest number).” I put the money in the box the simulated TDT agent was least likely to choose. If there was a tie, I put the money in one of them (the one labelled with the lowest number). Now choose your box."

Same here. You know that the TDT agent put equal probability on every box, to maximize its gains. Again, *its* problem isn't *your* problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by choosing at random. But *you* should take box 1.

This is CDT reasoning, AKA causal reasoning. Or in other words, how do you not use the same reasoning in the original Newcomb problem?

This is indeed a problem - and one I would describe as the general class "dealing with other agents who are fucking with you." It is not one that can be solved and I believe a "correct" decision theory will, in fact, lose (compared to CDT) in this case.

Note that there seems to be some chance that I am confused in a way analogous to the way that people who believe "Two boxing on Newcomb's is rational" are confused. There could be a deep insight I am missing. This seems comparatively unlikely.

For problem 1, in the language of the blackmail posts, because the tactic Omega uses to fill box 2 depends on TDT-sim's decision, because Omega has already decided, and because Omega didn't make its decision known, a TDT agent presented with this problem is at an epistemic disadvantage relative to Omega: TDT can't react to Omega's actual decision, because it won't know Omega's actual decision until it knows its own actual decision, at which point TDT can't further react. This epistemic disadvantage doesn't need to be enforced temporally; even if TDT knows Omega's source code, if TDT has limited simulation resources, it might not practically be able to compute Omega's actual decision any way but via Omega's dependence on TDT's decision.

There aren't other ways for an agent to be at an epistemic disadvantage relative to Omega in this problem than by being TDT? Could you construct an agent which was itself disadvantaged relative to TDT?

"Take only the box with $1000."

Which itself is inferior to "Take no box."

Will they? Surely it's clear that it's now possible to take $1,001,000, because the circumstances are slightly different.

In the standard Newcomb problem, where Omega predicts your behaviour, it's not possible to trick it or act other than its expectation. Here, it is.

Is there some basic part of decision theory I'm not accounting for here?

Yes. If the TDT agent picked the $1,001,000 here, then the simulated agent would have two-boxed as well, meaning only box A would be filled.

Remember, the simulated agent was presented with the same problem, so the decision TDT makes here is the same one the simulated agent makes.

Right, I understand what you mean. I was thinking of in the context of a person being presented with this situation, not an idealized agent running a specific decision theory.

And Omega's simulated agent would presumably hold all the same information as a person would, and be capable of responding the same way.

Cheers for clarifying that for me.

In both your problems, the seeming paradox comes from failure to recognize that the two agents (one that Omega has simulated and one making the decision) are facing entirely different prior information. Then, nothing requires them to make identical decisions. The second agent can simulate itself having prior information I1 (that the simulated agent has been facing), then infer Omega's actions, and arrive at the new prior information I2 that is relevant for the decision. And I2 now is independent of which decision the agent would make *given I2*.

Are you sure that they are facing different prior information? If the sim is a good one, then the TDT agent won't be able to tell whether it is the sim or not. However, you are right that one solution could be that there are multiple TDT variants who have different information and so can logically separate their decisions.

I mentioned the problems with that in another response here. The biggest problem is that it seriously undermines the attraction and effectiveness of TDT as a decision theory if different instances of TDT are going to find excuses to separate from each other.

In Newcomb's Problem, Omega determines ahead of time what decision theory you use. In these problems, it selects an arbitrary decision theory ahead of time. As such, for any agent using this preselected decision theory, these problems are variations of Newcomb's problem. For any agent using a different decision theory, the problem is quite different (and simpler). Thus, whatever agent has had its decision theory preselected can only perform as well as in a standard Newcomb's problem, while a luckier agent may perform better. In other words, there are equivalent problems where Omega bases its decision on the results of a CDT or EDT output, in which they actually perform worse than TDT does in these problems.