# Problematic Problems for TDT

34 29 May 2012 03:41PM

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents which two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave the way a CDT agent would wish it had pre-committed itself to behave before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

Discrimination: Omega presents the usual two boxes. Box A always contains \$1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains \$1 million.

So what are some fair "problematic problems"?

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put \$1 million in Box B. Regardless of how the simulated agent decided, I put \$1000 in Box A. Now please choose your box or boxes."

Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win \$1000, whereas if they one-box they will win \$1 million. So the agent will choose to one-box and win \$1 million.

However, any CDT agent can just take both boxes and win \$1001000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the \$1 million. So any other agent can safely two-box as well.

Note that we can modify the contents of Box A so that it contains anything up to \$1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
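The payoff structure of Problem 1 can be written out as a small sketch. The function names below are illustrative only, and the key modelling assumption (taken from the analysis above) is that a TDT agent's choice is mirrored by the simulated TDT agent, while any other agent's choice has no bearing on the simulation.

```python
# Payoffs for Problem 1. Box B's contents depend only on the simulated TDT
# agent's choice, never on the agent actually standing in the room.
BOX_A = 1_000
BOX_B_PRIZE = 1_000_000


def box_b_contents(sim_one_boxes: bool) -> int:
    """Omega fills Box B only if the simulated TDT agent one-boxed."""
    return BOX_B_PRIZE if sim_one_boxes else 0


def payoff(takes_both: bool, sim_one_boxes: bool) -> int:
    """Winnings of the real agent given its choice and the simulation's choice."""
    b = box_b_contents(sim_one_boxes)
    return b + BOX_A if takes_both else b


def tdt_payoff(one_boxes: bool) -> int:
    # Assumption from the analysis: the simulation makes the same choice
    # as a real agent that runs TDT.
    return payoff(takes_both=not one_boxes, sim_one_boxes=one_boxes)


def other_payoff(takes_both: bool) -> int:
    # Assumption from the analysis: the simulated TDT agent one-boxed,
    # whatever a non-TDT agent decides to do.
    return payoff(takes_both=takes_both, sim_one_boxes=True)


if __name__ == "__main__":
    print("TDT one-boxes:  ", tdt_payoff(True))
    print("TDT two-boxes:  ", tdt_payoff(False))
    print("Other two-boxes:", other_payoff(True))
```

Under these assumptions the TDT agent's best option is worth \$1 million, while a two-boxing non-TDT agent collects Box A on top of that.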

Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains \$1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed \$1 million in the selected box. Please choose your box."

Analysis: A TDT agent will reason that whatever it does, it cannot have more than 10% chance of winning the \$1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the \$1 million in box number 1 and the TDT agent has exactly 10% chance of winning it.

But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning \$1 million. By increasing the number of boxes, we can ensure that TDT has arbitrarily low chance of winning, compared to CDT which always wins.
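Omega's rule in Problem 2 is easy to sketch. The code below idealises "multiple simulations" as Omega knowing the TDT agent's exact mixed strategy, which is an assumption of this sketch rather than something the problem statement guarantees; the helper names are hypothetical.

```python
from fractions import Fraction


def omega_box(probs):
    """0-based index of the box Omega fills: the least likely box,
    with ties broken in favour of the lowest-numbered box."""
    return min(range(len(probs)), key=lambda i: (probs[i], i))


def tdt_win_chance(probs):
    """Chance that an agent playing the mixed strategy `probs`
    picks the box Omega filled."""
    return probs[omega_box(probs)]


n = 10
uniform = [Fraction(1, n)] * n
# An arbitrary non-uniform strategy, for comparison.
skewed = [Fraction(1, 2)] + [Fraction(1, 18)] * 9

if __name__ == "__main__":
    print("Uniform strategy, filled box:", omega_box(uniform) + 1)
    print("Uniform strategy, win chance:", tdt_win_chance(uniform))
    print("Skewed strategy, win chance: ", tdt_win_chance(skewed))
```

Because the filled box is always a least-likely one, its probability can never exceed 1/n, and the uniform strategy attains that bound. A non-TDT agent who works out that the uniform strategy sends the money to box 1 simply takes box 1.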

Some questions:

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains \$1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).

Comment author: [deleted] 28 December 2012 08:38:33PM 0 points [-]

1) Not to my knowledge.

2) No, you reasoned TDT's decisions correctly.

3) A TDT agent would not self-modify to CDT, because if it did, its simulation would also self-modify to CDT and then two-box, yielding only \$1000 for the real TDT agent.

4) TDT does seem to be a single algorithm, albeit a recursive one in the presence of other TDT agents or simulations. TDT doesn't have to look into its own code, nor does it change its mind upon seeing it, for it decides as if deciding what the code outputs.

5) This is a bit of a tricky one. You could say it's fair if you judge by whether each agent did the best it could have done, rather than getting the most, but a CDT agent could say the same when it two-boxes and reasons it would have gotten \$0 if it had one-boxed. I guess in a timeless sense, TDT does the best it could have done in these problems, while CDT doesn't do the best it could have done in Newcomb's problem.

6) That's a tough one. If you're asking what Omega's intentions are (or would be in the real world), I have no idea. If you're asking who succeeds at the majority of problems in the problem space of anything Omega can ask, I strongly believe TDT would outperform CDT on it.

Comment author: 25 December 2012 03:50:19PM *  1 point [-]

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

In Newcomb's Problem, Omega determines ahead of time what decision theory you use. In these problems, it selects an arbitrary decision theory ahead of time. As such, for any agent using this preselected decision theory, these problems are variations of Newcomb's problem. For any agent using a different decision theory, the problem is quite different (and simpler). Thus, whatever agent has had its decision theory preselected can only perform as well as in a standard Newcomb's problem, while a luckier agent may perform better. In other words, there are equivalent problems where Omega bases its decision on the results of a CDT or EDT output, in which they actually perform worse than TDT does in these problems.

Comment author: 12 June 2012 05:23:40PM *  1 point [-]

These questions seem decidedly UNfair to me.

No, they don't depend on the agent's decision-making algorithm; just on another agent's specific decision-making algorithm skewing results against an agent with an identical algorithm and letting all others reap the benefits of an otherwise non-advantageous situation.

So, a couple of things:

1. While I have not mathematically formulated this, I suspect that absolutely any decision theory can have a similar scenario constructed for it, using another agent / simulation with that specific decision theory as the basis for payoff. Go ahead and prove me wrong by supplying one where that's not the case...

2. It would be far more interesting to see a TDT-defeating question that doesn't have "TDT" (or taboo versions) as part of its phrasing. In general, questions of how a decision theory fares when agents can scan your algorithm and decide to discriminate against that algorithm specifically, are not interesting - because they are losing propositions in any case. When another agent has such profound understanding of how you tick and malice towards that algorithm, you have already lost.

Comment author: 09 June 2012 12:39:32AM 0 points [-]

Generalization of Newcomb's Problem: Omega predicts your behavior with accuracy p.

This one could actually be experimentally tested, at least for certain values of p; so for instance we could run undergrads (with \$10 and \$100 instead of \$1,000 and \$1,000,000; don't bankrupt the university) and use their behavior from the pilot experiment to predict their behavior in later experiments.
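For reference, the expected winnings in this generalization can be sketched as follows. The calculation assumes that Omega's prediction matches the actual choice with probability p whichever choice is made, and it conditions on the choice in the evidential style; both are assumptions of this sketch, and a CDT agent would not accept the second. The function names are illustrative.

```python
def expected_values(p, small=1_000, big=1_000_000):
    """Expected winnings of one-boxing and two-boxing when the prediction
    matches the actual choice with probability p."""
    one_box = p * big                # Box B is full only if Omega predicted one-boxing
    two_box = small + (1 - p) * big  # Box B is full only if Omega guessed wrong
    return one_box, two_box


def break_even_accuracy(small=1_000, big=1_000_000):
    """Accuracy at which both choices have equal expected winnings.
    Solves p * big == small + (1 - p) * big for p."""
    return (small + big) / (2 * big)


if __name__ == "__main__":
    print(expected_values(0.9))
    print(break_even_accuracy())         # standard stakes
    print(break_even_accuracy(10, 100))  # the undergraduate stakes suggested above
```

With the standard stakes the break-even accuracy is 0.5005, barely better than chance; with \$10 and \$100 it rises to 0.55, so the cheaper experiment needs a somewhat more accurate predictor before one-boxing comes out ahead on this calculation.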

Comment author: 05 June 2012 07:08:54PM *  0 points [-]

Why is the discrimination problem "unfair"? It seems like in any situation where decision theories are actually put into practice, that type of reasoning is likely to be popular. In fact I thought the whole point of advanced decision theories was to deal with that sort of self-referencing reasoning. Am I misunderstanding something?

Comment author: 28 December 2012 09:10:04PM *  0 points [-]

If you are a TDT agent, you don't know whether you're the simulation or the "outside decision", since they're effectively the same. Or rather, the simulation will have made the same choice that you will make.

If you're not a TDT agent, you gain more information: You're not a TDT agent, and the problem states TDT was simulated.

So the discrimination problem functionally resolves to:

If you are a TDT agent, have some dirt. End of story.
If you are not a TDT agent, I have done some mumbo-jumbo, and now you can either take one box for \$1000 or \$1m, or both of them for \$1001000. Have fun! (the mumbo-jumbo has nothing to do with you anyway!)

Comment author: 04 June 2012 04:18:07AM 0 points [-]

Is the trick with problem 1 that what you are really doing, by using a simulation, is having an agent use timeless decision theory in a context where they can't use timeless decision theory? The simulated agent doesn't know about the external agent. Or, you could say, it's impossible for it to be timeless; the directionality of time (simulation first, external agent moves second) is enforced in a way that makes it impossible for the simulated agent to reason across that time barrier. Therefore it's not fair to call what it decides "timeless decision theory".

Comment author: 01 June 2012 11:36:34AM 1 point [-]

Intuitively this doesn't feel like a 'fair' problem. A UDT agent would ace the TDT formulation and vice versa. Any TDT agent that found a way of distinguishing between 'themselves' and Omega's TDT agent would also ace the problem. It feels like an acausal version of something like:

"I get agents A and B to choose one or two boxes. I then determine the contents of the boxes based on my best guess of A's choice. Surprisingly, B succeeds much better than A at this."

Still an intriguing problem, though.

Comment author: 31 May 2012 11:28:35AM *  2 points [-]

I think we need a 'non-problematic problems for CDT' thread.

For example, it is not problematic for a CDT-based robot controller to have the control values in the action A represent multiple servos in its world model, as if you wired multiple robot arms to 1 controller in parallel. You may want to do this if you want the robot arms to move in unison and pass along the balls in the real world imitation of http://blueballmachine2.ytmnd.com/

It is likewise not problematic if you ran out of wire and decided to make the '1 controller' be physically 2 controllers running identical code from above, or if you ran out of time machines and decided to control yesterday's servo with 1 controller yesterday, and today's servo with same controller in same state today. It's simply low level, irrelevant details.

A mathematical formalization of CDT (such as robot software) will one-box or two-box in Newcomb depending on the world model within which CDT decides. If the world model has the 'prediction' as a second servo represented by the same variable, then it'll one-box.
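A toy sketch of that claim: the same utility-maximising chooser is run against two hypothetical world models, one in which the prediction is a fixed fact and one in which it is driven by the same variable as the action. This only illustrates the argument in this comment; it is not a formalization of CDT, and all names are made up for the example.

```python
BOX_A, BOX_B_PRIZE = 1_000, 1_000_000
ACTIONS = ("one-box", "two-box")


def independent_model(action, prediction="one-box"):
    """World model in which the prediction is a fixed fact
    that the chosen action cannot touch."""
    b = BOX_B_PRIZE if prediction == "one-box" else 0
    return b + (BOX_A if action == "two-box" else 0)


def linked_model(action):
    """World model in which the prediction is a second 'servo'
    represented by the same variable as the action."""
    b = BOX_B_PRIZE if action == "one-box" else 0
    return b + (BOX_A if action == "two-box" else 0)


def choose(world_model):
    """Pick the action with the highest payoff according to the world model."""
    return max(ACTIONS, key=world_model)


if __name__ == "__main__":
    print("Independent prediction:", choose(independent_model))
    print("Linked prediction:     ", choose(linked_model))
```

The chooser is identical in both runs; only the world model differs, and that alone flips the choice from two-boxing to one-boxing.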

For philosophical maxims like "act based on consequences of my actions", whether they one-box or two-box depends in turn solely on philosophical questions like "what is self". E.g. if "self" means the physical meat, then two-box; if "self" means the algorithm (a higher level concept), then one-box if you assume that the thing in the predictor is "self" too.

edit: another thing. Stuff outside the robot's senses is naturally uncertain. Upon hearing the explanation in Newcomb's paradox, one has to update the estimates of what is outside the senses; it might be that the money is fake, and there's some external logic and wiring and servos that will put a real million into a box if you choose to 1-box. If the money is to pay for, I dunno, your child's education, clearly one has got to 1-box. I'm pretty sure Causal Deciding General Thud can 1-box just fine, if he needs the money to buy the real weapons for the real army, and suspects that outside his senses there may be the predictor spying. General Thud knows that the best option is to 1-box inside the predictor and 2-box outside. The goal is never to two box outside the predictor.

Comment author: 31 May 2012 08:49:30AM *  0 points [-]

Either problems 1 and 2 are hitting an infinite regress issue, or I don't see why an ordinary TDT agent wouldn't 2-box, and choose the first box, respectively. There's a difference between the following problems:

• I, Omega, predicted that you would do such and such, and acted accordingly.
• I, Omega, simulated another agent, and acted accordingly.
• I, Omega, simulated this very problem, only if you don't run TDT that's not the same problem, but I promise it's the same nonetheless, and acted accordingly

Now, in problem 1 and 2, are the simulated problem and the actual problem actually the same? If they are, I see an infinite regress at Omega's side, and therefore not a problem one would ever encounter. If they aren't, then what I actually understand them to be is:

1. Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to an agent running TDT. If the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put \$1 million in Box B. Regardless of how the simulated agent decided, I put \$1000 in Box A. Now please choose your box or boxes."

Really, you don't have to use something else than TDT to see that the simulated TDT agent one boxed. Its problem isn't your problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by 1 boxing. But you should 2 box.

2. Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "I ran multiple simulations of the following problem, presented to a TDT agent: “You must take exactly one box. I determined which box you are least likely to take, and put \$1 million in that box. If there is a tie, I put the money in one of them (the one labelled with the lowest number).” I put the money in the box the simulated TDT agent was least likely to choose. If there was a tie, I put the money in one of them (the one labelled with the lowest number). Now choose your box."

Same here. You know that the TDT agent put equal probability on every box, to maximize its gains. Again, its problem isn't your problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by choosing at random. But you should take box 1.

Comment author: 19 June 2012 01:55:33PM 0 points [-]

You don't have to use something else than TDT to see that the simulated TDT agent one boxed. Its problem isn't your problem.

This is CDT reasoning, AKA causal reasoning. Or in other words, how do you not use the same reasoning in the original Newcomb problem?

Comment author: 21 June 2012 10:16:16PM *  -1 points [-]

The reasoning is different because the problem is different.

The simulated agent and yourself were not subjected to the same problem. Therefore you can perfectly precommit to different decisions. TDT does not automatically take the same decisions to problems that merely kinda look the same. They have to actually be the same. There may be specific reasons why TDT would make the same decision, but I doubt it.

Now on to the examples:

### Newcomb's problem

Omega ran a simulation of Newcomb's problem, complete with a TDT agent in it. The simulated TDT agent obviously one boxed, and got the million. If you run TDT yourself, you also know it. Now, Omega tells you of this simulation, and tells you to choose your boxes. This is not Newcomb's problem. If it were, deciding to 2 box would cause box B to be empty!

CDT would crudely assume that 2 boxing gets it \$1000 more than 1 boxing. TDT on the other hand knows the simulated box B (and therefore the real one as well) has the million, regardless of its current decision.

### 10 boxes problem

Again, the simulated problem and the real one aren't the same. If they were, choosing box 1 with probability 1 would cause box 2 to have the million. Because it's not the same problem, even TDT should be allowed to precommit to a different decision. The point of TDT is to foresee the consequences of its precommitments. It will therefore know that its precommitment in the real problem doesn't have any influence on its precommitment (and therefore the outcome) in the simulated one. This lack of influence allows it to fall back on CDT reasoning.

Makes sense?

Comment author: 25 December 2012 03:53:04PM *  0 points [-]

The simulated agent and yourself were not subjected to the same problem.

Um, yes, they were. That's the whole point.

Comment author: 31 December 2012 07:40:07PM 0 points [-]

I'll need to write a full discussion post about that at some point. There is one crucial difference besides "I'm TDT" and "I'm CDT". It's "The simulated agent uses the same decision theory" and "The simulated agent does not use the same decision theory".

That's not exactly the same problem, and I think that is the whole point.

Comment author: 22 June 2012 12:54:50AM 0 points [-]

The simulated problem and the actual problem don't have to actually be the same - just indistinguishable from the point of view of the agent.

Omega avoids infinite regress because the actual contents of the boxes are irrelevant for the purposes of the simulation, so no sub-simulation is necessary.

Comment author: 22 June 2012 09:15:47AM 0 points [-]

Okay. So, what specific mistake does TDT make that would prevent it from distinguishing the two problems? What leads it to think "If I precommit X in problem 1, I have to precommit X in problem 2 as well"?

(If the problems aren't the same, of course Omega can avoid infinite regress. And if there is unbounded regress, we may be able to find a non-infinite solution by looping the regress over itself. But then the problems (simulated and real) are definitely the same.)

Comment author: 22 June 2012 11:51:57AM 0 points [-]

In the simulated problem the simulated agent is presented with the choice but never gets the reward; for all it matters both boxes can be empty. This means that Omega doesn't have to do another simulation to work out what's in the simulated boxes.

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.

Comment author: 22 June 2012 12:20:19PM 0 points [-]

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.

Now there's still the question of the perceived difference between the simulated problem and the real one (I assume here that you should 1 box in the simulation, and 2 box in the real problem). There is a difference, so how come TDT does not see it? A Rational Decision Theory would; we humans do. Or if it can see it, why can't it act on it? RDT could. Do you concede that TDT does and can, or do you still have doubts?

Comment author: 23 June 2012 12:19:36AM 1 point [-]

Due to how the problem is set up, you can't notice the difference until after you've made your decision. The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

Comment author: 24 June 2012 08:04:56PM *  0 points [-]

The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

That's false. Here is a modified version of the problem:

Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to you. If your simulated twin 2-boxed then I put nothing in Box B. If your simulated twin 1-boxed, I put \$1 million in Box B. In any case, I put \$1000 in Box A. Now please 1-box or 2-box."

Even if you're not running TDT, the simulated agent is running the same decision algorithm as you are. If that was the reason why TDT couldn't tell the difference, well, now no one can. However, you and I can tell the difference. The simulated problem is obviously different:

Omega presents the usual two boxes A and B and announces the following. "I am subjecting you to Newcomb's problem. Now please 1-box or 2-box".

Really, the subjective difference between the two problems should be obvious to any remotely rational agent.

(Please let me know if you agree up until that point. Below, I assume you do.)

I'm pretty sure the correct answers for the two problems (my modified version as well as the original one) are 1-box in the simulation, 2-box in the real problem. (Do you still agree?)

So. We both agree that RDT (Rational Decision Theory) 1-boxes in the simulation, and 2-boxes in the real problem. CDT would 2-box in both, and TDT would 1-box in the simulation while in the real problem it would…

• 2-box? I think so.
• 1-box? Supposedly because it can't tell simulation from reality. Or rather, it can't tell the difference between Newcomb's problem and the actual problem. Even though RDT does. (riiight?) So again, I must ask, why not? I need a more specific answer than "due to how the problem is set up". I need you to tell me what specific kind of irrationality TDT is committing here. I need to know its specific blind spot.

Comment author: 24 June 2012 11:04:10PM *  1 point [-]

In your problem, TDT does indeed 2-box, but it's quite a different problem from the original one. Here's the main difference:

I ran a simulation of this problem

vs

I ran a simulation of Newcomb's problem

Comment author: 24 June 2012 08:59:33PM 0 points [-]

Well, in the problem you present here TDT would 2-box, but you've avoided the hard part of the problem from the OP, in which there is no way to tell whether you're in the simulation or not (or at least there is no way for the simulated you to tell), unless you're running some algorithm other than TDT.

Comment author: 28 May 2012 09:08:53AM 5 points [-]

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

If he's always truthful, then he didn't lie to the simulation either and this means that he did infinitely many simulations before that. So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar. If he says different things to you and to your simulation instead, then it's not obvious you'll give the same answer.

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair?

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't. But I haven't thought this through yet, so it might turn out to be irrelevant.

Comment author: 25 December 2012 03:54:57PM 0 points [-]

I assumed the sims weren't conscious - they were abstract implementations of TDT.

Comment author: 25 December 2012 05:59:29PM 0 points [-]

Well, then there's stuff you know and the sims don't, which you could take in account when deciding and thence decide something different from what they did.

Comment author: 25 December 2012 10:26:34PM 2 points [-]

What stuff? The color of the walls? Memories of your childhood? Unless you have information that alters your decision or you're not a perfect implementer of TDT, in which case you get lumped into the category of "CDT, EDT etc."

Comment author: 25 December 2012 11:47:36PM *  1 point [-]

The fact that you're not a sim, and unlike the sims you'll actually be given the money.

Comment author: 26 December 2012 01:38:30AM *  0 points [-]

Why the hell would Omega program the sim not to value the simulated reward? It's almost certainly just abstract utility anyway.

Comment author: 28 May 2012 09:10:14PM *  0 points [-]

So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar.

...

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't.

Say you have a CDT agent in the world, affecting the world via a set of robotic hands, a robotic voice, and so on. If you wire up two robot bodies to 1 computer (in parallel so that all movements are done by both bodies), that is just a somewhat peculiar robotic manipulator. Handling this doesn't require any changes to CDT.

Likewise when you have two robot bodies controlled by an identical mathematical equation, provided that your world model in the CDT utility calculation accounts for all the known manipulators which are controlled by the chosen action, you get the correct result.

Likewise, you can have CDT control a multitude of robots, either from one computer, or from multiple computers that independently determine optimal, identical actions (but each computer only acts on the robot body assigned to that computer).

CDT is formally defined using mathematics; the mathematics is already 'timeless', and the fact that the chosen action affects the contents of the boxes is part of the world model, not the decision theory (and so are physical time and physical causality. Even though the decision theory is called causal, that's some other 'causal').

Comment author: 28 May 2012 06:57:02PM 1 point [-]

This question of "Does Omega lie to sims?" was already discussed earlier in the thread. There were several possible answers from cousin_it and myself, any of which will do.

Comment author: 28 May 2012 03:01:53PM *  0 points [-]

He can't have done literally infinitely many simulations. If that is really required it would be a way out by saying the thought experiment stipulates an impossible situation. I haven't yet considered whether the problem can be changed to give the same result and not require infinitely many simulations.

ETA: no wait, that can't be right, because it would apply to the original Newcomb's problem too. So there must be a way to formalize this correctly. I'll have to look it up but don't have the time right now.

Comment author: 25 December 2012 03:57:35PM 0 points [-]

If that is really required it would be a way out by saying the thought experiment stipulates an impossible situation.

This might be better stated as "incoherent", as opposed to mere impossibility which can be resolved with magic.

Comment author: 28 May 2012 04:03:14PM 1 point [-]

In the original Newcomb's problem it's not specified that Omega performs simulations -- for all we know, he might use magic, closed timelike curves, or quantum magic whereby Box A is in a superposition of states entangled with your mind whereby if you open Box B, A ends up being empty and if you hand B back to Omega, A ends up being full.

Comment author: 28 May 2012 04:26:18PM 0 points [-]

We should take this seriously: a problem that cannot be instantiated in the physical world should not affect our choice of decision theory.

Before I dig myself in deeper, what does existing wisdom say? What is a practical possible way of implementing Newcomb's problem? For instance, simulation is eminently practical as long as Omega knows enough about the agent being simulated. OTOH, macro quantum entanglement of an arbitrary agent's arbitrary physical instantiation with a box prepared by Omega doesn't sound practical to me, but maybe I'm just swayed by incredulity. What do the experts say? (Including you if you're an expert, obviously.)

Comment author: 28 May 2012 04:37:15PM *  -1 points [-]

cannot

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Unless your utility function is bounded.

Comment author: 28 May 2012 05:02:37PM 0 points [-]

If a problem statement has an internal logical contradiction, there is still a tiny probability that I and everyone else are getting it wrong, due to corrupted hardware or a common misconception about logic or pure chance, and the problem can still be instantiated. But it's so small that I shouldn't give it preferential consideration over other things I might be wrong about, like the nonexistence of a punishing god or that the food I'm served at the restaurant today is poisoned.

Either of those if true could trump any other (actual) considerations in my actual utility function. The first would make me obey religious strictures to get to heaven. The second threatens death if I eat the food. But I ignore both due to symmetry in the first case (the way to defeat Pascal's wager in general) and to trusting my estimation of the probability of the danger in the second (ordinary expected utility reasoning).

AFAICS both apply to considering an apparently self-contradictory problem statement as really not possible with effective probability zero. I might be misunderstanding things so much that it really is possible, but I might also be misunderstanding things so much that the book I read yesterday about the history of Africa really contained a fascinating new decision theory I must adopt or be doomed by Omega.

All this seems to me to fail due to standard reasoning about Pascal's mugging. What am I missing?

Comment author: 28 May 2012 06:16:50PM 0 points [-]

If a problem statement has an internal logical contradiction

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

Comment author: 28 May 2012 06:23:57PM *  1 point [-]

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

It certainly doesn't contradict itself, and I would also assert that it doesn't contradict the physical law that causality cannot go backwards in time. Instead I would say that giving the sane answer to Newcomb's problem requires abandoning the assumption that one's decision must be based only on what it affects through forward-in-time causal, physical influence.

Comment author: 28 May 2012 07:46:14PM *  0 points [-]

Consider making both boxes transparent to illustrate some related issue.

Comment author: 28 May 2012 04:58:12PM 1 point [-]

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Even? I'd go as far as to say only. Non-tiny probabilities aren't Pascal's muggings. They are just expected utility calculations. </lighthearted nitpick!>

Comment author: 26 May 2012 02:52:16AM 0 points [-]

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put \$1 million in Box B. Regardless of how the simulated agent decided, I put \$1000 in Box A. Now please choose your box or boxes."

This is indeed a problem - and one I would describe as the general class "dealing with other agents who are fucking with you." It is not one that can be solved and I believe a "correct" decision theory will, in fact, lose (compared to CDT) in this case.

Note that there seems to be some chance that I am confused in a way analogous to the way that people who believe "Two boxing on Newcomb's is rational" are confused. There could be a deep insight I am missing. This seems comparatively unlikely.

Comment author: 25 May 2012 10:04:12PM *  -2 points [-]

There was this Rocko thing a while back (which is not supposed to be discussed), where, if I understood that nonsense correctly, the idea was that the decision theories here would do the equivalent of one-boxing on Newcomb with transparent boxes where you could see there is no million, when there's no million (and where the boxes were made and sealed before you were born). It's not easy to one-box rationally.

Also in practice usually being simulated correctly is awesome for getting scammed (agents tend to face adversaries rather than crazed beneficiaries).

Comment author: [deleted] 25 May 2012 08:14:16PM *  0 points [-]

For problem 1, in the language of the blackmail posts, because the tactic omega uses to fill box 2,

```
TDT-sim.box1,box2=(<F,T> <T,T>) -> Omega.box2=(1M, 0)
```

depends on TDT-sim's decision, because Omega has already decided, and because Omega didn't make its decision known, a TDT agent presented with this problem is at an epistemic disadvantage relative to Omega: TDT can't react to Omega's actual decision, because it won't know Omega's actual decision until it knows its own actual decision, at which point TDT can't further react. This epistemic disadvantage doesn't need to be enforced temporally; even if TDT knows Omega's source code, if TDT has limited simulation resources, it might not practically be able to compute Omega's actual decision any way but via Omega's dependence on TDT's decision.

any other agent who is not running TDT ... will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the \$1 million

There aren't other ways for an agent to be at an epistemic disadvantage relative to Omega in this problem than by being TDT? Could you construct an agent which was itself disadvantaged relative to TDT?

Comment author: 25 May 2012 08:33:32PM 3 points [-]

Could you construct an agent which was itself disadvantaged relative to TDT?

"Take only the box with \$1000."

Which itself is inferior to "Take no box."

Comment author: 24 May 2012 03:25:39PM 9 points [-]

Problem 2 reminds me strongly of playing GOPS.

For those who aren't familiar with it, here's a description of the game. Each player receives a complete suit of standard playing cards, ranked Ace low through King high. Another complete suit, the diamonds, is shuffled (or not, if you want a game of complete information) and put face down on the table; these diamonds have point values Ace=1 through King=13. In each trick, one diamond is flipped face-up. Each player then chooses one card from their own hand to bid for the face-up diamonds, and all bids are revealed simultaneously. Whoever bids highest wins the face-up diamonds, but if there is a tie for the highest bid (even when other players did not tie), then no one wins them and they remain on the table to be won along with the next trick. All bids are discarded after every trick.
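The trick rule described above can be written down in a few lines. This is only a sketch of that one rule, not a full GOPS engine; the function name `resolve_trick` and the dictionary representation of bids are choices made here for illustration.

```python
from collections import Counter


def resolve_trick(bids, pot):
    """Resolve one GOPS trick.

    bids: dict mapping player name to the rank bid (Ace=1 ... King=13).
    pot:  list of diamond point values currently face-up on the table.
    Returns (winner, points_won, carried_over_pot).
    """
    highest = max(bids.values())
    counts = Counter(bids.values())
    if counts[highest] > 1:
        # A tie for the highest bid: nobody wins, the diamonds stay on the table.
        return None, 0, list(pot)
    winner = next(player for player, bid in bids.items() if bid == highest)
    return winner, sum(pot), []
```

Note that a tie between two players blocks everyone, including a third player who bid lower, which is what makes "everyone throws away their King" a live possibility.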

Especially when the King comes up early, you can see everyone looking at each other trying to figure out how many levels deep to evaluate "What will the other players do?".

(1) Play my King to be likely to win. (2) Everyone else is likely to do (1) also, which will waste their Kings. So instead play low while they throw away their Kings. (3) If the players are paying attention, they might all realize they should (2), in which case I should play highest low card - the Queen. (4+) The 4th+ levels could repeat (2) and (3) mutatis mutandis until every card has been the optimal choice at some level. In practice, players immediately recognize the futility of that line of thought and instead shift to the question: How far down the chain of reasoning are the other players likely to go? And that tends to depend on knowing the people involved and the social context of the game.

Maybe playing GOPS should be added to the repertoire of difficult decision theory puzzles alongside the prisoner's dilemma, Newcomb's problem, Pascal's mugging, and the rest of that whole intriguing panoply. We've had a Prisoner's Dilemma competition here before - would anyone like to host a GOPS competition?

Comment author: 25 May 2012 08:19:52AM 0 points [-]

I'm going to play this game at LW meetups in future. Hopefully some insights will arise out of it.

I also think I might try to generalise this kind of problem, in the vein of trolley problems being a generalisation of some types of decisions and Parfit's Hitchhiker being a generalisation of precommitment-favouring situations.

Comment author: 24 May 2012 11:05:06AM 0 points [-]

Any agent who is themselves running TDT will reason as in the standard Newcomb problem.

Will they? Surely it's clear that it's now possible to take \$1,001,000, because the circumstances are slightly different.

In the standard Newcomb problem, where Omega predicts your behaviour, it's not possible to trick it or act other than its expectation. Here, it is.

Is there some basic part of decision theory I'm not accounting for here?

Comment author: 24 May 2012 12:45:28PM *  1 point [-]

Yes. If the TDT agent picked the \$1,001,000 here, then the simulated agent would have two-boxed as well, meaning only box A would be filled.

Remember, the simulated agent was presented with the same problem, so the decision TDT makes here is the same one the simulated agent makes.

Comment author: 24 May 2012 01:08:23PM 1 point [-]

Right, I understand what you mean. I was thinking of in the context of a person being presented with this situation, not an idealized agent running a specific decision theory.

And Omega's simulated agent would presumably hold all the same information as a person would, and be capable of responding the same way.

Cheers for clarifying that for me.

Comment author: 24 May 2012 07:51:17AM *  2 points [-]

Let's say that TDT agents can be divided into two categories, TDT-A and TDT-B, based on a single random bit added to their source code in advance. Then TDT-A can take the strategy of always picking the first box in Problem 2, and TDT-B can always pick the second box.

Now, if you're a TDT agent being offered the problem, then with the aforementioned strategy there's a 50% chance that the simulated agent is different from you, netting you \$1 million. This also narrows down the advantage of the CDT agent - now they only have a 50% chance of winning the money, which is equal to yours.

Comment author: 24 May 2012 07:31:23PM 3 points [-]

Actually, the way the problem is specified, Omega puts the money in box 3.

Comment author: 24 May 2012 07:58:11PM 0 points [-]

The argument is that the simulation is either TDT-A in this case, or TDT-B. Either way, the simulated agent will pick a single favourite box (1 or 2) with certainty, so the money is in either Box 2 or Box 1.

Though I can see an interpretation which leads to Box 3. Omega simulates a "new-born" TDT (which is neither -A nor -B) and watches as it differentiates itself to one variant or the other, each with equal probability. So the new-born picks boxes 1 and 2 with equal frequency over multiple simulations, and Box 3 contains the money. Is that what you were thinking?

Comment author: 24 May 2012 08:00:53PM *  0 points [-]

Is that what you were thinking?

Yes. I was thinking that Omega would have access to the agent's source code, and be running the "play against yourself, if you pick a different number than yourself you win" game. Omega is a jerk :D

Comment author: 24 May 2012 08:17:25PM *  2 points [-]

If it's your own exact source being simulated, then it's probably impossible to do better than 10%, and the problem isn't interesting anymore.

Comment author: 24 May 2012 12:23:59PM *  0 points [-]

That's not too bad, actually. One of my ideas while thrashing about here was that an agent should have a "favourite" number in the set {1, 2} and pick that number with certainty. That way, Omega will definitely put the \$1 million in Box 1 or Box 2 and each agent will have 50% chance that their favourite number disagrees with the simulated agent's.

This won't work if Omega describes the source-code of the simulation (or otherwise reveals the simulation's favourite number) - since then any agent with that exact code knows it can't choose deterministically, and its best chance is to pick each box with equal chance, as described in the original analysis.
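The numbers quoted in this sub-thread (50% for the "favourite number" strategy, 10% against an exact copy, and the Box 3 reading) can be checked with a small calculation. The sketch below assumes Problem 2 has ten boxes and that Omega puts the money in the box the simulation is least likely to pick, breaking ties towards the lowest number; that rule is not restated in these comments, so treat it as an assumption. The helper names are made up for illustration.

```python
from fractions import Fraction

BOXES = range(1, 11)  # assumed: ten boxes, numbered 1 to 10


def omega_box(sim_distribution):
    """Box Omega fills: least likely sim choice, lowest number on ties (assumed rule)."""
    return min(BOXES, key=lambda b: (sim_distribution.get(b, Fraction(0)), b))


def win_probability(agent_distribution, sim_distribution):
    """Chance that the agent opens the box Omega filled."""
    return agent_distribution.get(omega_box(sim_distribution), Fraction(0))


def favourite(n):
    """An agent that picks box n with certainty."""
    return {n: Fraction(1)}


def average_win(agent_favourite):
    """Agent has a fixed favourite; the sim's favourite is 1 or 2 with equal chance."""
    return sum(
        Fraction(1, 2) * win_probability(favourite(agent_favourite), favourite(s))
        for s in (1, 2)
    )


uniform = {b: Fraction(1, 10) for b in BOXES}
newborn = {1: Fraction(1, 2), 2: Fraction(1, 2)}
```

Under these assumptions `average_win(1)` and `average_win(2)` both come out at one half, `win_probability(uniform, uniform)` is one tenth, and `omega_box(newborn)` is 3, matching the three cases discussed above.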

Comment author: 24 May 2012 12:17:00AM 4 points [-]

Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?

Comment author: 28 May 2012 09:14:09AM 1 point [-]

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around. (You also mustn't have any dynamical inconsistency such as akrasia, otherwise your future and past selves count as ‘other’ as well.) So I don't think it'd make much of a difference for a singleton -- but I'd rather use an RDT just in case.

Comment author: 28 May 2012 02:27:21PM 1 point [-]

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around.

It isn't the absolute level of intelligence that is required, but rather that the other agent is capable of making a specific kind of reasoning. Even this can be relaxed to things that can only dubiously be said to qualify as being classed "agent". The requirement is that some aspect of the environment has (utility-relevant) behavior that is entangled with the output of the decision to be made in a way that is other than a forward in time causal influence. This almost always implies that some agent is involved but that need not necessarily be the case.

Caveat: Maybe TDT is dumber than I remember and artificially limits itself in a way that is relevant here. I'm more comfortable making assertions about what a correct decision theory would do than about what some specific attempt to specify a decision theory would do.

but I'd rather use an RDT just in case.

You make me happy! RDT!

Comment author: 24 May 2012 07:38:35PM 5 points [-]

It will defect on all prisoner's dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommitment - CDT isn't "reflectively consistent." And so if you want to predict an AI's behavior, if you predict based on CDT with no self-modification you'll get it wrong, since it doesn't stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

Comment author: 29 May 2012 09:37:18AM *  1 point [-]

It will defect on all prisoner's dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

I don't think MAD is a prisoner's dilemma: in the prisoner's dilemma, if I know you're going to cooperate no matter what, I'm better off defecting, and if I know you're going to defect no matter what, I'm better off defecting. This doesn't seem to be the case here: bombing you doesn't make me better off, all else being equal; it just makes you worse off. If anything, it's a game of Chicken where bombing the opponent corresponds to going straight and not bombing them corresponds to swerving. And CDTists don't always go straight in Chicken, do they?

Comment author: 29 May 2012 11:19:15AM 0 points [-]

Hm, I disagree - if nuking the Great Enemy never made you any better off, why was anyone ever afraid of anyone getting nuked in the first place? It might not grow your crops for you or buy you a TV, but gains in security and world power are probably enough incentive to at least make people worry.

Comment author: 29 May 2012 11:24:08AM *  1 point [-]

Still better modelled by Chicken (where the utility of winning is assumed to be much smaller than the negative of the utility of dying, but still non-zero) than by PD.

Comment author: 30 May 2012 05:00:37AM 0 points [-]

I don't understand what you mean by "modeled better by chicken" here.

Comment author: 30 May 2012 05:48:16AM *  1 point [-]

I expect army1987's talking about Chicken, the game of machismo in which participants rush headlong at each other in cars or other fast-moving dangerous objects and whoever swerves first loses. The payoff matrix doesn't resemble the Prisoner's Dilemma all that much: there's more than one Nash equilibrium, and by far the worst outcome from either player's perspective occurs when both players play the move analogous to defection (i.e. don't swerve). It's probably most interesting as a vehicle for examining precommitment tactics.
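The structural difference claimed here (one pure equilibrium in the Prisoner's Dilemma, more than one in Chicken) is easy to check by brute force. The payoff numbers below are illustrative assumptions chosen only to have the right ordering; nothing in the thread fixes them.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Find pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        # Neither player can gain by deviating unilaterally.
        row_ok = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in rows)
        col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Assumed payoffs: defecting is better whatever the other player does.
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}

# Assumed payoffs: a small gain for winning, a very large loss for a crash.
CHICKEN = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-100, -100),
}
```

With these numbers the Prisoner's Dilemma has the single equilibrium (defect, defect), while Chicken has two, in each of which exactly one player swerves. That is why precommitment matters so much in Chicken: it selects which of the two equilibria gets played.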

Comment author: 30 May 2012 10:22:06AM 0 points [-]

I was. I should have linked to it, and I have now.

Comment author: 25 May 2012 11:21:12AM 3 points [-]

A more correct analysis is that CDT defects against itself in iterated Prisoner's Dilemma, provided there is any finite bound to the number of iterations. So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

A CDT playing against a "RevengeBot" - if you nuke it, it nukes back with an all out strike - would never fire its weapons. But then the RevengeBot could just take out one city at a time, without fear of retaliation.

Since CDT was the "gold standard" of rationality developed during the time of the Cold War, I am somewhat puzzled why we're still here.

Comment author: 26 May 2012 02:31:27AM 1 point [-]

So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

This assumes that the mutual possession of nuclear weapons constitutes a prisoner's dilemma. There isn't necessarily a positive payoff to nuking folks. (You know, unless they are really jerks!)

Comment author: 26 May 2012 06:57:12AM 1 point [-]

Well nuking the other side eliminates the chance that they'll ever nuke you (or will attack with conventional weapons), so there is arguably a slight positive for nuking first as opposed to keeping the peace.

There were some very serious thinkers arguing for a first strike against the Soviet Union immediately after WW2, including (on some readings) Bertrand Russell, who later became a leader of CND. And a pure CDT (with selfish utility) would have done so. I don't see how Schelling theory could have modified that... just push the other guy over the cliff before the ankle-chains get fastened.

Probably the reason it didn't happen was the rather obvious "we don't want to go down in history as even worse than the Nazis" - also there was complacency about how far behind the Soviets actually were. If it had been known that they would explode an A-bomb as little as 4 years after the war, then the calculation would have been different. (Last ditch talks to ban nuclear weapons completely and verifiably - by thorough spying on each other - or bombs away. More likely bombs away I think.)

Comment author: 25 May 2012 11:30:47AM 2 points [-]

Well, it's good that you're puzzled, because it wasn't - see Schelling's "The Strategy of Conflict."

Comment author: 25 May 2012 11:52:25AM 0 points [-]

I get the point that a CDT would pre-commit to retaliation if it had time (i.e. self-modify into a RevengeBot).

The more interesting question is why it bothers to do that re-wiring when it is expecting the nukes from the other side any second now...

Comment author: 25 May 2012 07:17:14AM 0 points [-]

even if they're iterated.

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

It won't self-modify to TDT. It will self-modify to something similar, but using its beliefs at the time of modification as the priors. For example, it will use the doomsday argument immediately to find out how long the world is likely to last, and it will use that information from then on, rather than redoing it as its future self (getting a different answer).

Comment author: 25 May 2012 08:54:54AM 0 points [-]

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Fair enough. I guess I had some special case stuff in mind - there are certainly ways to get a CDT agent to cooperate on prisoner's-dilemma-ish problems.

Comment author: 25 May 2012 08:21:03AM 0 points [-]

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Reason backwards from the inevitable end of the iteration. Defecting makes sense there, so defecting one turn earlier makes sense, so one turn earlier...
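A minimal sketch of that backward reasoning, assuming the number of rounds is fixed and known to both players and using made-up stage payoffs with the usual Prisoner's Dilemma ordering:

```python
# Illustrative stage-game payoffs for "my" move against "their" move (assumed values).
STAGE_PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def best_action(continuation_value):
    """Pick the action that is a best response to every opponent move.

    continuation_value is what the player expects from all later rounds. It is
    added to every outcome alike, so it cannot change which action is best.
    """
    for mine in ACTIONS:
        if all(
            STAGE_PAYOFF[(mine, theirs)] + continuation_value
            >= STAGE_PAYOFF[(other, theirs)] + continuation_value
            for theirs in ACTIONS
            for other in ACTIONS
        ):
            return mine
    return None  # no dominant action in this stage game


def backward_induction(rounds):
    """Return the planned move for each round, reasoning from the last round back."""
    plan = []
    continuation_value = 0
    for _ in range(rounds):
        action = best_action(continuation_value)
        plan.append(action)
        # Both players reason identically, so later rounds pay the symmetric outcome.
        continuation_value += STAGE_PAYOFF[(action, action)]
    plan.reverse()
    return plan
```

Because later play is already settled by the time an earlier round is considered, the earlier round reduces to the one-shot game, and `backward_induction(n)` is all "D" for any finite `n`. The argument leans entirely on the final round being known.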

Comment author: 25 May 2012 07:17:56PM 0 points [-]

That depends on if it's known what the last iteration will be.

Also, I think any deviation from CDT in common knowledge (such as if you're not sure that they're sure that you're sure that they're a perfect CDT) would result in defecting a finite, and small, number of iterations from the end.

Comment author: 24 May 2012 07:41:55PM 0 points [-]

Ah, that second paragraph makes perfect sense. Thanks.

Comment author: 23 May 2012 09:32:40PM 0 points [-]

The interaction between this simulated TDT and you is so complicated that I don't think many of the commenters here actually did the math to see how they should expect the simulated TDT agent to react in these situations. I know I didn't. I tried, and failed.

Comment author: 24 May 2012 09:35:39AM *  3 points [-]

Maybe I'm missing something, but the formalization looks easy enough to me...

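```
def tdt_utility():
    # Omega fills the boxes according to the simulated TDT agent's choice.
    if tdt(tdt_utility) == 1:
        box1 = 1000
        box2 = 1000000
    else:
        box1 = 1000
        box2 = 0
    # The TDT agent then chooses: 1 means take only box2, otherwise take both.
    if tdt(tdt_utility) == 1:
        return box2
    else:
        return box1+box2

def your_utility():
    # Omega still fills the boxes according to the simulated TDT agent.
    if tdt(tdt_utility) == 1:
        box1 = 1000
        box2 = 1000000
    else:
        box1 = 1000
        box2 = 0
    # A different agent makes the actual choice.
    if you(your_utility) == 1:
        return box2
    else:
        return box1+box2
```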

The functions tdt() and you() accept the source code of a function as an argument, and try to maximize its return value. The implementation of tdt() could be any of our formalizations that enumerate proofs successively, which all return 1 if given the source code to tdt_utility. The implementation of you() could be simply "return 2".
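To see the payoffs concretely, here is a runnable toy version. It is not a TDT implementation: `tdt_stub` simply hard-codes the answer 1 that the comment says the proof-enumerating formalizations return on `tdt_utility`, and the functions take a function object where the formalization takes source code. The names `tdt_stub`, `payoff`, `BOX_A` and `BOX_B_FULL` are introduced here for illustration.

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000


def tdt_stub(problem):
    """Stand-in for tdt(): assumed to one-box (return 1) on this problem."""
    return 1


def you(problem):
    """The trivial non-TDT agent from the comment: always take both boxes."""
    return 2


def payoff(simulated_choice, actual_choice):
    """Omega fills box B from the simulated choice; the actual choice collects."""
    box_b = BOX_B_FULL if simulated_choice == 1 else 0
    return box_b if actual_choice == 1 else BOX_A + box_b


def tdt_utility():
    # The simulated agent and the real agent are the same algorithm.
    choice = tdt_stub(tdt_utility)
    return payoff(choice, choice)


def your_utility():
    # Omega simulates TDT, but a different agent makes the real choice.
    return payoff(tdt_stub(tdt_utility), you(your_utility))
```

Under the stubbed assumption, `tdt_utility()` returns 1000000 and `your_utility()` returns 1001000, which is the gap the original post is pointing at. Everything interesting is hidden inside the real `tdt()`, which this sketch does not attempt.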

Comment author: 04 June 2012 02:11:56AM 0 points [-]

Thanks for this. I hadn't seen someone pseudocode this out before. This helps illustrate that interesting problems lie in the scope above (callers to tdt_utility() etc) and below (implementation of tdt() etc).

I wonder if there is a rationality exercise in 'write pseudocode for problem descriptions, explore the callers and implementations'.

Comment author: 23 May 2012 08:39:08PM 5 points [-]

Someone may already have mentioned this, but doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow? I.e. As soon as we let decision theories become self-referencing, it is impossible for a "best" decision theory to exist at all.

Comment author: 21 June 2012 09:54:39AM *  0 points [-]

There was some discussion of much the same point in this comment thread.

One important thing to consider is that there may be a sensible way to define "best" that is not susceptible to this type of problem. Most notably, there may be a suitable, solvable, and realistic subclass of problems over which to evaluate performance. Also, even if there is no "best", there can still be better and worse.

Comment author: 25 May 2012 08:30:56AM 0 points [-]

doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow?

Self-reference and the like is necessary for Goedel sentences but not sufficient. It's certainly plausible that this scenario could have a Goedel sentence, but whether the current problem is isomorphic to a Goedel sentence is not obvious, and seems unlikely.

Comment author: 20 June 2012 07:14:01PM 0 points [-]

Perhaps referring directly to Goedel was not apt. What Goedel showed was that Hilbert/Russell's efforts were futile. And what Hilbert and Russell were trying to do was create a formal system where actual self-reference was impossible. And the reason they were trying to do that, finally, was that self-reference creates paradoxes which reduce to either incompleteness or inconsistency. And the same is true of these more advanced decision theories. Because they are self-referencing, they create an infinite regress that precludes the existence of a "best" decision theory at all.

So, finding a best decision theory is impossible once self-reference is allowed, because of the nature of self-reference, but not quite because of Goedel's theorems, which are the stronger declaration that any formal system by necessity contains self-referential aspects that make it incomplete or inconsistent.

Comment author: 23 May 2012 08:37:36PM *  1 point [-]

I wonder if there is a mathematician in this forum willing to present the issue in a form of a theorem and a proof for it, in a reasonable mathematical framework. So far all I can see is a bunch of ostensibly plausible informal arguments from different points of view.

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Comment author: 25 May 2012 12:52:53AM 1 point [-]

Which issue/problem? fairness?

Comment author: 25 May 2012 05:35:09PM *  1 point [-]

The fairness concept:

the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems.

should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT on Newcomb's problems with a [b]DT-aware Omega, as described in the OP, where 'a' and 'b' are particular DTs, e.g. a=b=T.

Comment author: 23 May 2012 09:58:45PM 2 points [-]

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Or it's hard to formalize.

Comment author: 23 May 2012 10:33:46PM *  -3 points [-]

Or it's hard to formalize.

It's pointless to argue about a decision theory problem until it is formalized, since there is no way to check the validity of any argument.

Comment author: 23 May 2012 11:04:28PM 0 points [-]

So, what ought one do when interested in a problem (decision theory or otherwise) that one does not yet understand well enough to formalize?

I suspect "go do something else until a proper formalization presents itself" is not the best possible answer for all problems, nor is "work silently on formalizing the problem and don't express or defend a position on it until I've succeeded."

Comment author: 23 May 2012 11:31:15PM *  1 point [-]

How about "work on formalizing the problem (silently or collaboratively, whatever your style is) and do not defend a position that cannot be successfully defended or refuted"?

Comment author: 24 May 2012 12:28:02AM 2 points [-]

Fair enough.
Is there a clear way to distinguish positions worth arguing without formality (e.g., the one you are arguing here) from those that aren't (e.g., the one you are arguing ought not be argued here)?

Comment author: 24 May 2012 01:21:51AM 2 points [-]

It's a good question. There ought to be, but I am not sure where the dividing line is.

Comment author: 23 May 2012 10:40:50PM *  0 points [-]

You check the arguments using mathematical intuition, and you use them to find better definitions. For example, problems involving continuity or real numbers were fruitfully studied for a very long time before rigorous definitions were found.

Comment author: 23 May 2012 11:26:16PM *  0 points [-]

You check them using mathematical intuition, and you use them to find better definitions.

Indeed, you use them to find better definitions, which is the first step in formalizing the problem. If you argue whose answer is right before doing so (as opposed, say, to which answer ought to be right once a proper formalization is found), you succumb to lost purposes.

For example, "TDT ought to always make the best decision in a certain class of problems" is a valid purpose, while "TDT fails on a Newcomb's problem with a TDT-aware predictor" is not a well-defined statement until every part of it is formalized.

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

Comment author: 24 May 2012 12:53:32AM 1 point [-]

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

If I had to guess, I'd say that the downvoters interpret those pleas, especially in the context of some of your other comments, as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all.

Admittedly, I interpret them that way myself, so I may just be projecting my beliefs onto others.

Comment author: 24 May 2012 01:24:28AM 2 points [-]

as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all

Wha...? Thank you for letting me know, though I still have no idea what you might mean. I'd greatly appreciate it if you elaborated on that!

Comment author: 24 May 2012 04:54:43AM 7 points [-]

I'm not sure I can add much by elaboration.

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

Within that context, responding to a comment with a request to formalize it is easy to read as a polite way of expressing "what you just said is uselessly vague. If you are capable of saying something useful, do so, otherwise shut up and leave this subject to the grownups."

And since you aren't consistent about wanting everything to be expressed as a formalism, I assume this is a function of the topic of discussion, because that's the most charitable assumption I can think of.

That said, I reiterate that I have no special knowledge of why you're being downvoted; please don't take me as definitive.

(1) This might be an unfair impression, as I no longer remember what it was that led me to form it.

Comment author: 24 May 2012 02:38:36PM *  3 points [-]

Thank you! I always appreciate candid feedback.

Comment author: 24 May 2012 12:27:58PM 1 point [-]

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

It's too easy for this to turn into a general counterargument against anything the person says. It may be of benefit to play the ball and not the man.

Comment author: 30 May 2012 03:49:34PM 1 point [-]

Anything the person says? In respect to most things it would be a total non-sequitur.

Comment author: 24 May 2012 12:38:19PM 0 points [-]

Yes, I agree. Perhaps I shouldn't have said anything at all, but, well, he asked.

Comment author: 23 May 2012 07:16:17PM 15 points [-]

My sense is that question 6 is a better question to ask than 5. That is, what's important isn't drawing some theoretical distinction between fair and unfair problems, but finding out what problems we and/or our agents will actually face. To the extent that we are ignorant of this now but may know more in the future when we are smarter and more powerful, it argues for not fixing a formal decision theory to determine our future decisions, but instead making sure that we and/or our agents can continue to reason about decision theory the same way we currently can (i.e., via philosophy).

Comment author: 23 May 2012 07:01:14PM 9 points [-]

The problems look like a kind of an anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.

Comment author: 23 May 2012 05:51:57PM 0 points [-]

In both your problems, the seeming paradox comes from failure to recognize that the two agents (one that Omega has simulated and one making the decision) are facing entirely different prior information. Then, nothing requires them to make identical decisions. The second agent can simulate itself having prior information I1 (that the simulated agent has been facing), then infer Omega's actions, and arrive at the new prior information I2 that is relevant for the decision. And I2 now is independent of which decision the agent would make given I2.

Comment author: 23 May 2012 07:06:52PM *  2 points [-]

Are you sure that they are facing different prior information? If the sim is a good one, then the TDT agent won't be able to tell whether it is the sim or not. However, you are right that one solution could be that there are multiple TDT variants who have different information and so can logically separate their decisions.

I mentioned the problems with that in another response here. The biggest problem is that it seriously undermines the attraction and effectiveness of TDT as a decision theory if different instances of TDT are going to find excuses to separate from each other.

Comment author: 23 May 2012 05:42:51PM -1 points [-]

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

There seems to be a contradiction here. If Omega said this to me, I would have to believe that Omega had just presented evidence of being untruthful some of the time.

If Omega simulated the problem at hand then in said simulation Omega must have said: "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT." In the first simulation the statement is a lie.

Problem 2 has a similar problem.

It is not obvious that the problem can be reformulated to keep Omega consistently truthful and still have CDT or EDT come out ahead of TDT.

Comment author: 23 May 2012 06:57:44PM *  3 points [-]

Your difficulty seems to be with the parenthesis "(who experience has shown is always truthful)". The relevant experience here is going to be derived from real-world subjects who have been in Omega problems, exactly as is assumed for the standard Newcomb problem. It's not obvious that Omega always tells the truth to its simulations; no-one in the outside world has experience of that.

However you can construe the problem so that Omega doesn't have to lie, even to sims. Omega could always prefix its description of the problem with a little disclaimer "You may be one of my simulations. But if not, then...".

Or Omega could simulate a TDT agent making decisions as if it had just been given the problem description verbally by Omega, without Omega actually doing so. (Whether that's possible or not depends a bit on the simulation).

Comment author: 23 May 2012 05:57:50PM *  1 point [-]

Omega could truthfully say "the contents of the boxes are exactly as if I'd presented this problem to an agent running TDT".

Comment author: 23 May 2012 06:41:41PM 0 points [-]

I do not know if Omega can say that truthfully, because I do not know whether the self-referential equation representing the problem has a solution.

The problems set out by the OP assume that there is a solution, and a particular answer, but without writing out the equation and plugging in the solution to show that it actually works.

Comment author: 23 May 2012 07:05:49PM *  0 points [-]

There is a solution because Omega can get an answer by simulating TDT, or am I missing something?

Comment author: 24 May 2012 08:12:07AM 0 points [-]

It may or may not be proven that TDT settles on answers to questions involving TDT. If TDT doesn't get an answer, then TDT can't get an answer.

Presumably it is true that TDT settles but if it isn't proven, it may not be true; or it could be that the proof (i.e. a formalization of TDT) will provide insight that is currently lacking (such as cutting off after a certain level of resource use; can Omega emulate how many resources the current TDT agent will use? Can the TDT agent commit to using a random number of resources? Do true random-number generators exist? These problems might all be inextricable. Or they might not. I, for one, don't know.)

Comment author: 24 May 2012 09:17:30AM *  1 point [-]

It may or may not be proven that TDT settles on answers to questions involving TDT.

We have several formalizations of UDT that would solve this problem correctly.

Comment author: 24 May 2012 05:57:48PM 0 points [-]

Having several formalizations is 90% of a proof, not 100% of a proof. Turn the formalization into a computer program AND either prove that it halts or run this simulation on it in finite time.

I believe that it's true that TDT will get an answer and hence Omega will get an answer, but WHY this is true relies on facts about TDT that I don't know (specifically facts about its implementation; maybe facts about differential topology that game-theoretic equilibrium results rely on.)

Comment author: 24 May 2012 06:27:22PM *  0 points [-]

The linked posts have proofs that the programs halt and return the correct answer. Do you understand the proofs, or could you point out the areas that need more work? Many commenters seemed to understand them...

Comment author: 25 May 2012 04:14:27AM 1 point [-]

I do not understand the proofs, primarily because I have not put time in to trying to understand them.

I may have become somewhat defensive in these posts (or withdrawn I guess?) but looking back my original point was really to point out that, naively, asking whether the problem is well-defined is a reasonable question.

The questions in the OP set off alarm bells for me of "this type of question might be a badly-defined type of question" so asking whether these decisions are in the "halting domain" (is there an actual term for that?) of TDT seems like a reasonable question to ask before putting too much thought into other issues.

I believe the answer to be that yes these questions are in the "halting domain" of TDT, but I also believe that understanding what that is and why these questions are legitimate and the proofs that TDT halts will be central to any resolution of these problems.

What I'm really trying to say here is that it makes sense to ask these questions, but I don't understand why, so I think Davorak's question was reasonable, and your answer didn't feel complete to me. Looking back, I don't think I've contributed much to this conversation. Sorry!

Comment author: 23 May 2012 02:14:20PM *  28 points [-]

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you \$1000 if you are, and tortures you for a year if you aren't.

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

Comment author: 11 June 2012 10:09:55PM 2 points [-]

Indeed. These are all scenarios of the form "Omega looks at the source code for your decision theory, and intentionally creates a scenario that breaks it." Omega could do this with any possible decision theory (or at least, anything that could be implemented with finite resources), so what exactly are we supposed to learn by contemplating specific examples?

It seems to me that the valuable Omega thought experiments are the ones where Omega's omnipotence is simply used to force the player to stick to the rules of the given scenario. When you start postulating that an impossible, acausal superintelligence is actively working against you, it's time to hang up your hat and go home, because no strategy you could possibly come up with is going to do you any good.

Comment author: 24 December 2012 09:57:12PM 1 point [-]

The trouble is when another agent wins in this situation and in the situations you usually encounter. For example, an anti-traditional-rationalist, that always makes the opposite choice to a traditional rationalist, will one-box; it just fails spectacularly when asked to choose between different amounts of cake.

Comment author: 23 May 2012 03:09:44PM *  20 points [-]

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards eg ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about this is that this is a "fair" test by that definition, yet it acts like an unfair test.

Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.

Comment author: 25 June 2012 07:16:55AM *  3 points [-]

Because it's a fair test

No, not even by Eliezer's standard, because TDT is not given the same problem as other decision theories.

As stated in comments below, everyone but TDT has the information "I'm not in the simulation" (or more precisely, in one of the simulations of the infinite regress that is implied by Omega's formulation). The reason TDT does not have this extra piece of information comes from the fact that it is TDT, not from any decision it may make.

Comment author: 25 June 2012 03:40:53PM *  0 points [-]

This variation of the problem was invented in the follow-up post (I think it was called "Sneaky strategies for TDT" or something like that):

Omega tells you that earlier it flipped a coin. If the coin came down heads, it simulated a CDT agent facing this problem. If the coin came down tails, it simulated a TDT agent facing this problem. In either case, if the simulated agent one-boxed, there is \$1000000 in Box-B; if it two-boxed Box-B is empty. In this case TDT still one-boxes (50% chance of \$1000000 dominates a 100% chance of \$1000), and CDT still two-boxes (because that's what CDT does). In this case, even though both agents have an equal chance of being simulated, CDT out-performs TDT (average payoffs of 501000 vs. 500000) - CDT takes advantage of TDT's prudence and TDT suffers for CDT's lack of it. Notice also that TDT cannot do better by behaving like CDT (both would get payoffs of 1000). This shows that the class of problems we're concerned with is not so much "fair" vs. "unfair", but more like "those problems on which the best I can do is not necessarily the best anyone can do". We can call it "fairness" if we want, but it's not like Omega is discriminating against TDT in this case.

Comment author: 25 June 2012 04:04:04PM *  3 points [-]

This is not a zero-sum game. CDT does not outperform TDT here. It just makes a stupid mistake, and happens to pay for it less dearly than TDT does.

Let's say Omega submits the same problem to 2 arbitrary decision theories. Each will either 1-box or 2-box. Here is the average payoff matrix:

• Both a and b 1-box -> They both get the million
• Both a and b 2-box -> They both get 1000 only.
• One 1-boxes, the other 2-boxes -> the 1-boxer gets half a million, the other gets 1000 more.

Clearly, 1 boxing still dominates 2-boxing. Whatever the other does, you personally get about half a million more by 1-boxing. TDT may have less utility than CDT for 1-boxing, but CDT is still stupid here, while TDT is not.
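The averages in this sub-thread are easy to recompute. Here is a rough sketch, assuming a fair coin decides which of the two agents Omega simulates and that each agent's choice is the same whether it is real or simulated. The function names and the choice labels are made up for illustration:

```python
from itertools import product

BOX_A = 1_000        # always in Box A
BOX_B = 1_000_000    # in Box B only if the simulated agent one-boxed


def payoff(my_choice: str, simulated_choice: str) -> int:
    """Payoff for one real agent, given what the simulated agent chose."""
    box_b = BOX_B if simulated_choice == "one-box" else 0
    return box_b if my_choice == "one-box" else BOX_A + box_b


def average_payoff(my_choice: str, other_choice: str) -> float:
    """Average over the coin flip: Omega simulated either my theory or the other one."""
    return 0.5 * payoff(my_choice, my_choice) + 0.5 * payoff(my_choice, other_choice)


choices = ["one-box", "two-box"]
table = {(mine, other): average_payoff(mine, other)
         for mine, other in product(choices, repeat=2)}

for (mine, other), value in table.items():
    print(f"I {mine}, other theory would {other}: {value:,.0f}")
```

Reading the table row by row shows how much an agent gains by one-boxing for each fixed behaviour of the other theory.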

Comment author: 25 June 2012 09:14:08AM 1 point [-]

Right, and this is an unfairness that Eliezer's definition fails to capture.

Comment author: 25 June 2012 11:43:57AM 0 points [-]

At this point, I need the text of that definition.

Comment author: 25 June 2012 12:04:12PM 0 points [-]

The definition is in Eliezer's TDT paper, although a quick grep for "fair" didn't immediately find it.

Comment author: 23 May 2012 10:06:14PM 4 points [-]

He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are.

Two questions: First, how is this distinction justified? What a decision theory is is a strategy for responding to decision tasks, and simulating agents performing the right decision tasks tells you what kind of decision theory they're using. Why does it matter if it's done implicitly (as in Newcomb's discrimination against CDT) or explicitly? And second, why should we care about it? Why is it important for a decision theory to pass fair tests but not unfair tests?

Comment author: 24 May 2012 10:47:29AM 7 points [-]

Why is it important for a decision theory to pass fair tests but not unfair tests?

Well, on unfair tests a decision theory still needs to do as well as possible. If we had a version of the original Newcomb's problem, with the one difference that a CDT agent gets \$1 billion just for showing up, it's still incumbent upon a TDT agent to walk away with \$1000000 rather than \$1000. The "unfair" class of problems is that class where "winning as much as possible" is distinct from "winning the most out of all possible agents".

Comment author: 24 May 2012 06:50:28AM 3 points [-]

Real-world unfair tests could matter, though it's not clear if there are any. However, hypothetical unfair tests aren't very informative about what is a good decision theory, because it's trivial to cook one up that favours one theory and disfavours another. I think the hope was to invent a decision theory that does well on all fair tests; the example above seems to show that may not be possible.

Comment author: 23 May 2012 04:26:37PM 2 points [-]

Not exactly. Because the problem statement says that it simulates "TDT", if you were to expand the problem statement out into code it would have to contain source code to a complete instantiation of TDT. When the problem statement is run, TDT or TDT-prime can look at that instantiation and compare it to its own source code. TDT will see that they're the same, but TDT-prime will notice that they are different, and thereby infer that it is not the simulated copy. (Any difference whatsoever is proof of this.)

Consider an alternative problem. Omega flips a coin, and asks you to guess what it was, with a prize if you guess correctly. If the coin was heads, he shows you a piece of paper with TDT's source code. If the coin was tails, he shows you a piece of paper with your source code, whatever that is.

Comment author: 23 May 2012 05:54:33PM *  11 points [-]

I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.

Comment author: 06 June 2012 12:05:55PM 0 points [-]

The key thing is the question as to whether it could have been you that has been simulated. If all you know is that you're a TDT agent and what Omega simulated is a TDT agent, then it could have been you. Therefore you have to act as if your decision now may be either real or simulated. If you know you are not what Omega simulated (for any reason), then you know that you only have to worry about the 'real' decision.

Comment author: 06 June 2012 04:34:19PM 0 points [-]

Suppose that Omega doesn't reveal the full source code of the simulated TDT agent, but just reveals enough logical facts about the simulated TDT agent to imply that it uses TDT. Then the "real" TDT Prime agent cannot deduce that it is different.

Comment author: 19 June 2012 07:30:10AM *  0 points [-]

Yes. I think that as long as there is any chance of you being the simulated agent, then you need to one box. So you one box if Omega tells you 'I simulated some agent', and one box if Omega tells you 'I simulated an agent that uses the same decision procedure as you', but two box if Omega tells you 'I simulated an agent that had a different copyright comment in its source code to the comment in your source code'.

This is just a variant of the 'detect if I'm in a simulation' function that others mention. i.e. if Omega gives you access to that information in any way, you can two box. Of course, I'm a bit stuck on what Omega has told the simulation in that case. Has Omega done an infinite regress?

Comment author: 06 June 2012 03:57:44PM 0 points [-]

That's an interesting way to look at the problem. Thanks!

Comment author: 23 May 2012 02:28:04PM 1 point [-]

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with \$1000. Omega doesn't check what decision theory you're using at all - it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT-agent or not.

Comment author: 23 May 2012 02:30:47PM 2 points [-]

I don't think so. If TDT-prime two boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with \$1000.

No, if TDT-prime two boxes, the TDT simulation still one-boxes.

Comment author: 23 May 2012 02:39:16PM 6 points [-]

Hmm, so TDT-prime would reason something like, "The TDT simulation will one-box because, not knowing that it's the simulation, but also knowing that the simulation will use exactly the same decision theory as itself, it will conclude that the simulation will do the same thing as itself and so one-boxing is the best option. However, I'm different to the TDT-simulation, and therefore I can safely two-box without affecting its decision." In which case, does it matter how inconsequential the difference is? Yep, I'm confused.
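To make the two lines of reasoning concrete, here is a small sketch of the Problem 1 payoffs under two assumptions: either the agent's choice is logically linked to the simulation's choice (plain TDT), or the simulation one-boxes regardless of what the agent does (the "separated" TDT-prime story). The names below are illustrative only:

```python
BOX_A = 1_000        # always in Box A
BOX_B = 1_000_000    # in Box B only if the simulated TDT agent one-boxed

CHOICES = ("one-box", "two-box")


def problem1_payoff(agent_choice: str, sim_choice: str) -> int:
    """Payoff in Problem 1 given the real agent's choice and the sim's choice."""
    box_b = BOX_B if sim_choice == "one-box" else 0
    return box_b if agent_choice == "one-box" else BOX_A + box_b


# Linked case: whatever the agent picks, the simulation picks the same thing.
linked = {choice: problem1_payoff(choice, choice) for choice in CHOICES}

# Separated case: the simulation one-boxes no matter what the real agent does.
separated = {choice: problem1_payoff(choice, "one-box") for choice in CHOICES}

print("linked:   ", linked)
print("separated:", separated)
```

In the linked case one-boxing is the better option, while in the separated case two-boxing comes out ahead, which is exactly why the size of the "irrelevant" difference matters.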

Comment author: 25 December 2012 04:07:16PM *  -1 points [-]

Yep, I'm confused.

Sounds like you have it exactly right.

Comment author: 23 May 2012 03:34:34PM 2 points [-]

I also had thoughts along these lines - variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

The second difficulty is that for each specific TDT variant, one with algorithm T' say, there will be a specific problematic problem on which T' will do worse than CDT (and indeed worse than all the other variants of TDT) - this is the problem with T' being the exact algorithm running in the sim. So we still don't get the - desirable - property that there is some sensible decision theory called TDT that is optimal across fair problems.

The best suggestion I've heard so far is that we try to adjust the definition of "fairness", so that these problematic problems also count as "unfair". I'm open to proposals on that one...

Comment author: 04 June 2012 11:39:19PM 0 points [-]

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

I think this is avoidable. Let's say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice's source code contains a comment identifying it as Alice, whereas Bob's source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.
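As a toy illustration of "identical except for a comment", the sketch below compares two hypothetical agent sources byte-for-byte and then modulo comments, using Python's standard `ast` module (parsing discards comments). The two source strings and both helper functions are invented for this example, and comparing syntax trees is only a crude stand-in for real logical equivalence:

```python
import ast

# Two hypothetical agent programs that differ only in an identifying comment.
ALICE_SOURCE = '''
# I am Alice
def decide(problem):
    return "one-box"
'''

BOB_SOURCE = '''
# I am Bob
def decide(problem):
    return "one-box"
'''


def same_bytes(source_a: str, source_b: str) -> bool:
    """Exact textual match, as in a byte-by-byte comparison."""
    return source_a == source_b


def same_modulo_comments(source_a: str, source_b: str) -> bool:
    """Match after parsing; the syntax tree carries no comments."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(same_bytes(ALICE_SOURCE, BOB_SOURCE))            # Bob can tell he is not Alice
print(same_modulo_comments(ALICE_SOURCE, BOB_SOURCE))  # but the decision code matches
```

The first check is what lets Bob conclude he is not the simulated program; the second is the kind of compressed description under which the prisoner's dilemma looks symmetric.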

Comment author: 25 December 2012 04:13:32PM -1 points [-]

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.

Why doesn't that happen when dealing with Omega?

Comment author: 25 December 2012 08:01:22PM 0 points [-]

Because if Omega uses Alice's source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.

Comment author: 25 December 2012 10:21:11PM 0 points [-]

So why doesn't that happen in the prisoner's dilemma?

Comment author: 25 December 2012 10:47:57PM 0 points [-]

Because Alice sees that Bob's source code is the same as hers except for a comment difference, and Bob sees that Alice's source code is the same as his except for a comment difference, so the situation is symmetric.

Comment author: 09 June 2012 11:24:08AM 1 point [-]

You might want to look at my follow-up article which discusses a strategy like this (among others). It's worth noting that slight variations of the problem remove the opportunity for such "sneaky" strategies.

Comment author: 09 June 2012 08:46:14PM 0 points [-]

Ah, thanks. I had missed that, somehow.

Comment author: 06 June 2012 12:12:51PM *  0 points [-]

In a prisoner's dilemma Alice and Bob affect each other's outcomes. In the Newcomb problem, Alice affects Bob's outcome, but Bob doesn't affect Alice's outcome. That's why it's OK for Bob to consider himself different in the second case as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation) but not OK for him to consider himself different in the prisoner's dilemma.

Comment author: 23 May 2012 08:14:02PM 0 points [-]

The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime's output and TDT-prime's decision. If its output is a strategy, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.

Comment author: 24 May 2012 12:08:43PM 0 points [-]

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection? Can that approach really work in general without creating CliqueBots? Don't know yet without detailed analysis.

Another issue is that Omega is not obliged to reveal the source-code of the sim; it could instead provide some information about the method used to generate / filter the sim code (e.g. a distribution the sim was drawn from) and still lead to a well-defined problem. Each TDT variant would not then know whether it was the sim.

I'm aiming for a follow-up article addressing this strategy (among others).

Comment author: 24 May 2012 05:57:56PM 0 points [-]

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection?

This sounds equivalent to asking "can a Turing machine generate non-deterministically random numbers?" Unless you're thinking about coding TDT agents one at a time and setting some constant differently in each one.

Comment author: 23 May 2012 08:25:17PM 1 point [-]

But doesn't that make cliquebots, in general?

Comment author: 23 May 2012 04:22:55PM *  0 points [-]

Well, I've had a think about it, and I've concluded that it would matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega's simulation, and therefore will not be able to take advantage of "logical separation".

But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to "separate" them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It's as if you need the ability to prove that two agents necessarily give the same output for the particular problem you're faced with, without proving what output those agents actually give, and that sure looks crazy-hard.

EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.

EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process - i.e. Omega can predict whether it's going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).

EDIT 3: Sorry to go on like this, but I've just realised that won't work in situations where some other agent bases their decision on whether you're predicting what their decision will be, i.e. Prisoner's Dilemma.

Comment author: 23 May 2012 11:41:30AM *  15 points [-]

Consider Problem 3: Omega presents you with two boxes, one of which contains \$100, and says that it just ran a simulation of you in the present situation and put the money in the box the simulation didn't choose.

This is a standard diagonal construction, where the environment is set up so that you are punished for the actions you choose, and rewarded for those you don't choose, irrespective of the actions. This doesn't depend on the decision algorithm you're implementing. A possible escape strategy is to make yourself unpredictable to the environment. The difficulty would also go away if the thing being predicted wasn't you, but something else you could predict as well (like a different agent that doesn't simulate you).

Comment author: 23 May 2012 11:57:12AM 8 points [-]

The correct solution to this problem is to choose each box with equal probability; this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money. Or I try to guess what escape route you'll use and post all the guards there.
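A quick sketch of why the uniform mix is the safe choice, assuming the guesser knows your probability distribution over options but cannot predict the random draw itself (an assumption that Omega-style perfect prediction would violate). The function name is hypothetical:

```python
def best_guess_success(probabilities: list[float]) -> float:
    """Chance that an adversary who knows your distribution guesses your pick.

    The adversary's best move is to guess your most likely option.
    """
    return max(probabilities)


deterministic = [1.0, 0.0]   # always pick the first option
biased = [0.75, 0.25]        # lean towards the first option
uniform = [0.5, 0.5]         # pick each option with equal probability

for name, dist in [("deterministic", deterministic),
                   ("biased", biased),
                   ("uniform", uniform)]:
    print(f"{name}: adversary succeeds with probability {best_guess_success(dist)}")
```

Since the probabilities sum to one, the largest of them can never drop below one over the number of options, and only the uniform distribution reaches that bound.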

What's interesting about Problem 2 is that it makes what would be the normal game-theoretic strategy unstable by choosing deterministically where the probabilities are exactly equal.

Comment author: 23 May 2012 12:28:15PM 4 points [-]

this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money.

Of course, you can just set up the thought experiment with the proviso that "be unpredictable" is not a possible move - in fact that's the whole point of Omega in these sorts of problems. If Omega's trying to break into your safe, he takes your money. In Nesov's problem, if you can't make yourself unpredictable, then you win nothing - it's not even worth your time to open the box. In both cases, a TDT agent does strictly as well as it possibly could - the fact that there's \$100 somewhere in the vicinity doesn't change that.

Comment author: 23 May 2012 11:14:04AM 4 points [-]

There's a different version of these problems for each decision theory, depending on what Omega simulates. For CDT, all agents two-box and all agents get \$1000. However, on problem 2, it seems like CDT doesn't have a well-defined decision at all; the effort to work out what Omega's simulator will say won't terminate.

(I'm spamming this post with comments - sorry!)

Comment author: 23 May 2012 12:16:59PM *  2 points [-]

You raise an interesting question here - what would CDT do if a CDT agent were in the simulation?

It looks to me that CDT just doesn't have the conceptual machinery to handle this problem properly, so I don't really know. One thing that could happen is that the simulated CDT agent tries to simulate itself and gets stuck in an infinite loop. I didn't specify exactly what would happen in that case, but if Omega can prove that the simulated agent is caught in a loop, then it knows the sim will choose each box with probability zero, and so (since these are all equal), it will fill box 1. But now can a real-life CDT agent also work this out, and beat the game by selecting box 1? But if so, why won't the sim do that, and so on? Aargh !!!

Another thought I had is that CDT could try tossing a logical coin, like computing the googolth digit of pi, and if it is even choose box 1, whereas if it is odd, choose box 2. If it runs out of time before computing (which the real-life agent will do), then it just picks box 1 or 2 with equal probability. The simulated CDT agent will however get to the end of the computation (Omega has arbitrary computational resources) and definitely pick 1 or 2 with certainty, so the money is definitely in one of those two boxes, which looks like the probability of the actual agent winning is raised to 50%. TDT might do the same.

However this looks like cheating to me, for both CDT and TDT.

EDIT: On reflection, it seems clear that CDT would never do anything "creatively sneaky" like tossing a logical coin; but it is the sort of approach that TDT (or some variant thereof) might come up with. Though I still think it's cheating.

Comment author: 23 May 2012 03:34:15PM *  1 point [-]

The version of CDT that I described explicitly should arrive at the uniformly random solution. You don't have to be able to simulate a program all the way through, just able to prove things about its output.

EDIT: Wait, this is wrong. It won't be able to consistently derive an answer, because of the way it acts given such an answer, and so it will go with whatever its default Nash equilibrium is.

Comment author: 23 May 2012 03:58:16PM 1 point [-]

Re: your EDIT. Yes, I've had that sort of reaction a couple of times today!

I'm shifting around between "CDT should pick at random, no CDT should pick Box 1, no CDT should use a logical coin, no CDT should pick its favourite number in the set {1, 2} with probability 1, and hope that the version in the sim has a different favourite number, no, CDT will just go into a loop or collapse in a heap."

I'm also quite clueless how a TDT is supposed to decide if it's told there's a CDT in the sim... This looks like a pretty evil decision problem in its own right.

Comment author: 23 May 2012 06:15:00PM 1 point [-]

Well, the thing is that CDT doesn't completely specify a decision theory. I'm confident now that the specific version of CDT that I described would fail to deduce anything and go with its default, but it's hard to speak for CDTs in general on such a self-referential problem.

Comment author: 23 May 2012 03:34:14PM 2 points [-]

I don't think your "detect infinite resources and cheat" strategy is really worth thinking about. Instead of strategies like CDT and TDT whose applicability to limited compute resources is unclear, suppose you have an anytime strategy X, which you can halt at any time and get a decision. Then there's really a family of algorithms X-t, where t is the time you're going to give it to run. In this case, if you are X-t, we can consider the situation where Omega fields X-t against you.

Comment author: 23 May 2012 11:03:55AM 21 points [-]

I think we could generalise problem 2 to be problematic for any decision theory XDT:

There are 10 boxes, numbered 1 to 10. You may only take one. Omega has (several times) run a simulated XDT agent on this problem. It then put a prize in the box which it determined was least likely to be taken by such an agent - or, in the case of a tie, in the box with the lowest index.

If agent X follows XDT, it has at best a 10% chance of winning. Any sufficiently resourceful YDT agent, however, could run a simulated XDT agent themselves, and figure out what Omega's choice was without getting into an infinite loop.

Therefore, YDT performs better than XDT on this problem.
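The 10% bound can be checked with a short sketch. It assumes Omega knows the XDT agent's exact choice probabilities (rather than estimating them), and the function names are made up for illustration. Since ten probabilities sum to 1, the smallest of them can never exceed 1/10.

```python
from fractions import Fraction

def omega_prize_box(probs):
    """1-based index of the box Omega fills: the least likely pick, ties to the lowest index."""
    lowest = min(probs)
    return probs.index(lowest) + 1  # list.index returns the first (lowest) match

def win_probability(probs):
    """Chance that an agent choosing with distribution `probs` picks the prize box."""
    return probs[omega_prize_box(probs) - 1]

uniform = [Fraction(1, 10)] * 10
skewed = [Fraction(1, 2)] + [Fraction(1, 18)] * 9
deterministic = [Fraction(1)] + [Fraction(0)] * 9

print(win_probability(uniform))        # 1/10
print(win_probability(skewed))         # 1/18
print(win_probability(deterministic))  # 0
```

Any deviation from the uniform distribution only lowers the agent's chance, while an outside agent that can compute `omega_prize_box` for XDT's distribution simply takes that box.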

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

Comment author: 25 December 2012 04:24:51PM -1 points [-]

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

You're right. However, since all decision theories fail when confronted with their personal version of this problem, but may or may not fail on other problems, some decision theories may still be better than others. The one that is better than all the others is thus the "best" DT.

Comment author: 30 May 2012 09:27:02PM *  1 point [-]

To draw out the analogy to Godelian incompleteness, any computable decision theory is subject to the suggested attack of being given a "Godel problem" like problem 1, just as any computable set of axioms for arithmetic has a Godel sentence. You can always make a new decision theory TDT' that is TDT plus "do the right thing for the Godel problem". But TDT' has its own Godel problem, of course. You can't make a computable theory that says "do the right thing for all Godel problems"; if you tried, the result would not be computable. I'm sure this is all just restating what you had in mind, but I think it's worth spelling out.

If you have some sort of oracle for the halting problem (i.e. a hypercomputer) and Omega doesn't, he couldn't simulate you, so you would presumably be able to always win fair problems. Otherwise the best thing you could hope for is to get the right answer whenever your computation halts, but fail to halt in your computation for some problems, such as your Godel problem. (A decision theory like this can still be given a Godel problem if Omega can solve the halting problem, "I simulated you and if you fail to halt on this problem..."). I wonder if TDT fails to halt for its Godel problem, or if some natural modification of it might have this property, but I don't understand it well enough to guess.

I am less optimistic about revising "fair" to exclude Godel problems. The analogy would be proving Peano arithmetic is complete "except for things that are like Godel sentences." I don't know of any formalizations of the idea of "being a Godel sentence".

Comment author: 23 May 2012 11:28:34PM *  3 points [-]

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

I would say that any such problem doesn't show that there is no best decision theory, it shows that that class of problem cannot be used in the ranking.

Edited to add: Unless, perhaps, one can show that an instantiation of the problem with particular choice of (in this case decision theory, but whatever is varied) is particularly likely to be encountered.

Comment author: 23 May 2012 11:33:55AM *  9 points [-]

You're right about problem 2 being a fully general counterargument, but your philosophical conclusion seems to be stopping too early. For example, can we define a class of "fair" problems that excludes problem 2?

Comment author: 23 May 2012 10:36:28PM 1 point [-]

It looks like the issue here is that while Omega is ostensibly not taking into account your decision theory, it implicitly is by simulating an XDT agent. So a first patch would be to define simulations of a specific decision theory (as opposed to simulations of a given agent) as "unfair".

On the other hand, we can't necessarily know if a given computation is effectively equivalent to simulating a given decision theory. Even if the string "TDT" is never encoded anywhere in Omega's super-neurons, it might still be simulating a TDT agent, for example.

On the first hand again, it might be easy for most problems to figure out whether anyone is implicitly favouring one DT over another, and thus whether they're "fair".

Comment author: 23 May 2012 12:11:09PM 2 points [-]

One possible place to look is that we're allowing Omega access not just to a particular simulated decision of TDT, but to the probabilities with which it makes these decisions. If we force it to simulate TDT many times and sample to learn what the probabilities are, it can't detect the exact balance for which it does deterministic symmetry breaking, and the problem goes away.

This solution occurred to me because this forces Omega to have something like a continuous behaviour response to changes in the probabilities of different TDT outputs, and it seems possible given that to imagine a proof that a fixed point must exist.

Comment author: 23 May 2012 01:20:26PM 0 points [-]

Fair point - how does Omega tell when the sim's choosing probabilities are exactly equal? Well I was thinking that Omega could prove they are equal (by analysing the simulation's behaviour, and checking where it calls on random bits). Or if it can't do that, then it can just check that the choice frequencies are "statistically equal" (i.e. no significant differences after a billion runs, say) and treat them as equal for the tie-breaker rule. The "statistically equal" approach might give the TDT agent a very slightly higher than 10% chance of winning the money, though I haven't analysed this in any detail.
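A rough sketch of the sampling approach follows. It assumes the sim is a callable that returns a box number from 1 to 10, and it uses a crude count `tolerance` as a stand-in for a proper significance test; both the names and the threshold are illustrative assumptions, not something specified in the problem.

```python
import random
from collections import Counter

def sample_counts(agent, runs, rng):
    """Run the simulated agent `runs` times and count how often each box is chosen."""
    counts = Counter(agent(rng) for _ in range(runs))
    return [counts.get(box, 0) for box in range(1, 11)]

def prize_box_from_counts(counts, tolerance):
    """Treat every box within `tolerance` of the lowest count as tied for least likely,
    then apply the lowest-index tie-breaker among the tied boxes."""
    lowest = min(counts)
    for box, count in enumerate(counts, start=1):
        if count - lowest <= tolerance:
            return box

def uniform_agent(rng):
    """A sim that picks each of the ten boxes with equal probability."""
    return rng.randint(1, 10)

counts = sample_counts(uniform_agent, 100_000, random.Random(1))
print(prize_box_from_counts(counts, tolerance=0))     # exact minimum: depends on sampling noise
print(prize_box_from_counts(counts, tolerance=1000))  # loose tolerance: favours low-numbered boxes
```

With `tolerance=0` the prize box is decided by sampling noise, whereas a looser tolerance restores the lowest-index rule for agents whose frequencies are close to equal. How wide the tolerance should be is exactly the "statistically equal" judgement call described above.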

Comment author: 23 May 2012 10:26:31PM 0 points [-]

If the subject can know the exact code of TDT, Omega can know the exact code of TDT, and analyse it however it likes. That means it can know exactly where randomness is invoked - why would it have to sample?

Comment author: 24 May 2012 11:59:02AM 1 point [-]

This was my first thought: Omega can just prove the choosing probabilities are equal. However, it's not totally straightforward, because the sim could sample more random bits depending on the results of its first random bits, and so on, leading to an exponentially growing tree of possible outcomes, with no upper bound on the depth of the tree. There might not be an easy proof of equality in that case. Sampling and statistical equality is the next best approach...

Comment author: 23 May 2012 10:37:18AM 12 points [-]

BTW, general question about decision theory. There appears to have been an academic study of decision theory for over a century, and causal and evidential decision theory were set out in 1981. Newcomb's paradox was set out in 1969. Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT, and it seems as if there is a 100% disconnect between the community exploring new theories (which is centered around LW) and the academic decision theory community. This seems really, really odd - what's going on?

Comment author: 23 May 2012 07:47:30PM 8 points [-]

There were plenty of previous theories trying to go beyond CDT or EDT, they just weren't satisfactory.

Comment author: 24 May 2012 09:36:36PM *  5 points [-]

This paper talks about reflexive decision models and claims to develop a form of CDT which one-boxes.

It's in my to-read list but I haven't got to it yet so I'm not sure whether it's of interest but I'm posting it just in case (it could be a while until I have time to read it so I won't be able to post a more informed comment any time soon).

Though this theory post-dates TDT and so isn't interesting from that perspective.

Comment author: 24 May 2012 07:49:00PM *  2 points [-]

Dispositional decision theory :P

... which I cannot find a link to the paper for, now. Hm. But basically it was just TDT, with less awareness of why.

EDIT: Ah, here it was. Credit to Tim Tyler.

Comment author: 24 May 2012 08:27:52PM 2 points [-]

I checked it. Not the same thing.

Comment author: 23 May 2012 04:29:22PM *  0 points [-]

(Deleting comments seems not to be working. Consider this a manual delete.)

Comment author: 23 May 2012 04:50:21PM 1 point [-]

Decision Theory is and can be applied to a variety of problems here. It's just that AI may face Newcomb-like problems and in particular we want to ensure a 1-boxing-like behavior on the part of AI.

Comment author: 24 May 2012 12:29:59PM 0 points [-]

What is an example of such a real-world problem?

Comment author: 24 May 2012 06:09:13PM 4 points [-]

Negotiations with entities who can read the AI's source code.

Comment author: 03 June 2012 06:35:06PM -2 points [-]

Given the week+ delay in this response, it's probably not going to see much traffic, but I'm not convinced "reading" source code is all that helpful. Omega is posited to have nearly god-like abilities in this regard, but since this is a rationalist discussion, we probably have to rule out actual omnipotence.

If Omega intends to simply run the AI on spare hardware it has, then it has to be prepared to validate (in finite time and memory) that the AI hasn't so obfuscated its source as to be unintelligible to rational minds. It's also possible that the source to an AI is rather simple but that it depends on a large amount of input data in the form of a vast sea of numbers. I.e., the AI in question could be encoded as an ODE system integrator that's reliant on a massive array of parameters to get from one state to the next. I don't see why we should expect Omega to be better at picking out the relevant, predictive parts of these numbers than we are.

If the AI can hide things in its code or data, then it can hide functionality that tests to determine if it is being run by Omega or on its own protected hardware. In such a case it can lie to Omega just as easily as Omega can lie to the "simulated" version of the AI.

I think it's time we stopped positing an omniscient Omega in these complications to Newcomb's problem. They're like epicycles on Ptolemaic orbital theory in that they continue a dead-end line of reasoning. It's better to recognize that Newcomb's problem is a red herring. Newcomb's problem doesn't demonstrate problems that we should expect AIs to solve in the real world. It doesn't tease out meaningful differences between decision theories.

That is, what decisions on real-world problems do we expect to be different between two AIs that come to different conclusions about Newcomb-like problems?

Comment author: 03 June 2012 07:00:25PM *  2 points [-]

You should note that every problem you list is a special case. Obviously, there are ways of cheating at Newcomb's problem if you're aware of salient details beforehand. You could simply allow a piece of plutonium to decay, and do whatever the resulting Geiger counter noise tells you to. That does not, however, support your thesis that Newcomb's problem is a totally artificial problem with no logical intrusions into reality.

As a real-world example, imagine an off-the-shelf stock market optimizing AI. Not sapient, to make things simpler, but smart. When any given copy begins running, there are already hundreds or thousands of near-identical copies running elsewhere in the market. If it fails to predict their actions from its own, it will do objectively worse than it might otherwise do.

Comment author: 04 June 2012 02:57:36AM -1 points [-]

I don't see how your example is apt or salient. My thesis is that Newcomb-like problems are the wrong place to be testing decision theories because they do not represent realistic or relevant problems. We should focus on formalizing and implementing decision theories and throw real-world problems at them rather than testing them on arcane logic puzzles.

Comment author: 04 June 2012 03:24:51AM 3 points [-]

Well... no, actually. A good decision theory ought to be universal. It ought to be correct, and it ought to work. Newcomb's problem is important, not because it's ever likely to happen, but because it shows a case in which the normal, commonly accepted approach to decision theory (CDT) failed miserably. This 'arcane logic puzzle' is illustrative of a deeper underlying flaw in the model, which needs to be addressed. It's also a flaw that'd be much harder to pick out by throwing 'real world' problems at it over and over again.

Comment author: 04 June 2012 12:54:49PM -1 points [-]

Seems unlikely to work out to me. Humans evolved intelligence without Newcomb-like problems. As the only example of intelligence that we know of, it's clearly possible to develop intelligence without Newcomb-like problems. Furthermore, the general theory seems to be that AIs will start dumber than humans and iteratively improve until they're smarter. Given that, why are we so interested in problems like these (which humans don't universally agree about the answers to)?

I'd rather AIs be able to help us with problems like "what should we do about the economy?" or even "what should I have for dinner?" instead of worrying about what we should do in the face of something godlike.

Additionally, human minds aren't universal (assuming that universal means that they give the "right" solutions to all problems), so why should we expect AIs to be? We certainly shouldn't expect this if we plan on iteratively improving our AIs.

Comment author: 23 May 2012 07:09:45PM *  3 points [-]

The rationale for TDT-like decision theories is even more general, I think. There's no guarantee that our world contains only one copy of something. We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

Comment author: 23 May 2012 09:12:16PM *  2 points [-]

We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

Constructing a rigorous mathematical foundation for decision theory, one that explains what a decision problem, a decision, or a goal is, is potentially more useful than resolving any given informally specified class of decision problems.

Comment author: 23 May 2012 01:47:36PM 3 points [-]

It should be noted that Newcomb's problem was considered interesting in Philosophy in 1969, but decision theories were studied more in other fields - so there's a disconnect between the sorts of people who usually study formal decision theories and that sort of problem.

Comment author: 23 May 2012 12:59:51PM 12 points [-]

Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT...

This is simply not true. Robert Nozick (who introduced Newcomb's problem to philosophers) compared/contrasted EDT and CDT at least as far back as 1993. Even back then, he noted their inadequacy on several decision-theoretic problems and proposed some alternatives.

Comment author: 23 May 2012 01:14:53PM 4 points [-]

Me being ignorant of something seemed like a likely part of the explanation - thanks :) I take it you're referencing "The Nature of Rationality"? I haven't read that, I'm afraid. If you can spare the time I'd be interested to know what he proposes - thanks!

Comment author: 23 May 2012 01:49:20PM *  6 points [-]

I haven't read The Nature of Rationality in quite a long time, so I won't be of much help. For a very simple and short introduction to Nozick's work on decision theory, you should read this (PDF).