loup-vaillant comments on Problematic Problems for TDT - Less Wrong

36 Post author: drnickbone 29 May 2012 03:41PM

Comments (298)

Comment author: loup-vaillant 31 May 2012 08:49:30AM *  0 points [-]

Either problems 1 and 2 hit an infinite regress, or I don't see why an ordinary TDT agent wouldn't two-box in problem 1 and choose the first box in problem 2. There's a difference between the following problems:

  • I, Omega, predicted that you would do such and such, and acted accordingly.
  • I, Omega, simulated another agent, and acted accordingly.
  • I, Omega, simulated this very problem (except that, if you don't run TDT, it isn't the same problem, though I promise it's the same nonetheless), and acted accordingly.

Now, in problems 1 and 2, are the simulated problem and the actual problem actually the same? If they are, I see an infinite regress on Omega's side, and therefore not a problem one would ever encounter. If they aren't, then what I actually understand them to be is:

  1. Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to an agent running TDT. If the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

    Really, you don't have to use something other than TDT to see that the simulated TDT agent one-boxed. Its problem isn't your problem. Your precommitment in your problem doesn't affect your precommitment in its problem. Of course, the simulated TDT agent made the right choice by one-boxing. But you should two-box.

  2. Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "I ran multiple simulations of the following problem, presented to a TDT agent: “You must take exactly one box. I determined which box you are least likely to take, and put $1 million in that box. If there is a tie, I put the money in one of them (the one labelled with the lowest number).” I put the money in the box the simulated TDT agent was least likely to choose. If there was a tie, I put the money in one of them (the one labelled with the lowest number). Now choose your box."

    Same here. You know that the TDT agent put equal probability on every box, to maximize its gains. Again, its problem isn't your problem. Your precommitment in your problem doesn't affect your precommitment in its problem. Of course, the simulated TDT agent made the right choice by choosing at random. But you should take box 1.

Comment author: tut 19 June 2012 01:55:33PM 0 points [-]

You don't have to use something other than TDT to see that the simulated TDT agent one-boxed. Its problem isn't your problem.

This is CDT reasoning, AKA causal reasoning. Or in other words, why wouldn't you use the same reasoning in the original Newcomb's problem?

Comment author: loup-vaillant 21 June 2012 10:16:16PM *  -1 points [-]

The reasoning is different because the problem is different.

The simulated agent and you were not subjected to the same problem. Therefore you can perfectly well precommit to different decisions. TDT does not automatically make the same decision in problems that merely kinda look the same. They have to actually be the same. There may be specific reasons why TDT would make the same decision here, but I doubt it.

Now on to the examples:

Newcomb's problem

Omega ran a simulation of Newcomb's problem, complete with a TDT agent in it. The simulated TDT agent obviously one-boxed, and got the million. If you run TDT yourself, you also know it. Now, Omega tells you of this simulation, and tells you to choose your boxes. This is not Newcomb's problem. If it were, deciding to two-box would cause box B to be empty!

CDT would crudely assume that two-boxing gets it $1000 more than one-boxing. TDT, on the other hand, knows that the simulated box B (and therefore the real one as well) has the million, regardless of its current decision.
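The payoff argument above can be sketched as follows. This is only an illustration under the assumption made in this comment, namely that the simulated agent's choice is already fixed and does not depend on what the real agent does. The function name `real_payoff` and the string labels are hypothetical, not taken from the original post.

```python
# Payoffs in the "real" problem, ASSUMING (as argued in this comment) that
# the simulated TDT agent's choice is fixed independently of the real choice.
BOX_A = 1_000            # always placed in box A
BOX_B_FULL = 1_000_000   # placed in box B only if the simulated agent one-boxed

def real_payoff(real_choice: str, simulated_choice: str) -> int:
    """Return the real agent's winnings for a given pair of choices."""
    box_b = BOX_B_FULL if simulated_choice == "one-box" else 0
    if real_choice == "one-box":
        return box_b
    return BOX_A + box_b

# The simulated agent faced plain Newcomb's problem, so it one-boxed.
simulated = "one-box"
payoffs = {c: real_payoff(c, simulated) for c in ("one-box", "two-box")}
best = max(payoffs, key=payoffs.get)
print(payoffs, best)
```

Under that independence assumption, two-boxing comes out $1000 ahead. Whether the assumption holds for a TDT agent is exactly what is in dispute in this thread.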

10 boxes problem

Again, the simulated problem and the real one aren't the same. If they were, choosing box 1 with probability 1 would cause box 2 to have the million. Because it's not the same problem, even TDT should be allowed to precommit to a different decision. The point of TDT is to foresee the consequences of its precommitments. It will therefore know that its precommitment in the real problem doesn't have any influence on its precommitment (and therefore the outcome) in the simulated one. This lack of influence allows it to fall back on CDT reasoning.
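A small sketch of Omega's placement rule as stated in the problem (least likely box, ties going to the lowest number). The helper name `money_box` and the two example distributions are illustrative assumptions, not part of the original post.

```python
from fractions import Fraction

def money_box(probabilities):
    """Return the 1-based number of the box Omega fills: the least likely
    box, with ties broken in favour of the lowest-numbered box."""
    lowest = min(probabilities)
    # list.index returns the first (lowest-numbered) position of the minimum
    return probabilities.index(lowest) + 1

# Distribution attributed above to the simulated TDT agent: uniform.
uniform = [Fraction(1, 10)] * 10
# Hypothetical agent that always takes box 1.
always_box_1 = [Fraction(1)] + [Fraction(0)] * 9

print(money_box(uniform))       # all ten boxes tie
print(money_box(always_box_1))  # boxes 2 to 10 tie at probability 0
```

With the uniform distribution the money lands in box 1. If the rule were instead applied to an agent that always picks box 1, the money would land in box 2, which is the asymmetry the paragraph above relies on.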

Makes sense?

Comment author: lackofcheese 22 June 2012 12:54:50AM 0 points [-]

The simulated problem and the actual problem don't have to actually be the same - just indistinguishable from the point of view of the agent.

Omega avoids infinite regress because the actual contents of the boxes are irrelevant for the purposes of the simulation, so no sub-simulation is necessary.

Comment author: loup-vaillant 22 June 2012 09:15:47AM 0 points [-]

Okay. So, what specific mistake does TDT make that would prevent it from distinguishing the two problems? What leads it to think "If I precommit to X in problem 1, I have to precommit to X in problem 2 as well"?

(If the problems aren't the same, of course Omega can avoid infinite regress. And if there is unbounded regress, we may be able to find a non-infinite solution by looping the regress over itself. But then the problems (simulated and real) are definitely the same.)

Comment author: lackofcheese 22 June 2012 11:51:57AM 0 points [-]

In the simulated problem the simulated agent is presented with the choice but never gets the reward; for all it matters both boxes can be empty. This means that Omega doesn't have to do another simulation to work out what's in the simulated boxes.

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.
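The symmetry argument can be sketched like this, assuming that every instance of the agent (real or simulated) necessarily returns the same choice, so only the "diagonal" outcomes are reachable. The names `payoff` and `best_linked_policy` are hypothetical and chosen only for this illustration.

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000

def payoff(real_choice, simulated_choice):
    """Winnings of the real agent given both choices."""
    box_b = BOX_B_FULL if simulated_choice == "one-box" else 0
    return box_b if real_choice == "one-box" else BOX_A + box_b

def best_linked_policy():
    """Pick the best policy when the real and simulated choices are
    ASSUMED identical, so no nested simulation has to be evaluated."""
    options = ("one-box", "two-box")
    return max(options, key=lambda choice: payoff(choice, choice))

print(best_linked_policy())
```

Because the two choices are tied together, the regress collapses to comparing two outcomes: both one-box ($1 million) or both two-box ($1000).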

Comment author: loup-vaillant 22 June 2012 12:20:19PM 0 points [-]

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.

<Slaps forehead> Of course.

Now there's still the question of the perceived difference between the simulated problem and the real one (I assume here that you should 1-box in the simulation, and 2-box in the real problem). There is a difference, so how come TDT does not see it? A Rational Decision Theory would; we humans do. Or if it can see it, how come it can't act on it? RDT could. Do you concede that TDT does and can, or do you still have doubts?

Comment author: lackofcheese 23 June 2012 12:19:36AM 2 points [-]

Due to how the problem is set up, you can't notice the difference until after you've made your decision. The only reason other decision theories know they're not in the simulation is that the problem explicitly states that a TDT agent is simulated, which means it can't be them.

Comment author: loup-vaillant 24 June 2012 08:04:56PM *  0 points [-]

The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

That's false. Here is a modified version of the problem:

Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to you. If your simulated twin 2-boxed then I put nothing in Box B. If your simulated twin 1-boxed, I put $1 million in Box B. In any case, I put $1000 in Box A. Now please 1-box or 2-box."

Even if you're not running TDT, the simulated agent is running the same decision algorithm as you are. If that were the reason why TDT couldn't tell the difference, well, now no one can. However, you and I can tell the difference. The simulated problem is obviously different:

Omega presents the usual two boxes A and B and announces the following. "I am subjecting you to Newcomb's problem. Now please 1-box or 2-box".

Really, the subjective difference between the two problems should be obvious to any remotely rational agent.

(Please let me know if you agree up until that point. Below, I assume you do.)

I'm pretty sure the correct answers for the two problems (my modified version as well as the original one) are 1-box in the simulation, 2-box in the real problem. (Do you still agree?)

So. We both agree that RDT (Rational Decision Theory) 1-boxes in the simulation, and 2-boxes in the real problem. CDT would 2-box in both, and TDT would 1-box in the simulation while in the real problem it would…

  • 2-box? I think so.
  • 1-box? Supposedly because it can't tell simulation from reality. Or rather, it can't tell the difference between Newcomb's problem and the actual problem. Even though RDT does. (riiight?) So again, I must ask, why not? I need a more specific answer than "due to how the problem is set up". I need you to tell me what specific kind of irrationality TDT is committing here. I need to know its specific blind spot.

Comment author: lackofcheese 24 June 2012 11:04:10PM *  1 point [-]

In your problem, TDT does indeed 2-box, but it's quite a different problem from the original one. Here's the main difference:

I ran a simulation of this problem

vs

I ran a simulation of Newcomb's problem

Comment author: APMason 24 June 2012 08:59:33PM 0 points [-]

Well, in the problem you present here TDT would 2-box, but you've avoided the hard part of the problem from the OP, in which there is no way to tell whether you're in the simulation or not (or at least there is no way for the simulated you to tell), unless you're running some algorithm other than TDT.

Comment author: MugaSofer 25 December 2012 03:53:04PM *  -1 points [-]

The simulated agent and yourself were not subjected to the same problem.

Um, yes, they were. That's the whole point.

Comment author: loup-vaillant 31 December 2012 07:40:07PM 0 points [-]

I'll need to write a full discussion post about that at some point. There is one crucial difference besides "I'm TDT" versus "I'm CDT". It's "The simulated agent uses the same decision theory" versus "The simulated agent does not use the same decision theory".

That's not exactly the same problem, and I think that is the whole point.