jimrandomh comments on Problematic Problems for TDT - Less Wrong

36 Post author: drnickbone 29 May 2012 03:41PM

Comment author: jimrandomh 23 May 2012 02:14:20PM * 29 points

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you $1000 if you are, and tortures you for a year if you aren't.
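
A minimal sketch of this construction, for concreteness. The names `adt`, `longest_option_theory`, `omega_payoff`, and the `TORTURE` value are illustrative assumptions, not anything defined in the post; the year of torture is stood in for by a large negative number.

```python
from typing import Callable, Sequence

Agent = Callable[[Sequence[str]], str]

def adt(options: Sequence[str]) -> str:
    """Alphabetic Decision Theory: pick the option whose description sorts first."""
    return min(options, key=str.casefold)

def longest_option_theory(options: Sequence[str]) -> str:
    """A hypothetical rival theory, included only for comparison."""
    return max(options, key=len)

TORTURE = -10**9  # assumed stand-in for "tortured for a year"

def omega_payoff(agent: Agent) -> int:
    """An Omega that names ADT explicitly: it pays for what you are, not what you choose."""
    return 1000 if agent is adt else TORTURE
```

Note that `omega_payoff` never calls the agent at all, which is what makes the scenario a "counterexample" to every theory other than the one it names.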

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

Comment author: ciphergoth 23 May 2012 03:09:44PM * 20 points

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards e.g. ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about Problem 1 is that it is a "fair" test by that definition, yet it acts like an unfair test.

Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.

Comment author: Jack 23 May 2012 10:06:14PM 4 points

He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are.

Two questions: First, how is this distinction justified? A decision theory is a strategy for responding to decision tasks, and simulating agents performing the right decision tasks tells you what kind of decision theory they're using. Why does it matter whether it's done implicitly (as in Newcomb's discrimination against CDT) or explicitly? And second, why should we care about it? Why is it important for a decision theory to pass fair tests but not unfair tests?

Comment author: APMason 24 May 2012 10:47:29AM 7 points

Why is it important for a decision theory to pass fair tests but not unfair tests?

Well, on unfair tests a decision theory still needs to do as well as possible. If we had a version of the original Newcomb's problem, with the one difference that a CDT agent gets $1 billion just for showing up, it's still incumbent upon a TDT agent to walk away with $1000000 rather than $1000. The "unfair" class of problems is that class where "winning as much as possible" is distinct from "winning the most out of all possible agents".

Comment author: ciphergoth 24 May 2012 06:50:28AM 4 points

Real-world unfair tests could matter, though it's not clear if there are any. However, hypothetical unfair tests aren't very informative about what is a good decision theory, because it's trivial to cook one up that favours one theory and disfavours another. I think the hope was to invent a decision theory that does well on all fair tests; the example above seems to show that may not be possible.

Comment author: loup-vaillant 25 June 2012 07:16:55AM * 3 points

Because it's a fair test

No, not even by Eliezer's standard, because TDT is not given the same problem as other decision theories.

As stated in comments below, everyone but TDT has the information "I'm not in the simulation" (or more precisely, in one of the simulations of the infinite regress that is implied by Omega's formulation). The reason TDT does not have this extra piece of information comes from the fact that it is TDT, not from any decision it may make.

Comment author: ciphergoth 25 June 2012 09:14:08AM 1 point

Right, and this is an unfairness that Eliezer's definition fails to capture.

Comment author: loup-vaillant 25 June 2012 11:43:57AM 0 points

At this point, I need the text of that definition.

Comment author: shokwave 25 June 2012 12:04:12PM 0 points

The definition is in Eliezer's TDT paper, although a quick grep for "fair" didn't immediately find it.

Comment author: APMason 25 June 2012 03:40:53PM * 0 points

This variation of the problem was invented in the follow-up post (I think it was called "Sneaky strategies for TDT" or something like that):

Omega tells you that earlier he flipped a coin. If the coin came down heads, it simulated a CDT agent facing this problem. If the coin came down tails, it simulated a TDT agent facing this problem. In either case, if the simulated agent one-boxed, there is $1000000 in Box-B; if it two-boxed Box-B is empty. In this case TDT still one-boxes (50% chance of $1000000 dominates a 100% chance of $1000), and CDT still two-boxes (because that's what CDT does). In this case, even though both agents have an equal chance of being simulated, CDT out-performs TDT (average payoffs of 501000 vs. 500000) - CDT takes advantage of TDT's prudence and TDT suffers for CDT's lack of it. Notice also that TDT cannot do better by behaving like CDT (both would get payoffs of 1000). This shows that the class of problems we're concerned with is not so much "fair" vs. "unfair", but more like "those problems on which the best I can do is not necessarily the best anyone can do". We can call it "fairness" if we want, but it's not like Omega is discriminating against TDT in this case.

Comment author: loup-vaillant 25 June 2012 04:04:04PM * 3 points

This is not a zero-sum game. CDT does not outperform TDT here. It just makes a stupid mistake, and happens to pay for it less dearly than TDT does.

Let's say Omega submits the same problem to 2 arbitrary decision theories, a and b. Each will either 1-box or 2-box. Here is the average payoff matrix:

  • Both a and b 1-box -> They both get the million
  • Both a and b 2-box -> They both get 1000 only.
  • One 1-boxes, the other 2-boxes -> the 1-boxer gets half a million, the other gets 1000 more.

Clearly, 1-boxing still dominates 2-boxing. Whatever the other does, you personally get about half a million more by 1-boxing. TDT may have less utility than CDT for 1-boxing, but CDT is still stupid here, while TDT is not.
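
The matrix can be recomputed mechanically. This is only a sketch under stated assumptions: Omega's coin is fair, the simulated agent makes exactly the choice its real counterpart makes, Box A always holds $1000, and Box B holds $1000000 only if the simulated agent 1-boxed. The names `average_payoff` and `one_boxing_dominates` are made up for the illustration.

```python
from itertools import product

BOX_A = 1_000
BOX_B = 1_000_000

def average_payoff(my_choice: int, other_choice: int) -> float:
    """Average payoff when a fair coin picks which of the two agents was simulated.

    Choices are 1 (1-box) or 2 (2-box). Box B is full only if the
    simulated agent 1-boxed.
    """
    total = 0.0
    for simulated_choice in (my_choice, other_choice):  # each with probability 1/2
        box_b = BOX_B if simulated_choice == 1 else 0
        winnings = box_b + (BOX_A if my_choice == 2 else 0)
        total += 0.5 * winnings
    return total

def one_boxing_dominates() -> bool:
    """True if 1-boxing beats 2-boxing whatever the other agent does."""
    return all(average_payoff(1, other) > average_payoff(2, other) for other in (1, 2))

for mine, other in product((1, 2), repeat=2):
    print(f"I {mine}-box, other {other}-boxes: {average_payoff(mine, other):,.0f}")
```

Under these assumptions the 1-boxer comes out roughly half a million ahead in either column, which is the dominance claim above.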

Comment author: jimrandomh 23 May 2012 04:26:37PM 2 points

Not exactly. Because the problem statement says that it simulates "TDT", if you were to expand the problem statement out into code it would have to contain source code to a complete instantiation of TDT. When the problem statement is run, TDT or TDT-prime can look at that instantiation and compare it to its own source code. TDT will see that they're the same, but TDT-prime will notice that they are different, and thereby infer that it is not the simulated copy. (Any difference whatsoever is proof of this.)

Consider an alternative problem. Omega flips a coin, and asks you to guess what it was, with a prize if you guess correctly. If the coin was heads, he shows you a piece of paper with TDT's source code. If the coin was tails, he shows you a piece of paper with your source code, whatever that is.
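
A sketch of the inference available in this coin game, assuming agents are represented by their source text and compare it character for character (which is the assumption this comment makes). The function name `guess_coin` and its parameters are hypothetical.

```python
def guess_coin(my_source: str, shown_source: str, tdt_source: str) -> str:
    """Guess the coin from the paper Omega shows.

    Heads: the paper holds TDT's source. Tails: it holds the guesser's own source.
    """
    if shown_source != tdt_source:
        return "tails"  # heads always shows TDT's source, so this must be tails
    if my_source != tdt_source:
        return "heads"  # tails would have shown my own, different, source
    return "cannot tell"  # I am TDT: both outcomes show the same paper
```

Only an agent whose source is exactly TDT's ends up in the last branch; any other agent can always collect the prize.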

Comment author: cousin_it 23 May 2012 05:54:33PM * 11 points

I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.

Comment author: kybernetikos 06 June 2012 12:05:55PM 0 points

The key question is whether it could have been you that was simulated. If all you know is that you're a TDT agent and what Omega simulated is a TDT agent, then it could have been you. Therefore you have to act as if your decision now may be either real or simulated. If you know you are not what Omega simulated (for any reason), then you know that you only have to worry about the 'real' decision.

Comment author: JGWeissman 06 June 2012 04:34:19PM 0 points

Suppose that Omega doesn't reveal the full source code of the simulated TDT agent, but just reveals enough logical facts about the simulated TDT agent to imply that it uses TDT. Then the "real" TDT Prime agent cannot deduce that it is different.

Comment author: kybernetikos 19 June 2012 07:30:10AM * 0 points

Yes. I think that as long as there is any chance of you being the simulated agent, then you need to one-box. So you one-box if Omega tells you 'I simulated some agent', and one-box if Omega tells you 'I simulated an agent that uses the same decision procedure as you', but two-box if Omega tells you 'I simulated an agent that had a different copyright comment in its source code to the comment in your source code'.

This is just a variant of the 'detect if I'm in a simulation' function that others mention: if Omega gives you access to that information in any way, you can two-box. Of course, I'm a bit stuck on what Omega has told the simulation in that case. Has Omega done an infinite regress?

Comment author: cousin_it 06 June 2012 03:57:44PM 0 points

That's an interesting way to look at the problem. Thanks!

Comment author: ewbrownv 11 June 2012 10:09:55PM 2 points

Indeed. These are all scenarios of the form "Omega looks at the source code for your decision theory, and intentionally creates a scenario that breaks it." Omega could do this with any possible decision theory (or at least, anything that could be implemented with finite resources), so what exactly are we supposed to learn by contemplating specific examples?

It seems to me that the valuable Omega thought experiments are the ones where Omega's omnipotence is simply used to force the player to stick to the rules of the given scenario. When you start postulating that an impossible, acausal superintelligence is actively working against you, it's time to hang up your hat and go home, because no strategy you could possibly come up with is going to do you any good.

Comment author: MugaSofer 24 December 2012 09:57:12PM 0 points

The trouble is when another agent wins in this situation and in the situations you usually encounter. For example, an anti-traditional-rationalist, that always makes the opposite choice to a traditional rationalist, will one-box; it just fails spectacularly when asked to choose between different amounts of cake.

Comment author: APMason 23 May 2012 02:28:04PM 1 point

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

I don't think so. If TDT-prime two-boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000. Omega doesn't check what decision theory you're using at all - it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT-agent or not.

Comment author: jimrandomh 23 May 2012 02:30:47PM 2 points

I don't think so. If TDT-prime two-boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000.

No, if TDT-prime two-boxes, the TDT simulation still one-boxes.

Comment author: APMason 23 May 2012 02:39:16PM 6 points

Hmm, so TDT-prime would reason something like, "The TDT simulation will one-box because, not knowing that it's the simulation, but also knowing that the simulation will use exactly the same decision theory as itself, it will conclude that the simulation will do the same thing as itself and so one-boxing is the best option. However, I'm different to the TDT-simulation, and therefore I can safely two-box without affecting its decision." In which case, does it matter how inconsequential the difference is? Yep, I'm confused.

Comment author: drnickbone 23 May 2012 03:34:34PM 2 points

I also had thoughts along these lines - variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

The second difficulty is that for each specific TDT variant, one with algorithm T' say, there will be a specific problematic problem on which T' will do worse than CDT (and indeed worse than all the other variants of TDT) - this is the problem with T' being the exact algorithm running in the sim. So we still don't get the - desirable - property that there is some sensible decision theory called TDT that is optimal across fair problems.

The best suggestion I've heard so far is that we try to adjust the definition of "fairness", so that these problematic problems also count as "unfair". I'm open to proposals on that one...

Comment author: AlexMennen 04 June 2012 11:39:19PM 0 points

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

I think this is avoidable. Let's say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice's source code contains a comment identifying it as Alice, whereas Bob's source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.
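
One way to picture the "compression" step, as a rough sketch only. It assumes the agents are Python source strings and uses the standard-library `tokenize` module to drop comments before comparing; `ALICE`, `BOB`, `code_tokens`, and `view_of` are hypothetical names, and the toy agents do nothing interesting.

```python
import io
import tokenize

ALICE = "# I am Alice\ndef decide(opponent_source):\n    return 'cooperate'\n"
BOB = "# I am Bob\ndef decide(opponent_source):\n    return 'cooperate'\n"

def code_tokens(source: str) -> list[tuple[int, str]]:
    """Return (token type, text) pairs for the source, with comments dropped."""
    reader = io.StringIO(source).readline
    return [
        (tok.type, tok.string)
        for tok in tokenize.generate_tokens(reader)
        if tok.type != tokenize.COMMENT
    ]

def view_of(my_source: str, other_source: str) -> str:
    """Describe another program relative to my own source."""
    if my_source == other_source:
        return "exact copy"
    if code_tokens(my_source) == code_tokens(other_source):
        return "same code, different comments"
    return "different program"

# Prisoner's dilemma: each player sees the other the same way.
print(view_of(ALICE, BOB), "|", view_of(BOB, ALICE))
# Problem 1 with Alice as the simulation: the two views differ.
print(view_of(ALICE, ALICE), "|", view_of(BOB, ALICE))
```

The symmetric description in the first case and the asymmetric one in the second is the distinction being drawn; whether a real TDT implementation would compress its situation this way is an open assumption.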

Comment author: drnickbone 09 June 2012 11:24:08AM 1 point

You might want to look at my follow-up article which discusses a strategy like this (among others). It's worth noting that slight variations of the problem remove the opportunity for such "sneaky" strategies.

Comment author: AlexMennen 09 June 2012 08:46:14PM 0 points

Ah, thanks. I had missed that, somehow.

Comment author: kybernetikos 06 June 2012 12:12:51PM * 0 points

In a prisoner's dilemma, Alice and Bob affect each other's outcomes. In the Newcomb problem, Alice affects Bob's outcome, but Bob doesn't affect Alice's outcome. That's why it's OK for Bob to consider himself different in the second case as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation), but not OK for him to consider himself different in the prisoner's dilemma.

Comment author: MugaSofer 25 December 2012 04:13:32PM -1 points

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.

Why doesn't that happen when dealing with Omega?

Comment author: AlexMennen 25 December 2012 08:01:22PM 0 points

Because if Omega uses Alice's source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.

Comment author: MugaSofer 25 December 2012 10:21:11PM -1 points

So why doesn't that happen in the prisoner's dilemma?

Comment author: AlexMennen 25 December 2012 10:47:57PM 0 points

Because Alice sees that Bob's source code is the same as hers except for a comment difference, and Bob sees that Alice's source code is the same as his except for a comment difference, so the situation is symmetric.

Comment author: APMason 23 May 2012 04:22:55PM * 0 points

Well, I've had a think about it, and I've concluded that it would matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega's simulation, and therefore will not be able to take advantage of "logical separation".

But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to "separate" them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It's as if you need the ability to prove that two agents necessarily give the same output for the particular problem you're faced with, without proving what output those agents actually give, and that sure looks crazy-hard.

EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.

EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process - i.e. Omega can predict whether it's going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).
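
A rough sketch of that test, assuming a problem can be written as a function from (decision procedure, action) to a reward, and that Omega predicts perfectly. The procedure labels, the payoff numbers for the identity-checking Omega, and the names `newcomb`, `adt_rewarding_omega`, and `is_fair` are all assumptions made for illustration.

```python
from typing import Callable, Iterable

Payoff = Callable[[str, str], int]  # (decision procedure label, action) -> reward

ACTIONS = ("one-box", "two-box")

def newcomb(procedure: str, action: str) -> int:
    """Standard Newcomb with a perfect predictor: only the action matters."""
    box_b = 1_000_000 if action == "one-box" else 0
    box_a = 1_000 if action == "two-box" else 0
    return box_a + box_b

def adt_rewarding_omega(procedure: str, action: str) -> int:
    """An Omega that pays by identity and ignores the action."""
    return 1_000 if procedure == "ADT" else -1_000_000

def is_fair(problem: Payoff, procedure: str, actions: Iterable[str] = ACTIONS) -> bool:
    """True if a predictable coin flip landing on each action earns what `procedure` earns."""
    return all(problem(procedure, a) == problem("coin flip", a) for a in actions)
```

Here `is_fair(newcomb, "TDT")` holds while `is_fair(adt_rewarding_omega, "ADT")` does not. The check is relative to the procedure you plug in, so it says nothing about problems where the reward depends on another agent's model of you.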

EDIT 3: Sorry to go on like this, but I've just realised that won't work in situations where some other agent bases their decision on whether you're predicting what their decision will be, i.e. Prisoner's Dilemma.

Comment author: jimrandomh 23 May 2012 08:14:02PM 0 points

The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime's output and TDT-prime's decision. If its output is a strategy, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.
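
A minimal sketch of such a strategy, assuming Omega reveals the simulated agent's source bytes, the simulation is shown its own source, and every variant adopts the same strategy function. `shared_strategy` and `newcomb_payoff` are hypothetical names.

```python
def shared_strategy(my_source: bytes, sim_source: bytes) -> str:
    """One strategy used by every variant; the action depends on a byte comparison."""
    if my_source == sim_source:
        return "one-box"  # I might be the simulation, so my choice fills Box B
    return "two-box"  # any byte difference shows I am not the simulation

def newcomb_payoff(agent_source: bytes, sim_source: bytes) -> int:
    """Payoff for an agent when Omega simulated the program in `sim_source`."""
    sim_action = shared_strategy(sim_source, sim_source)
    box_b = 1_000_000 if sim_action == "one-box" else 0
    my_action = shared_strategy(agent_source, sim_source)
    return box_b + (1_000 if my_action == "two-box" else 0)
```

With TDT as the simulated program, TDT itself collects the million and a byte-different TDT-prime collects the million plus the extra thousand, so neither could have done better by choosing differently.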

Comment author: dlthomas 23 May 2012 08:25:17PM 1 point

But doesn't that make cliquebots, in general?

Comment author: drnickbone 24 May 2012 12:08:43PM 0 points

I'm thinking hard about this one...

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection? Can that approach really work in general without creating CliqueBots? Don't know yet without detailed analysis.

Another issue is that Omega is not obliged to reveal the source-code of the sim; it could instead provide some information about the method used to generate / filter the sim code (e.g. a distribution the sim was drawn from) and still lead to a well-defined problem. Each TDT variant would not then know whether it was the sim.

I'm aiming for a follow-up article addressing this strategy (among others).

Comment author: khafra 24 May 2012 05:57:56PM 0 points

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection?

This sounds equivalent to asking "can a Turing machine generate non-deterministically random numbers?" Unless you're thinking about coding TDT agents one at a time and setting some constant differently in each one.

Comment author: MugaSofer 25 December 2012 04:07:16PM * -1 points

Yep, I'm confused.

Sounds like you have it exactly right.