A problem with Timeless Decision Theory (TDT)

Gary_Drescher

According to Ingredients of Timeless Decision Theory, when you set up a factored causal graph for TDT, "You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation", where "the logical computation" refers to the TDT-prescribed argmax computation (call it C) that takes all your observations of the world (from which you can construct the factored causal graph) as input, and outputs an action in the present situation.

I asked Eliezer to clarify what it means for another logical computation D to be either the same as C, or "dependent on" C, for purposes of the TDT algorithm. Eliezer answered:

For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D's logical output in at least some cases, relative to our current state of non-omniscient logical knowledge. A nice form of this is when supposing that C has a given exact logical output (not yet known to be impossible) enables us to infer D's exact logical output, and this is true for every possible logical output of C. Non-nice forms would be harder to handle in the decision theory but we might perhaps fall back on probability distributions over D.

I replied as follows (which Eliezer suggested I post here).

If that's what TDT means by the logical dependency between Platonic computations, then TDT may have a serious flaw.

Consider the following version of the transparent-boxes scenario. The predictor has an infallible simulator D that predicts whether I one-box here [EDIT: if I see $1M]. The predictor also has a module E that computes whether the ith digit of pi is zero, for some ridiculously large value of i that the predictor randomly selects. I'll be told the value of i, but the best I can do is assign an a priori probability of .1 that the specified digit is zero.

The predictor puts $1M in the large box iff (D xor E) is true. (And that's explained to me, of course.)

So let's say I'm confronted with this scenario, and I see $1M in the large box.

The flaw then is that E (as well as D) meets your criterion for "depending on" my decision computation C. I'm initially unsure what C and E output. But if C in fact one-boxes here, then I can infer that E outputs False (or else the large box has to be empty, which it isn't). Similarly, if C in fact two-boxes here, then I can infer that E outputs True. (Or equivalently, a third-party observer could soundly draw either of those inferences.)

So E does indeed "depend on" C, in the particular sense you've specified. Thus, if I happen to have a strong enough preference that E output True, then TDT (as currently formulated) will tell me to two-box for the sake of that goal. But that's the wrong decision, of course. In reality, I have no choice about the specified digit of pi.

What's going on, it seems to me, is that the kind of logical/Platonic "dependency" that TDT would need to invoke here is this: that E's output be counterfactually entailed by C's output (which it isn't, in this case [see footnote]), rather than (as you've specified) merely inferable from C's output (which indeed it is, in this case). That's bad news, because distinguishing what my action does or does not counterfactually entail (as opposed to what it implies, causes, gives evidence for, etc.) is the original full-blown problem that TDT's prescribed decision-computation is meant to solve. So it may turn out that in order to proceed with that very computation (specifically, in order to ascertain which other Platonic computations "depend on" the decision computation C), you already need to (somehow) know the answer that the computation is trying to provide.

--Gary

[footnote] Because if-counterfactually C were to two-box, then (contrary to fact) the large box would (probably) be empty, circumventing the inference about E.

[appendix] In this post, you write:

...reasoning under logical uncertainty using limited computing power... is another huge unsolved open problem of AI. Human mathematicians had this whole elaborate way of believing that the Taniyama Conjecture implied Fermat's Last Theorem at a time when they didn't know whether the Taniyama Conjecture was true or false; and we seem to treat this sort of implication in a rather different way than '2=1 implies FLT', even though the material implication is equally valid.

I don't follow that. The sense of implication in which mathematicians established that TC implies FLT (before knowing if TC was true) is precisely material/logical implication: they showed ~(TC & ~FLT). And similarly, we can prove ~(3SAT-in-P & ~(P=NP)), etc. There's no need here to construct (or magically conjure) a whole alternative inference system for reasoning under logical uncertainty.

So if the inference you speak of (when specifying what it means for D to "depend on" C) is the same kind as was used in establishing TC=>FLT, then it's just material implication, which (as argued above) leads TDT to give wrong answers. Or if we substitute counterfactual entailment for material implication, then TDT becomes circular (question-begging). Or if you have in mind some third alternative, I'm afraid I don't understand what it might be.

EDIT: The rules of the original transparent-boxes problem (as specified in Good and Real) are: the predictor conducts a simulation that tentatively presumes there will be $1M in the large box, and then puts $1M in the box (for real) iff that simulation showed one-boxing. Thus, if the large box turns out to be empty, there is no requirement for that to be predictive of the agent's choice under those circumstances. The present variant is the same, except that (D xor E) determines the $1M, instead of just D. (Sorry, I should have said this to begin with, instead of assuming it as background knowledge.)

I asked Eliezer to clarify what it means for another logical computation D to be either the same as C, or "dependent on" C, for purposes of the TDT algorithm. Eliezer answered:

For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D's logical output in at least some cases, relative to our current state of non-omniscient logical knowledge. A nice form of this is when supposing that C has a given exact logical output (not yet known to be impossible) enables us to infer D's exact logical output, and this is true for every possible logical output of C. Non-nice forms would be harder to handle in the decision theory but we might perhaps fall back on probability distributions over D.

I replied as follows (which Eliezer suggested I post here).

If that's what TDT means by the logical dependency between Platonic computations, then TDT may have a serious flaw.

The predictor puts $1M in the large box iff (D xor E) is true. (And that's explained to me, of course.)

So let's say I'm confronted with this scenario, and I see $1M in the large box.

--Gary

[footnote] Because if-counterfactually C were to two-box, then (contrary to fact) the large box would (probably) be empty, circumventing the inference about E.

[appendix] In this post, you write:

...reasoning under logical uncertainty using limited computing power... is another huge unsolved open problem of AI. Human mathematicians had this whole elaborate way of believing that the Taniyama Conjecture implied Fermat's Last Theorem at a time when they didn't know whether the Taniyama Conjecture was true or false; and we seem to treat this sort of implication in a rather different way than '2=1 implies FLT', even though the material implication is equally valid.

Logical uncertainty has always been more difficult to deal with than physical uncertainty; the problem with logical uncertainty is that if you analyze it enough, it goes away. I've never seen any really good treatment of logical uncertainty.

But if we depart from TDT for a moment, then it does seem clear that we need to have causelike nodes corresponding to logical uncertainty in a DAG which describes our probability distribution. There is no other way you can completely observe the state of a calculator sent to Mars and a calculator sent to Venus, and yet remain uncertain of their outcomes yet believe the outcomes are correlated. And if you talk about error-prone calculators, two of which say 17 and one of which says 18, and you deduce that the "Platonic answer" was probably in fact 17, you can see that logical uncertainty behaves in an even more causelike way than this.

So, going back to TDT, my hope is that there's a neat set of rules for factoring our logical uncertainty in our causal beliefs, and that these same rules also resolve the sort of situation that you describe.

If you consider the notion of the correlated error-prone calculators, two returning 17 and one returning 18, then the most intuitive way to handle this would be to see a "Platonic answer" as its own causal node, and the calculators as error-prone descendants. I'm pretty sure this is how my brain is drawing the graph, but I'm not sure it's the correct answer; it seems to me that a more principled answer would involve uncertainty about which mathematical fact affects each calculator - physically uncertain gates which determine which calculation affects each calculator.

For the (D xor E) problem, we know the behavior we want the TDT calculation to exhibit; we want (D xor E) to be a descendant node of D and E. If we view the physical observation of $1m as telling us the raw mathematical fact (D xor E), and then perform mathematical inference on D, we'll find that we can affect E, which is not what we want. Conversely if we view D as having a physical effect, and E as having a physical effect, and the node D xor E as a physical descendant of D and E, we'll get the behavior we want. So the question is whether there's any principled way of setting this up which will yield the second behavior rather than the first, and also, presumably, yield epistemically correct behavior when reasoning about calculators and so on.

That's if we go down avenue (2). If we go down avenue (1), then we give primacy to our intuition that if-counterfactually you make a different decision, this logically controls the mathematical fact (D xor E) with E held constant, but does not logically control E with (D xor E) held constant. While this does sound intuitive in a sense, it isn't quite nailed down - after all, D is ultimately just as constant as E and (D xor E), and to change any of them makes the model equally inconsistent.

These sorts of issues are something I'm still thinking through, as I think I've mentioned, so let me think out loud for a bit.

In order to observe anything that you think has already been controlled by your decision - any physical thing in which a copy of D has already played a role - then (leaving aside the question of Omega's strategy that simulated alternate versions of you to select a self-consistent problem, and whether this introduces conditional-strategy-dependence rather than just decision-dependence into the problem) there have to be other physical facts which combine with D to yield our observation.

Some of these physical facts may themselves be affected by mathematical facts, like an implemented computation of E; but the point is that in order to have observed anything controlled by D, we already had to draw a physical, causal diagram in which other nodes descended from D.

So suppose we introduce the rule that in every case like this, we will have some physical node that is affected by D, and if we can observe info that depends on D in any way, we'll view the other mathematical facts as combining with D's physical node. This is a rule that tells us not to draw the diagram with a physical node being determined by the mathematical fact D xor E, but rather to have a physical node determined by D, and then a physical descendent D xor E. (Which in this particular problem should descend from a physical node E that descends from the mathematical fact E, because the mathematical fact E is correlated with our uncertainty about other things, and a factored causal graph should have no remaining correlated sources of background uncertainty; but if E didn't correlate to anything else in particular, we could just have D descending to (D xor E) via the (xor with E) rule.)

When I evaluate this proposed solution for ad-hoc-ness, it does admittedly look a bit ad-hoc, but it solves at least one other problem than the one I started with, and which I didn't think of until now. Suppose Omega tells me that I make the same decision in the Prisoner's Dilemma as Agent X. This does not necessarily imply that I should cooperate with Agent X. X and I could have made the same decision for different (uncorrelated) reasons, and Omega could have simply found out by simulating the two of us that X and I gave the same response. In this case, presumably defecting; but if I cooperated, X wouldn't do anything differently. X is just a piece of paper with "Defect" written on it.

If I draw a causal diagram of how I came to learn this correlation from Omega, and I follow the rule of always drawing a causal boundary around the mathematical fact D as soon as it physically affects something, then, given the way Omega simulated both of us to observe the correlation, I see that D and X separately physically affected the correlation-checker node.

On the other hand, if I can analyze the two pieces of code D and X and see that they return the same output, without yet knowing the output, then this knowledge was obtained in a way that doesn't involve D producing an output, so I don't have to draw a hard causal boundary around that output.

If this works, the underlying principle that makes it work is something along the lines of "for D to control X, the correlation between our uncertainty about D and X has to emerge in a way that doesn't involve anyone already computing D". Otherwise D has no free will (said firmly tongue-in-cheek). I am not sure that this principle has any more elegant expression than the rule, "whenever, in your physical model of the universe, D finishes computing, draw a physical/causal boundary around that finished computation and have other things physically/causally descend from it".

If this principle is violated then D ends up "correlated" to all sorts of other things we observe, like the price of fish and whether it's raining outside, via the magic of xor.

I'm rereading past discussions to find insights. This jumped out at me:

Suppose Omega tells me that I make the same decision in the Prisoner's Dilemma as Agent X. This does not necessarily imply that I should cooperate with Agent X.

Do you still believe this?

0lessdazed15y

If X isn't like us, we can't "control" X by making a decision similar to what we would want X to output*. We shouldn't go from being an agent that defects in the prisoner's dilemma with Agent X when told we "make the same decision in the Prisoner's Dilemma as Agent X" to being one that does not defect, just as we do not unilaterally switch from natural to precision bidding when in contract bridge a partner opens with two clubs (which signals a good hand under precision bidding, and not under natural bidding). However, there do exist agents who should cooperate every time they hear they "make the same decision in the Prisoner's Dilemma as Agent X", those who have committed to cooperate in such cases. In some such cases, they are up against pieces of paper on which "cooperate" is written (too bad they didn't have a more discriminating algorithm/clear Omega), in others, they are up against copies of themselves or other agents whose output depends on what Omega tells them. In any case, many agents should cooperate when they hear that. Yes? No? Why shouldn't one be such an agent? Do we know ahead of time that we are likely to be up against pieces of paper with "cooperate" on them, and Omega would tell unhelpfully tell us we "make the same decision in the Prisoner's Dilemma as Agent X" in all such cases, though if we had a different strategy we could have gotten useful information and defected in that case? *Other cases include us defecting to get X to cooperate, and others where X's play depends on ours, but this is the natural case to use when considering if the Agent X's action depends on ours, a not strategically incompetent Agent X that has a strategy at least as good as always defecting or cooperating and does not try to condition his cooperating on our defecting or the like.

3Gary_Drescher16y

I agree this sounds intuitive. As I mentioned earlier, though, nailing this down is tantamount to circling back and solving the full-blown problem of (decision-supporting) counterfactual reasoning: the problem of how to distinguish which facts to “hold fixed”, and which to “let vary” for consistency with a counterfactual antecedent. In any event, is the idea to try to build a separate graph for math facts, and use that to analyze “logical dependency” among the Platonic nodes in the original graph, in order to carry out TDT's modified “surgical alteration” of the original graph? Or would you try to build one big graph that encompasses physical and logical facts alike, and then use Pearl's decision procedure without further modification? [...] Wait, isn't it decision-computation C—rather than simulation D—whose “effect” (in the sense of logical consequence) on E we're concerned about here? It's the logical dependents of C that get surgically altered in the graph when C gets surgically altered, right? (I know C and D are logically equivalent, but you're talking about inserting a physical node after D, not C, so I'm a bit confused.) I'm having trouble following the gist of avenue (2) at the moment. Even with the node structure you suggest, we can still infer E from C and from the physical node that matches (D xor E)—unless the new rule prohibits relying on that physical node, which I guess is the idea. But what exactly is the prohibition? Are we forbidden to infer any mathematical fact from any physical indicator of that fact? Or is there something in particular about node (D xor E) that makes it forbidden? (It would be circular to cite the node's dependence on C in the very sense of "dependence" that the new rule is helping us to compute.)

48

A problem with Timeless Decision Theory (TDT)

48

48

48

A problem with Timeless Decision Theory (TDT)

48

48