Today I finally came up with a simple example where TDT clearly loses and CDT clearly wins, and which, as a bonus, proves that TDT isn't reflectively consistent.
Omega comes to you and says:
I'm hosting a game with 3 players. Two players are AIs I created running TDT but not capable of self-modification, one being a paperclip maximizer, the other being a staples maximizer. The last player is an AI you will design. When the game starts, my two AIs will first get the source code of your AI (which is only fair since you know the design of my AIs). Then 2 of the 3 players will be chosen randomly to play a one-shot true PD, without knowing who they are facing. What AI do you submit?
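Say the payoffs of the PD are (your payoff first, your opponent's second):
both cooperate: 5/5
you cooperate, opponent defects: 0/6
you defect, opponent cooperates: 6/0
both defect: 1/1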
Suppose you submit an AI running CDT. Then, Omega's AIs will reason as follows: "I have 1/2 chance of playing against a TDT, and 1/2 chance of playing against a CDT. If I play C, then my opponent will play C if it's a TDT, and D if it's a CDT, therefore my expected payoff is 5/2+0/2=2.5. If I play D, then my opponent will play D, so my payoff is 1. Therefore I should play C." Your AI then gets a payoff of 6, since it will play D.
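The arithmetic in the quoted reasoning can be checked mechanically. Below is a minimal Python sketch; the payoff table is inferred from the numbers used in the paragraph above (5 for mutual cooperation, 0 for cooperating against a defector, 6 for defecting against a cooperator, 1 for mutual defection), and the code simply encodes the stated assumption that the other TDT AI mirrors Omega's AI while the CDT AI always defects. The function names are illustrative only.

```python
# Row player's payoff, inferred from the numbers in the example:
# mutual cooperation 5, sucker 0, temptation 6, mutual defection 1.
PAYOFF = {("C", "C"): 5, ("C", "D"): 0, ("D", "C"): 6, ("D", "D"): 1}


def omega_ai_expected_payoff(my_move: str) -> float:
    """Expected payoff to one of Omega's TDT AIs, given a 1/2 chance of
    facing the other TDT AI and a 1/2 chance of facing the CDT AI.

    Encodes the quoted reasoning: the TDT opponent mirrors my move,
    while the CDT opponent always defects.
    """
    tdt_reply = my_move  # logically correlated opponent mirrors me
    cdt_reply = "D"      # CDT defects regardless of what I do
    return 0.5 * PAYOFF[(my_move, tdt_reply)] + 0.5 * PAYOFF[(my_move, cdt_reply)]


def cdt_ai_payoff(omega_move: str) -> int:
    """The submitted CDT AI always defects against whichever Omega AI it meets."""
    return PAYOFF[("D", omega_move)]


best = max(["C", "D"], key=omega_ai_expected_payoff)
print(omega_ai_expected_payoff("C"), omega_ai_expected_payoff("D"), best)  # 2.5 1.0 C
print(cdt_ai_payoff(best))  # 6
```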
Suppose you submit an AI runn...
Or does my example fall outside of the specified problem class?
If I wanted to defend the original thesis, I would say yes, because TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed - the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the "problem class" under which the original TDT is automatically reflectively consistent.
Nonetheless it's certainly an interesting dilemma.
Your comment here is actually pre-empting a comment that I'd planned to make after providing some of the background for the content of TDT. I'd thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner's Dilemma and predictably get away with it, provided you did so for unusual reasons. But the condition...
Moving second is a disadvantage (at least it always seems to work out that way; counterexamples requested if you can find them), and A can always use less computing power. Rational agents should not regret having more computing power (because they can always use less) or more knowledge (because they can always implement the same strategy they would use with less knowledge) - this sort of thing is a sure sign of reflective inconsistency.
To see why moving logically second is a disadvantage, consider that it lets an opponent playing Chicken always toss their steering wheel out the window and get away with it.
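A toy illustration of why moving logically second hurts in Chicken, assuming a fixed logical order of moves and hypothetical payoffs (only their ordering matters). "Tossing the steering wheel out the window" corresponds to the first mover committing to `straight`; the second mover's best response to that commitment is then to swerve.

```python
# Hypothetical Chicken payoffs for the row player; the numbers are
# illustrative assumptions, only their ordering matters.
CHICKEN = {
    ("swerve", "swerve"): 0,
    ("swerve", "straight"): -1,
    ("straight", "swerve"): 1,
    ("straight", "straight"): -10,
}
MOVES = ("swerve", "straight")


def best_response(opponent_move: str) -> str:
    """What a player who moves (logically) second does once the first move is known."""
    return max(MOVES, key=lambda m: CHICKEN[(m, opponent_move)])


def first_mover_choice() -> str:
    """The first mover picks the commitment whose induced best response suits it most."""
    return max(MOVES, key=lambda m: CHICKEN[(m, best_response(m))])


first = first_mover_choice()
second = best_response(first)
print(first, second)  # straight swerve
```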
That both players desire to move "logically first" argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.
(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole 'nother story, and of course also falls into the "Way the fuck too dangerous to try in real life" category relative to my present knowledge.)
With timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.
BTW, thanks for this compact way of putting it.
This is very cool, and I haven't digested it yet, but I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb's Problem (and similarly in the Prisoner's Dilemma) is to justify the inference "If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)" rather than "If I choose both boxes, then the simulation doesn't necessarily match my choice (even though in fact it does)". It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it, an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)'s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I'm not mistake...
Does this theory handle Drescher's example of raising my hand because I want the universe a billion years ago to be such that I would raise my hand a billion years hence?
One of the benefits of publishing a complete explanation is that some of the (valid) criticisms of it will lead to a stronger, repaired theory.
I confess that I don't follow your program yet, but the outline is much preferred to vague "I have a secret theory" teasing.
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
I'm trying to understand the difference between this formulation and mine. Interestingly, Eliezer seems to h...
This feels right to me. I can't implement it, and I'm not sure I could explain what Eli said, but I understand Pearl well enough (at an intuitive level) to say that it feels like the kind of additions Eli is talking about would clarify and reach the results he's talking about.
Read Pearl. It's not mathy, it's mostly words about graph manipulation.
If you're bothered by math, read Pearl anyway. He doesn't use equations or make you transform symbols. If you can think about information flows or reason visually, Pearl's calculus is for you. You'll under...
Rolf Nelson wanted to know what everyday problems evidential decision theory produces. Newcomb's Problem can be mapped onto the Prisoner's Dilemma, but are there similarly common Smoking-Lesion-like problems?
This is better than nothing, thanks and upvote. Now let's begin translating this stuff. AFAICT, a "decision theory" is supposed to have two parts:
1) A blah blah verbal algorithm for translating real-world problem descriptions into a certain kind of formal structure.
2) A mathematical algorithm that accepts that formal structure and outputs a decision.
I don't fully understand what formal structure you're proposing (a Pearl-style causal graph with additional "logical" arrows? why would this always be acyclic?), and can't understand the algorithm until the structure is clear enough.
Can anyone suggest good background reading material for understanding the technical language and background knowledge behind this and, more generally, decision theory?
I gave one example earlier of TDT agents not playing cooperate in PD against each other. Here's another, perhaps even more puzzling, example.
Consider 3 TDT agents, A, B, and C, playing a game of 3-choose-2 PD. These agents are identical, except that they have different beliefs about how they are logically related to each other. A and B both believe that A and B are 100% logically correlated (in other words, logically equivalent). A and C both believe that A and C are 0% logically correlated. B and C also believe that B and C are 0% logically correlated.
Wha...
I'm not keeping up here - I only peek at this site occasionally, rather than following it - but this:
"The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation."
... seems rather similar to the dictum that you should choose as if you really might be any of your subjective duplicates, from across all possible worlds. (I suppose there is a difference, in that "subjective duplicate" refers onl...
Upvoted; this is a good summary of the issue, and using the new label TDT is arguably more elegant than having to talk separately about the rationality of cultivating a disposition.
How significant are the open questions? We should not expect correct theory to work in the face of arbitrary acts of Omega. Suppose Omega says "Tomorrow I will examine your source code, and if you don't subscribe to TDT I will give you $1 million, and if you do subscribe to TDT I will make you watch the Alien movie series -- from the third one on". In this scenario it ...
The three sentence version is actually a one sentence version; it's three independent clauses, but semicolons don't separate sentences.
I'm really sorry, I couldn't help myself.
But if you read the other parts of the solution to "free will", and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of "free will".
If this is correct, then it amounts to a profound philosophical and scientific achievement.
In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.
I'm pretty sure Socrates and Aristotle already pointed much of this out in different words. I should make a post about that. Of course, they didn't do the math.
I agree with cousin_it below. It seems like you're missing some math.
But other than that, I don't see what the big deal is. I was expecting something monumental and game-changing, not "Is that it?"
Re: "Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses"
Some of those philosophers draw a distinction between rational action and the actions of a rational agent - see here:
I conclude that the rational action for a player in the Newcomb Paradox is taking both boxes, but that rational agents will usually take only one box because they have rationally adopted the disposition to do so.
So: these folk had got the right answer, and any debate with them is over terminology.
Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory
Wei Dai asked:
...
All right, fine, here's a fast summary of the most important ingredients that go into my "timeless decision theory". This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge. It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.
The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
To obtain the background knowledge if you don't already have it, the two main things you'd need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality. Canonical sources would be "Paradoxes of Rationality and Cooperation" for Newcomblike problems and "Causality" for causality.
For those of you who don't condescend to buy physical books, Marion Ledwig's thesis on Newcomb's Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory. You need to know that causal decision theories two-box on Newcomb's Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier). You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.
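To see the difference between an ordinary probability update on an action and a counterfactual on it, here is a Smoking Lesion sketch in Python. All numbers are hypothetical: the lesion causes both the urge to smoke and cancer, and smoking itself has no effect on cancer. Conditioning on the action (the EDT-style update) treats smoking as evidence of the lesion, so it refrains; cutting the arrow into the action node leaves the lesion at its prior, so it smokes.

```python
from fractions import Fraction as F

# Hypothetical Smoking Lesion numbers, chosen only for illustration.
P_LESION = F(1, 2)
P_SMOKE_GIVEN = {True: F(9, 10), False: F(1, 10)}  # lesion -> urge to smoke
P_CANCER_GIVEN = {True: F(4, 5), False: F(1, 20)}  # lesion -> cancer; smoking is harmless
U_SMOKE, U_CANCER = 10, -1000


def p_lesion_given_action(smoke: bool) -> F:
    """Bayesian update on the action: EDT treats the action as evidence about the lesion."""
    like = {l: (P_SMOKE_GIVEN[l] if smoke else 1 - P_SMOKE_GIVEN[l]) for l in (True, False)}
    joint_lesion = P_LESION * like[True]
    joint_no_lesion = (1 - P_LESION) * like[False]
    return joint_lesion / (joint_lesion + joint_no_lesion)


def expected_utility(smoke: bool, p_lesion: F) -> F:
    p_cancer = p_lesion * P_CANCER_GIVEN[True] + (1 - p_lesion) * P_CANCER_GIVEN[False]
    return (U_SMOKE if smoke else 0) + U_CANCER * p_cancer


def edt_eu(smoke: bool) -> F:
    return expected_utility(smoke, p_lesion_given_action(smoke))


def counterfactual_eu(smoke: bool) -> F:
    # Surgery on the action node: the lesion -> action arrow is cut,
    # so P(lesion) stays at its prior whatever we set the action to.
    return expected_utility(smoke, P_LESION)


edt_choice = max([True, False], key=edt_eu)
cf_choice = max([True, False], key=counterfactual_eu)
print(edt_choice, cf_choice)  # False True
```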
I'm not sure what you'd use for online reading on causality. Mainly you need to know:
It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intuitions are how some particular cognitive algorithm feels from inside; and so on.
The first lemma is that a factorized probability distribution which includes logical uncertainty - uncertainty about the unknown output of known computations - appears to need cause-like nodes corresponding to this uncertainty.
Suppose I have a calculator on Mars and a calculator on Venus. Both calculators are set to compute 123 * 456. Since you know their exact initial conditions - perhaps even their exact initial physical state - a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators should be uncorrelated. (By standard D-separation; if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.) However, if I tell you that the calculator on Mars flashes "56,088" on its LED display screen, you will conclude that the Venus calculator's display is also flashing "56,088". (And you will conclude this before any ray of light could communicate between the two events, too.)
If I were giving a long exposition, I would go on about how, if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event. But standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants, so that the descendant nodes are uncorrelated; the fact that they're still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation. I would also talk for a bit about how, if there's a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the "Platonic" node standing for the ideal result of the computation. In short, it looks like our uncertainty about the unknown logical results of known computations really does behave like a standard causal node from which the physical results descend as child nodes.
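The calculator argument can be made concrete with a small sketch in which a single "Platonic" node for the result of 123 * 456 is the common parent of every physical display. The candidate values and the glitch rate `EPS` are invented for illustration, and a real reasoner's logical uncertainty would not look like a neat three-way prior. The point is only structural: once the shared node is in the graph, seeing Mars moves our prediction for Venus, and a 56,088 / 56,088 / 56,086 split is read as likelihood evidence about one underlying result.

```python
# Hypothetical candidate answers our logically uncertain reasoner has not yet ruled out.
CANDIDATES = [56086, 56088, 56096]
EPS = 0.01  # assumed chance that a calculator's transistors glitch to another candidate


def likelihood(display: int, result: int) -> float:
    """P(display | Platonic result): correct with prob 1 - EPS, else uniform over the rest."""
    if display == result:
        return 1 - EPS
    return EPS / (len(CANDIDATES) - 1)


def posterior(displays: list[int]) -> dict[int, float]:
    """Each observed display sends a likelihood message up to the shared 'Platonic' node."""
    weights = {r: 1.0 / len(CANDIDATES) for r in CANDIDATES}  # uniform prior on the logical node
    for d in displays:
        for r in weights:
            weights[r] *= likelihood(d, r)
    z = sum(weights.values())
    return {r: w / z for r, w in weights.items()}


def predict_display(displays: list[int], value: int) -> float:
    """P(another calculator shows `value`), marginalizing over the Platonic node."""
    post = posterior(displays)
    return sum(p * likelihood(value, r) for r, p in post.items())


print(predict_display([], 56088))       # about 0.33 before looking at Mars
print(predict_display([56088], 56088))  # about 0.98 after Mars shows 56,088
three = posterior([56088, 56088, 56086])
print(max(three, key=three.get))        # 56088
```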
But this is a short exposition, so you can fill in that sort of thing yourself, if you like.
Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal result of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.
You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.
Formally you'd use a Gödelian diagonal to write:
Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))
(where P(X=x []-> Y | Z) means the probability, computed as a counterfactual on the factored causal graph P, that surgically setting node X to x leads to Y, given Z)
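Here is a minimal sketch of where the surgery lands, applied to Newcomb's Problem in Python. It assumes Omega predicts by running the very same computation (so the prediction is a perfect copy of the logical output) and uses deterministic outcomes, so the sum over outcomes collapses to a single payout. `tdt_eu` sets the logical node, which fixes both the physical act and Omega's prediction; `cdt_eu` sets only the physical act and leaves the prediction at an arbitrary prior. These names and the 0.5 prior are hypothetical, not part of the formalism above.

```python
# Newcomb's Problem as a tiny causal graph (illustrative sketch only):
#   logical node L = output of "this computation" (the agent's decision algorithm)
#   physical act   = L (the agent runs the computation)
#   prediction     = L (Omega simulates the same computation)
#   box B holds $1,000,000 iff prediction == "onebox"; box A always holds $1,000.
ACTIONS = ("onebox", "twobox")


def payout(act: str, prediction: str) -> int:
    box_b = 1_000_000 if prediction == "onebox" else 0
    return box_b + (1_000 if act == "twobox" else 0)


def tdt_eu(a: str) -> int:
    """Surgery on the logical node: setting L = a fixes every instantiation, prediction included."""
    return payout(act=a, prediction=a)


def cdt_eu(a: str, p_predict_onebox: float) -> float:
    """Surgery on the physical act only: the prediction keeps its prior distribution."""
    return (p_predict_onebox * payout(a, "onebox")
            + (1 - p_predict_onebox) * payout(a, "twobox"))


tdt_choice = max(ACTIONS, key=tdt_eu)
cdt_choice = max(ACTIONS, key=lambda a: cdt_eu(a, p_predict_onebox=0.5))
print(tdt_choice, tdt_eu(tdt_choice))  # onebox 1000000
print(cdt_choice)                      # twobox
```

With these payouts, two-boxing beats one-boxing by exactly $1,000 under `cdt_eu` whatever the prior, which is why the choice of prior does not matter on the CDT side.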
Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I've seen, including Newcomb's Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit's Hitchhiker which causes both CDT and EDT to lose, etc.
Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory). Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on. On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as mana from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here. (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)
Among the many other things I am skipping over:
Those of you who've read the quantum mechanics sequence can extrapolate from past experience that I'm not bluffing. But it's not clear to me that writing this book would be my best possible expenditure of the required time.