Or does my example fall outside of the specified problem class?
If I wanted to defend the original thesis, I would say yes, because TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed - the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the "problem class" under which the original TDT is automatically reflectively consistent.
Nonetheless it's certainly an interesting dilemma.
Your comment here is actually pre-empting a comment that I'd planned to make after providing some of the background for the content of TDT. I'd thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner's Dilemma and predictably get away with it, provided you did so for unusual reasons. But the conditions on "unusual reasons" are much more difficult than your posts seem to imply. We can't all act on unusual reasons and end up doing the same thing, after all. How is it that these two TDT AIs got here, if not by act of Omega, if the sensible thing to do is always to submit a CDT AI?
To introduce yet another complication: What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player? Given that your reason for submitting a CDT player involves your expectation about how the TDT players will respond, and that you can "get away with it"? It's the TDT's responses that make them "exploitable" by your decision to submit a CDT player - so what if they employ a different strategy instead? (This is another open problem - "who acts first" in timeless negotiations.)
There might be a certain sense in which being in a "small subgroup internally correlated but not correlated with larger groups" could possibly act as a sort of resource for getting away with defection in the true PD, because if you're in a large group then defecting shifts the probability of an opponent likewise defecting by a lot, but if you're in a small subgroup then it shifts the probability of the opponent defecting by a little, so there's a lower penalty for defection, so in marginal cases a small subgroup might play defection while a large subgroup plays cooperate. (But again, the conditions on this are difficult. If all small subgroups reason this way, then all small subgroups form a large correlated group!)
Anyway - you can't end up in a small subgroup if you start out in a large one, because if you decide to deliberately condition on noise in order to decrease the size of your subgroup, that itself is a correlated sort of decision with a clear line of reasoning and motive, and others in your correlated group will try doing the same thing, with predictable results. So to the extent that lots of AI designers in distant parts of Reality are discussing this same issue with the same logic, we are already in a group of a certain minimum size.
But this does lead to an argument for CEV (values extrapolating / Friendly AI) algorithms that don't automatically, inherently correlate us with larger groups than we already started out being in. If uncorrelation is a nonrenewable resource then FAI programmers should at least be careful not to wantonly burn it. You can't deliberately add noise, but you might be able to preserve existing uncorrelation.
Also, other TDTs can potentially set their "minimum cooperator frequency threshold" at just the right level that if any group of noticeable size chooses to defect, all the TDTs start defecting - though this itself is a possibility I am highly unsure of, and once again it has to do with "who goes first" in timeless strategies, which is an open problem.
But these are issues in which my understanding is still shaky, and it very rapidly gets us into very dangerous territory like trying to throw the steering wheel out the window while playing chicken.
So far as evolved biological organisms go, I suspect that the ones who create successful Friendly AIs (instead of losing control and dying at the hands of paperclip maximizers), would hardly start out seeing only the view from CDT - most of them/us would be making the decision "Should I build TDT, knowing that the decisions of other biological civilizations are correlated to this one?" and not "Should I build TDT, having never thought of that?" In other words, we may already be part of a large correlated subgroup - though I sometimes suspect that most of the AIs out there are paperclip maximizers born of experimental accidents, and in that case, if there is no way of verifying source code, nor of telling the difference between SIs containing bio-values-preserving civs and SIs containing paperclip maximizers, then we might be able to exploit the relative smallness of the "successful biological designer" group...
...but a lot of this presently has the quality of "No fucking way would I try that in real life", at least based on my current understanding. The closest I would get might be trying for a CEV algorithm that did not inherently add correlation to decision systems with which we were not already correlated.
What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?
That's a good point, but what if the process that gives birth to CDT doesn't listen to the incentives you give it? For example, it could be evolution or random chance.
Here's an example, similar to Wei's example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A ...
Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory
Wei Dai asked:
...
All right, fine, here's a fast summary of the most important ingredients that go into my "timeless decision theory". This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge. It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.
The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
To obtain the background knowledge if you don't already have it, the two main things you'd need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality. Canonical sources would be "Paradoxes of Rationality and Cooperation" for Newcomblike problems and "Causality" for causality.
For those of you who don't condescend to buy physical books, Marion Ledwig's thesis on Newcomb's Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory. You need to know that causal decision theories two-box on Newcomb's Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier). You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.
I'm not sure what you'd use for online reading on causality. Mainly you need to know:
It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intutions are how some particular cognitive algorithm feels from inside; and so on.
The first lemma is that a factorized probability distribution which includes logical uncertainty - uncertainty about the unknown output of known computations - appears to need cause-like nodes corresponding to this uncertainty.
Suppose I have a calculator on Mars and a calculator on Venus. Both calculators are set to compute 123 * 456. Since you know their exact initial conditions - perhaps even their exact initial physical state - a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators, should be uncorrelated. (By standard D-separation; if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.) However, if I tell you that the calculator at Mars flashes "56,088" on its LED display screen, you will conclude that the Venus calculator's display is also flashing "56,088". (And you will conclude this before any ray of light could communicate between the two events, too.)
If I was giving a long exposition I would go on about how if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event, but standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants so that the descendant nodes are uncorrelated, and the fact that they're still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation. I would also talk for a bit about how if there's a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the "Platonic" node standing for the ideal result of the computation - in short, it looks like our uncertainty about the unknown logical results of known computations, really does behave like a standard causal node from which the physical results descend as child nodes.
But this is a short exposition, so you can fill in that sort of thing yourself, if you like.
Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal result of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.
You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.
Formally you'd use a Godelian diagonal to write:
Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))
(where P( X=x []-> Y | Z ) means computing the counterfactual on the factored causal graph P, that surgically setting node X to x, leads to Y, given Z)
Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I've seen, including Newcomb's Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit's Hitchhiker which causes both CDT and EDT to lose, etc.
Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory). Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on. On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as mana from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here. (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)
Among the many other things I am skipping over:
Those of you who've read the quantum mechanics sequence can extrapolate from past experience that I'm not bluffing. But it's not clear to me that writing this book would be my best possible expenditure of the required time.