Thanks. A few points, mostly for clarification.
Thanks for your reply. And I apologize: I should have checked whether you have an account on LessWrong and tagged you in the post.
The fundamental problem with your arguments is that the scenarios in which you imagine FDT agents "losing" are logically impossible. You're missing the broader perspective: the FDT agent's policy of never negotiating with terrorists prevents it from being blackmailed in the first place.
Surprisingly, Schwarz doesn't analyze CDT's and FDT's answers to Prisoner's Dilemma with a Twin (beyond just giving them). It's worth noting that FDT clearly does better than CDT here: the FDT agent and its twin both get away with 1 year in prison, while the CDT agent and its twin both get 5. This is because the agents and their twins are clones - they have the same decision theory and thus reach the same conclusion on this problem. FDT recognizes this; CDT doesn't. I am baffled that Schwarz calls FDT's recommendation on this problem "insane", as it's easily the right answer.
I personally agree that cooperating in the Twin PD is the correct choice, but I don't think it is meaningful to argue for this on the grounds of decision-theoretic performance (as you seem to do). From The lack of performance metrics for CDT versus EDT, etc. by Caspar Oesterheld:
[T]here is no agreed-upon metric to compare decision theories, no way to assess even for a particular problem whether one decision theory (or its recommendation) does better than another. (This is why the CDT-versus-EDT-versus-other debate is at least partly a philosophical one.) In fact, it seems plausible that finding such a metric is “decision theory-complete” (to butcher another term with a specific meaning in computer science). By that I mean that settling on a metric is probably just as hard as settling on a decision theory and that mapping between plausible metrics and plausible decision theories is fairly easy.
Indeed, Schwarz makes a similar point in the post you are responding to:
Yudkowsky and Soares constantly talk about how FDT "outperforms" CDT, how FDT agents "achieve more utility", how they "win", etc. As we saw above, it is not at all obvious that this is true. It depends, in part, on how performance is measured. At one place, Yudkowsky and Soares are more specific. Here they say that "in all dilemmas where the agent's beliefs are accurate [??] and the outcome depends only on the agent's actual and counterfactual behavior in the dilemma at hand – reasonable constraints on what we should consider "fair" dilemmas – FDT performs at least as well as CDT and EDT (and often better)". OK. But how should we understand "depends on … the dilemma at hand"? First, are we talking about subjunctive or evidential dependence? If we're talking about evidential dependence, EDT will often outperform FDT. And EDTers will say that's the right standard. CDTers will agree with FDTers that subjunctive dependence is relevant, but they'll insist that the standard Newcomb Problem isn't "fair" because here the outcome (of both one-boxing and two-boxing) depends not only on the agent's behavior in the present dilemma, but also on what's in the opaque box, which is entirely outside her control. Similarly for all the other cases where FDT supposedly outperforms CDT. Now, I can vaguely see a reading of "depends on … the dilemma at hand" on which FDT agents really do achieve higher long-run utility than CDT/EDT agents in many "fair" problems (although not in all). But this is a very special and peculiar reading, tailored to FDT. We don't have any independent, non-question-begging criterion by which FDT always "outperforms" EDT and CDT across "fair" decision problems.
Thanks for responding!
I personally agree that cooperating in the Twin PD is clearly the correct choice, but I don't think it is meaningful to argue for this on the grounds of decision-theoretic performance (as you seem to do). From The lack of performance metrics for CDT versus EDT, etc. by Caspar Oesterheld:
I disagree. There's a clear measure of performance given in the Twin PD: the utilities.
I disagree with Oesterheld's point about CDT vs EDT and metrics; I think we know enough math to say EDT is simply a wrong decision theory. We could, in principle, even demonstrate this in real life, by having e.g. 1000 people play a version of XOR Blackmail (500 people with and 500 people without a "termite infestation") and seeing which theory performs best. It would be trivial to see that EDT makes the wrong decision.
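To make this concrete, here is a minimal sketch of that experiment in Python. The XOR Blackmail setup follows Yudkowsky and Soares' paper (an honest predictor sends the letter iff exactly one of "the house has termites" and "the agent pays" holds); the dollar amounts are my own assumptions for illustration:

```python
import random

N = 1_000           # participants, as in the 500/500 split above
P_TERMITES = 0.5    # chance a given house has the infestation
DAMAGE = 1_000_000  # assumed cost of an infestation
DEMAND = 1_000      # assumed hush money demanded in the letter

def average_loss(pays_on_letter: bool) -> float:
    """Average loss under a fixed policy: pay upon receiving the letter
    (EDT's choice) or refuse (CDT's and FDT's choice)."""
    total = 0
    for _ in range(N):
        termites = random.random() < P_TERMITES
        # The honest predictor sends the letter only when its XOR claim
        # comes out true. For a payer that is the no-termites case; for
        # a refuser, the termites case.
        letter_sent = (not termites) if pays_on_letter else termites
        pays = letter_sent and pays_on_letter
        total += (DAMAGE if termites else 0) + (DEMAND if pays else 0)
    return total / N

random.seed(0)
print("payers (EDT):      ", average_loss(True))   # ~ 500_500
print("refusers (CDT/FDT):", average_loss(False))  # ~ 500_000
```

The refusers come out ahead by the $1,000 demand in every no-termites run, which is exactly the sense in which EDT's recommendation loses.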
Newcomb's Problem with Transparent Boxes. A demon invites people to an experiment.
Every time I see this I think, 'What if you flip a coin?'
The formulation quoted from Schwarz's post implicitly (and unnecessarily) disallows unpredictability. The usual, more general formulation of Transparent Newcomb says that the $1M is in the big box iff Omega succeeds in predicting that you one-box in case the big box is full. So if you instead succeed in confusing Omega, the box will be empty. A situation where Omega can't be confused also makes sense, in which case the two statements of the problem are equivalent.
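On the coin-flip question: under this more general formulation, a randomizing agent simply finds an empty box. A toy sketch in Python (the payouts and the all-or-nothing prediction rule are assumptions for illustration):

```python
def expected_payout(p_one_box_when_full: float) -> int:
    # The $1M is in the big box iff Omega succeeds in predicting that
    # you one-box in case the box is full. A mixed strategy defeats any
    # such prediction, so only a reliable one-boxer sees a full box.
    if p_one_box_when_full == 1.0:
        return 1_000_000  # take only the (full) big box
    return 1_000          # empty big box: take the transparent $1k box

for p in (1.0, 0.5, 0.0):
    print(f"P(one-box | big box full) = {p}: ${expected_payout(p):,}")
```

Flipping a coin gains nothing: Omega treats the coin-flipper like any other agent she cannot predict to one-box.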
Every discussion of decision theories that is not just "agents with max EV win" - where EV is calculated as the sum of "probability of the outcome times the value of the outcome" - ends up fighting the hypothetical, usually by yelling that in zero-probability worlds someone's pet DT does better than the competition. A trivial calculation shows that winning agents do not succumb to blackmail, stay silent in the twin PD, one-box in all Newcomb variants, and procreate in the miserable-existence case. I don't know if that's what FDT does, but it's hopefully what a naive max EV calculation suggests.
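For what it's worth, a sketch of that trivial calculation for two of the named cases, in Python (the payoff numbers are the standard ones; the 99% predictor accuracy is made up):

```python
# Policy-level max EV: the world (twin, predictor) responds to the
# policy itself, so we evaluate outcomes per policy, not per act.

# Twin PD: the twin runs the same policy, so only the symmetric
# outcomes are reachable.
years = {"silent": 1, "confess": 5}  # years in prison when both do X
print(min(years, key=years.get))     # -> "silent"

# Newcomb: a predictor with accuracy p fills the opaque box iff it
# predicts one-boxing.
p = 0.99  # assumed accuracy
ev = {
    "one-box": p * 1_000_000,
    "two-box": (1 - p) * 1_000_000 + 1_000,
}
print(max(ev, key=ev.get))           # -> "one-box"
```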
So I finished reading On Functional Decision Theory by Wolfgang Schwarz. In this critique of FDT, Schwarz makes a number of claims I find to be either unfair criticism of FDT or just plain wrong - and I think they're interesting to discuss. Let's go through them one by one. (Note that this post will not make much sense if you aren't familiar with FDT, which is why I linked the paper by Yudkowsky and Soares.)
Schwarz first defines three problems: Blackmail, Prisoner's Dilemma with a Twin, and Newcomb's Problem with Transparent Boxes.
Blackmail is defined a bit vaguely here, but the question is whether Donald should pay if he actually gets blackmailed - given that he prefers paying to blowing Stormy's gaff, and of course prefers not being blackmailed above all. Aside from this, I disagree with the definitions of rational and irrational Schwarz uses here, but that's partly the point of this whole discussion.
Schwarz goes on to say Causal Decision Theory (CDT) will pay on Blackmail, confess on Prisoner's Dilemma with a Twin and two-box on Newcomb's Problem with Transparent Boxes. FDT will not pay, remain silent and one-box, respectively. So far we agree.
However, Schwarz also claims "there's an obvious sense in which CDT agents fare better than FDT agents in the cases we've considered". On Blackmail, he says: "You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay!" (Apparently the hush money is $1.) It may seem this way, because given that Donald is already blackmailed, paying is better than not paying - and FDT recommends not paying, while CDT pays. But this comparison is beside the point, since FDT agents never end up in this scenario in the first place. The problem statement specifies that Stormy would know an FDT agent wouldn't pay, so she wouldn't blackmail such an agent. Schwarz acknowledges this point later on, but doesn't seem to realize it completely undermines his earlier claim that CDT does better "in an obvious sense".
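A minimal sketch of why the policy comparison comes out in FDT's favor (Python; the scandal cost is a made-up number, and Stormy is modeled, per the problem statement, as blackmailing only those she predicts will pay):

```python
HUSH = 1             # the $1 hush money from Schwarz's example
SCANDAL = 1_000_000  # assumed cost if Stormy goes public (never reached below)

def donalds_loss(pays_if_blackmailed: bool) -> int:
    # Stormy predicts Donald's policy; since going public would hurt
    # her too, she only blackmails when she predicts he'll pay.
    blackmailed = pays_if_blackmailed
    if not blackmailed:
        return 0  # the refusal policy deters the blackmail itself
    return HUSH   # the payer gets blackmailed and pays up

print("CDT agent (would pay):    loses", donalds_loss(True))   # 1
print("FDT agent (would refuse): loses", donalds_loss(False))  # 0
```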
Surprisingly, Schwarz doesn't analyze CDT's and FDT's answers to Prisoner's Dilemma with a Twin (beyond just giving them). It's worth noting that FDT clearly does better than CDT here: the FDT agent and its twin both get away with 1 year in prison, while the CDT agent and its twin both get 5. This is because the agents and their twins are clones - they have the same decision theory and thus reach the same conclusion on this problem. FDT recognizes this; CDT doesn't. I am baffled that Schwarz calls FDT's recommendation on this problem "insane", as it's easily the right answer.
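To spell out the two lines of reasoning in a Python sketch (the off-diagonal go-free/ten-years payoffs are the usual ones and an assumption here, since only the 1- and 5-year outcomes are stated above):

```python
# (my years, twin's years) in prison
matrix = {
    ("silent", "silent"):   (1, 1),
    ("silent", "confess"):  (10, 0),   # assumed off-diagonal payoffs
    ("confess", "silent"):  (0, 10),
    ("confess", "confess"): (5, 5),
}

# CDT holds the twin's action fixed; confessing dominates either way...
for twin in ("silent", "confess"):
    best = min(("silent", "confess"), key=lambda me: matrix[(me, twin)][0])
    print(f"twin plays {twin}: CDT picks {best}")   # -> confess, twice

# ...but the twin reasons identically, so CDT twins land on (5, 5).
# FDT notes that clones can only reach the symmetric outcomes:
fdt = min(("silent", "confess"), key=lambda a: matrix[(a, a)][0])
print("FDT picks", fdt, "->", matrix[(fdt, fdt)])   # silent -> (1, 1)
```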
Newcomb's Problem with Transparent Boxes is interesting. Given the specified scenario, two-boxing outperforms one-boxing, but this is again irrelevant. Two-boxing results in a logically impossible scenario (given perfect prediction), since then Omega would have predicted you two-box and put nothing in the right box. Given less-than-perfect (but still good) prediction, the scenario is still very unlikely: it's one that two-boxers almost never end up in. It's the one-boxers who get the million. Schwarz again acknowledges this point - and again doesn't seem to realize it means CDT doesn't do better in an obvious sense.
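A quick simulation of the less-than-perfect case in Python (the 99% accuracy is a made-up number): one-boxers almost always walk away with the million, while two-boxers almost never even see the two-full-boxes scenario Schwarz evaluates.

```python
import random

ACCURACY = 0.99  # assumed predictor accuracy
N = 100_000

def average_payout(one_boxes_when_full: bool) -> float:
    total = 0
    for _ in range(N):
        correct = random.random() < ACCURACY
        predicted_one_boxer = (one_boxes_when_full == correct)
        big_box_full = predicted_one_boxer  # filled iff one-boxing predicted
        if big_box_full and one_boxes_when_full:
            total += 1_000_000  # take only the big box
        elif big_box_full:
            total += 1_001_000  # a two-boxer facing two full boxes: rare
        else:
            total += 1_000      # empty big box: take the $1,000 box
    return total / N

random.seed(0)
print("one-boxers:", average_payout(True))   # ~ 990_010
print("two-boxers:", average_payout(False))  # ~ 11_000
```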
Edit: Vladimir Nesov left a comment which made me realize my above analysis of Newcomb's Problem with Transparent Boxes reacts to the formulation in Yudkowsky and Soares' paper rather than to Schwarz's. Since Schwarz left his formulation relatively unspecified, I'll leave the above analysis as it is. Note, however, that it is assumed the demon filled the box containing the million if and only if she predicted the participant would leave the $1,000 box behind upon seeing two full boxes. The question, then, is what to do upon seeing two full boxes.
I largely agree. I care about FDT from the perspective of building the right decision theory for an A(S)I, in which case what matters is something like scoring the most utility across a lifetime. The part of the quote about FDT agents being worse off if someone directly punishes "agents who use FDT" is moot, though. What if someone decides to punish agents for using CDT?
Schwarz continues with an interesting decision problem:
He says:
True, but things are a bit more complicated than this. An FDT agent facing Procreation recognizes the subjunctive dependence between her decision and her father's (both run FDT), and, realizing she wants to have been born, procreates. A CDT agent with an FDT father doesn't have this subjunctive dependence (and wouldn't use it if she did) and doesn't procreate, gaining more utils than the FDT agent. But note that the FDT agent faces a different problem than the CDT agent does: hers is one where her father has the same decision theory she does. What if we put the FDT agent in a modified Procreation problem, one where her father is a CDT agent? Correctly realizing she can make a decision other than her father's, she doesn't procreate. Obviously, in this scenario, the CDT agent also doesn't procreate - even though, through subjunctive dependence, her decision is the exact same as her father's. So here the CDT agent does worse: her father wouldn't have procreated either, and she isn't even born. This gives us two scenarios: one where the FDT agent procreates and lives miserably while the CDT agent lives happily, and one where the FDT agent lives happily while the CDT agent doesn't live at all. FDT is, again, the better decision theory.
It seems, then, we can construct a more useful version of Procreation, called Procreation*:
Procreation*. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and I know he followed the same decision theory I do. If my decision theory were to recommend not procreating, there's a significant probability that I wouldn't exist. I prefer a miserable life to no life at all, but obviously I prefer a happy life to a miserable one. Should I procreate?
FDT agents procreate and live miserably - CDT agents don't procreate and, well, don't exist, since their fathers didn't procreate either.
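A sketch of the expected utilities in Procreation*, in Python (the utility values and the 0.9 reading of "a significant probability" are made-up numbers):

```python
# Assumed utilities: happy life > miserable life > no life at all.
HAPPY, MISERABLE, NO_LIFE = 1.0, 0.2, 0.0
P_FATHER_MATCHES = 0.9  # assumed reading of "significant probability"

def ev(policy_procreates: bool) -> float:
    # The father ran the same decision theory, so with probability
    # P_FATHER_MATCHES his choice matched the policy under evaluation.
    # I exist at all only if he procreated.
    p_exist = P_FATHER_MATCHES if policy_procreates else 1 - P_FATHER_MATCHES
    my_life = MISERABLE if policy_procreates else HAPPY
    return p_exist * my_life + (1 - p_exist) * NO_LIFE

print("procreate:", ev(True))   # 0.9 * 0.2 = 0.18
print("refuse:   ", ev(False))  # 0.1 * 1.0 = 0.10 -> procreating wins
```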
"The trick is to tweak the agents' utility function."? No. I mean, sure, Twinky, it's good to care about others. I do, so does almost everybody. But this completely misses the point. In the above problems, the utility function is specified. Tweaking it gives a new problem. If Twinky indeed cares about her clone's prison years as much as she does about her own, then the payoff matrix would become totally different. I realize that's Schwarz's point, because that gives a new dominant option - but it literally doesn't solve the actual problem. You solve a decision problem by taking one of the allowed actions - not by changing the problem itself. Deep Blue didn't define the opening position as a winning position in order to beat Kasparov. All Schwarz does here is defining new problems CDT does solve correctly. That's fine, but it doesn't solve the issue that CDT still fails the original problems.
Of course, me too (a vengeful streak, though? That's not caring about others). So would Yudkowsky and Soares. But don't you think a successful agent should have a decision theory that can at least solve basic cases like Newcomb's Problem, with or without transparent boxes? Also note how Schwarz makes an ad hoc adjustment for each problem: Twinky has to care about her clone's prison time, while Donald has to have a sense of pride or a vengeful streak.
But if we can set up a scenario that breaks your decision theory even when we do allow modifying utility functions, that points to a serious flaw in the theory. Would you trust it enough to build it into an Artificial Superintelligence?
Schwarz goes on to list a number of questions and unclarities he found in Yudkowsky and Soares' paper, which aren't relevant for the purpose of this post. So this is where I conclude my post: FDT is still standing - and not only that, it is better than CDT.