dankane comments on Causal decision theory is unsatisfactory - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Actually, here's a better counter-example, one that actually exemplifies some of the claims of CDT optimality. Suppose that the universe consists of a bunch of agents (who do not know each other's identities) playing one-off PDs against each other. Now 99% of these agents are UDT agents and 1% are CDT agents.
The CDT agents defect for the standard reason. Each UDT agent reasons: "My opponent will do the same thing that I do with 99% probability; therefore, I should cooperate."
CDT agents get 99% DC and 1% DD. UDT agents get 99% CC and 1% CD. The CDT agents in this universe do better than the UDT agents, yet they are facing a perfectly symmetrical scenario with no mind reading involved.
That shouldn't be surprising. The CDT agents here are equivalent to DefectBot, and if they come into existence spontaneously, are no different than natural phenomena like rocks. Notice that the UDT agents in this situation do better than the alternative (if they defected, they would get 100% DD which is a way worse result). They don't care that some DefectBots get to freeload.
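To make the percentages in this exchange concrete, here is a minimal sketch of the expected payoffs in the 99% UDT / 1% CDT population. The thread gives no payoff numbers, so the values in `PAYOFF` (5, 3, 1, 0) are an assumption chosen only to satisfy the usual PD ordering, and `expected_utility` is a hypothetical helper name.

```python
# Assumed PD payoffs for (my move, opponent's move); the thread gives no numbers.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_utility(my_move, p_udt, udt_move, cdt_move="D"):
    """Expected payoff against a random opponent who is UDT with probability p_udt."""
    return (p_udt * PAYOFF[(my_move, udt_move)]
            + (1 - p_udt) * PAYOFF[(my_move, cdt_move)])

P_UDT = 0.99

# UDT agents cooperate: 99% CC and 1% CD.
udt_eu = expected_utility("C", P_UDT, udt_move="C")
# CDT agents defect: 99% DC and 1% DD.
cdt_eu = expected_utility("D", P_UDT, udt_move="C")
# If the UDT agents defected instead, everyone would get 100% DD.
all_defect_eu = expected_utility("D", P_UDT, udt_move="D")

print(udt_eu, cdt_eu, all_defect_eu)
```

Under these assumed payoffs the ordering is CDT-in-this-population > UDT-cooperating > everyone-defecting, which is consistent with both claims made above: the CDT minority does better than the UDT majority, and the UDT majority still does better by cooperating than it would by defecting.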
Of course, if the defectbots are here because someone calculated that UDT agents would cooperate and therefore being defectbot is a good way to get free utilons... then the UDT agents are incentivized to defect, because this is now an ultimatum game.
And in the variant where bots do know each other's identities, the UDT bots all get 99% CC / 1% DD and the CDT bots suck it.
And the UDT agents are equivalent to CooperateBot. What's your point?
The CDT agents here win because they do not believe that altering their strategy will change the way that their opponents behave. This is actually true in this case, and even true for the UDT agents depending on how you choose to construct your counterfactuals. If a UDT agent suffered a malfunction and defected, it too would do better. In any case, the theorem that UDT agents perform optimally in universes that can only read your mind by knowing what you would do in hypothetical situations is false as this example shows.
UDT bots win in some scenarios where the initial conditions of the scenario favor agents that behave sub-optimally in certain scenarios (and by sub-optimally, I mean where counterfactuals are constructed in the way implicit to CDT). The example above shows that sometimes they are punished for acting suboptimally.
Or how about this example, which simplifies things even further. The game is PD against CooperateBot, BUT before the game starts Omega announces "your opponent will make the same decision that UDT would if I told them this." This announcement causes UDT to cooperate against CooperateBot. CDT, on the other hand, correctly deduces that the opponent will cooperate no matter what it does (actually UDT comes to this conclusion too) and therefore decides to defect.
No. There is no obligation to do something just because Omega claims that you will.
First, if I know that my opponent is CooperateBot, then:
UDT cooperates or defects depending on the contents of the alternative branch. If the alternative branch is unknown then it must guess, and most likely cooperates to be on the safe side.
Now, the problem is different if a CDT agent is put in my place, because that CDT agent does not control (or only weakly controls) the action of the UDT simulation that Omega ran in order to make the assertion about UDT's decision.
Fine. Your opponent actually simulates what UDT would do if Omega had told it that and returns the appropriate response (i.e. it is CooperateBot, although perhaps your finite prover is unable to verify that).
Err, that's not CooperateBot, that's UDT. Yes, UDT cooperates with itself. That's the point. (Notice that if UDT defects here, the outcome is DD.)
It's not UDT. It's the strategy that against any opponent does what UDT would do against it. In particular, it cooperates against any opponent. Therefore it is CooperateBot. It is just coded in a funny way.
To be clear, letting Y(X) be what Y does against X, we have that BOT(X) = UDT(BOT) = C. This is different from UDT: UDT(X) is D for some values of X. The two functions agree when X = UDT and in relatively few other cases.
What is your point, exactly?
It's clear that UDT can't do better vs "BOT" than by cooperating, because if UDT defects against BOT then BOT defects against UDT. Given that dependency, you clearly can't call it CooperateBot, and it's clear that UDT makes the right decision by cooperating with it because CC is better than DD.
OK. Let me say this another way that involves more equations.
So let's let U(X, Y) be the utility that X gets when it plays prisoner's dilemma against Y. For a program X, let BOT^X be the program where BOT^X(Y) = X(BOT^X). Notice that BOT^X(Y) does not depend on Y. Therefore, depending upon what X is, BOT^X is equivalent to either CooperateBot or DefectBot.
Now, you are claiming that UDT plays optimally against BOT^UDT because for any strategy X, U(X, BOT^X) <= U(UDT, BOT^UDT). This is true, because X(BOT^X) = BOT^X(X) by the definition of BOT^X. Therefore you cannot do better than CC. On the other hand, it is also true that for any X and any Y, U(X, BOT^Y) <= U(CDT, BOT^Y). This is because BOT^Y's behavior does not depend on X, and therefore you do optimally by defecting against it (or you could just apply the Theorem that says that CDT wins if the universe cannot read your mind).
Our disagreement stems from the fact that we are considering different counterfactuals here. You seem to claim that UDT behaves correctly because U(UDT, BOT^UDT) > U(CDT, BOT^CDT), while I claim that CDT does because U(CDT, BOT^UDT) > U(UDT, BOT^UDT).
And in fact, given the way that I phrased the scenario, (which was that you play BOT^UDT not that you play BOT^{you} (i.e. the mirror matchup)) I happen to be right here. So justify it however you like, but UDT does lose this scenario.
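The two counterfactuals being compared here can be written out as a toy model. This is only a sketch under loud assumptions: `udt` below is a hypothetical stand-in that cooperates exactly when its opponent is a BOT built from itself (the real UDT is not specified in this thread), `cdt` simply defects, and the payoff numbers are assumed.

```python
# Assumed PD payoffs for (my move, opponent's move); not given in the thread.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

class Bot:
    """BOT^X: against any opponent Y, plays X(BOT^X). The argument Y is ignored."""
    def __init__(self, source):
        self.source = source

    def __call__(self, opponent):
        return self.source(self)

def cdt(opponent):
    # Toy CDT: defection dominates in a one-shot PD.
    return "D"

def udt(opponent):
    # Toy stand-in for UDT (an assumption): cooperate only with BOT^UDT,
    # whose output is tied to UDT's own output.
    if isinstance(opponent, Bot) and opponent.source is udt:
        return "C"
    return "D"

def U(x, y):
    """Utility X gets when it plays the prisoner's dilemma against Y."""
    return PAYOFF[(x(y), y(x))]

bot_udt, bot_cdt = Bot(udt), Bot(cdt)

print(U(udt, bot_udt))  # mirror matchup for UDT: CC
print(U(cdt, bot_cdt))  # mirror matchup for CDT: DD
print(U(cdt, bot_udt))  # CDT against the fixed opponent BOT^UDT: DC
```

In this toy model both inequalities hold at once: U(UDT, BOT^UDT) > U(CDT, BOT^CDT) when the opponent is rebuilt from whoever is playing, and U(CDT, BOT^UDT) > U(UDT, BOT^UDT) when the opponent is held fixed as BOT^UDT. Which comparison counts as "the" counterfactual is exactly the point in dispute.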
Actually, you've oversimplified and missed something critical. In reality, the only way you can force BOT^UDT(X) = UDT(BOT^UDT) = C is if the universe does, in fact, read your mind. In general, UDT can map different epistemic states to different actions, so as long as BOT^UDT has no clue about the epistemic state of the UDT agent it has no way of guaranteeing that its output is the same as that of the UDT agent. Consequently, it's possible for the UDT agent to get DC as well. The only way BOT^UDT would be able to guarantee that it gets the same output as a particular UDT agent is if the universe was able to read the UDT agent's mind.
No. BOT(X) is cooperate for all X. It behaves in exactly the same way that CooperateBot does, it just runs different though equivalent code.
And my point was that CDT does better against BOT than UDT does. I was asked for an example where CDT does better than UDT where the universe cannot read your mind except through your actions in counterfactuals. This is an example of such. In fact, in this example, the universe doesn't read your mind at all.
Also, your argument that UDT cannot possibly do better against BOT than it does is analogous to the argument that CDT cannot do better in the mirror matchup than it does, namely that CDT's outcome against CDT is at least as good as anything else's outcome against CDT. You aren't defining your counterfactuals correctly. You can do better against BOT than UDT does. You just have to not be UDT.
Actually, this is a somewhat general phenomenon. Consider for example, the version of Newcomb's problem where the box is full "if and only if UDT one-boxes in this scenario".
UDT's optimality theorem requires that, in the counterfactual where it is replaced by a different decision theory, all of the "you"s referenced in the scenario remain "you" rather than becoming "UDT". In the latter counterfactual CDT provably wins. The fact that UDT wins these scenarios is an artifact of how you are constructing your scenarios.
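The Newcomb variant can be sketched the same way. The dollar amounts are the conventional Newcomb figures and are an assumption here (the thread gives none), as is the premise that UDT one-boxes and CDT two-boxes; `payoff` is a hypothetical helper.

```python
BIG, SMALL = 1_000_000, 1_000  # assumed contents of the opaque and transparent boxes

def payoff(agent_one_boxes, box_is_full):
    """Money received given the agent's choice and whether the opaque box is full."""
    opaque = BIG if box_is_full else 0
    return opaque if agent_one_boxes else opaque + SMALL

UDT_ONE_BOXES = True   # assumption: UDT one-boxes
CDT_ONE_BOXES = False  # assumption: CDT two-boxes

# Scenario as stated: the box is full iff UDT one-boxes, whoever is playing.
udt_vs_udt_keyed = payoff(UDT_ONE_BOXES, box_is_full=UDT_ONE_BOXES)
cdt_vs_udt_keyed = payoff(CDT_ONE_BOXES, box_is_full=UDT_ONE_BOXES)

# Scenario where "UDT" is replaced by "you": the box is full iff the player one-boxes.
cdt_vs_you_keyed = payoff(CDT_ONE_BOXES, box_is_full=CDT_ONE_BOXES)

print(udt_vs_udt_keyed, cdt_vs_udt_keyed, cdt_vs_you_keyed)
```

With the box keyed to UDT's choice, the two-boxer walks away with the extra SMALL; with the box keyed to the player's own choice, the two-boxer finds the opaque box empty.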
A version of this problem was discussed here previously. It was also brought up during the decision theory workshop hosted by MIRI in 2013 as an open problem. As far as I know there hasn't been much progress on it since 2009.
I wonder if it's even coherent to have a math intuition which wouldn't be forcing UDT to cooperate (or defect) in certain conditions just to make 2*2 be 4, figuratively speaking (as ultimately you could expand any calculation into an equivalent calculation involving a decision by UDT).
This is a good example. Thank you. A population of 100% CDT, though, would get 100% DD, which is terrible. It's a point in UDT's favor that "everyone running UDT" leads to a better outcome for everyone than "everyone running CDT."