Err, that's not CooperateBot, that's UDT. Yes, UDT cooperates with itself. That's the point. (Notice that if UDT defects here, the outcome is DD.)
It's not UDT. It's the strategy that, against any opponent, does what UDT would do against this very strategy. In particular, it cooperates against any opponent. Therefore it is CooperateBot. It is just coded in a funny way.
To be clear, letting Y(X) be what Y does against X, we have BOT(X) = UDT(BOT) = C. This is different from UDT: UDT(X) is D for some values of X. The two functions agree when X = UDT and in relatively few other cases.
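A toy sketch of that distinction, in case it helps. Everything here is an assumption made for illustration: `toy_udt` is a stipulated lookup table standing in for UDT (it cooperates with itself and with BOT, as claimed in this thread, and defects otherwise), not an implementation of UDT, and all function names are hypothetical.

```python
C, D = "C", "D"

def cooperate_bot(opponent):
    # Cooperates against everything.
    return C

def defect_bot(opponent):
    # Defects against everything.
    return D

def toy_udt(opponent):
    # ASSUMPTION: a stipulated stand-in for UDT, not a derivation.
    # It cooperates with itself and with BOT, and defects otherwise.
    if opponent is toy_udt or opponent is bot:
        return C
    return D

def bot(opponent):
    # BOT(X) = UDT(BOT): ignores X and plays what toy_udt plays against BOT.
    return toy_udt(bot)

opponents = [cooperate_bot, defect_bot, toy_udt, bot]

# BOT agrees with CooperateBot on every opponent in the list.
same_as_cooperate_bot = all(bot(x) == cooperate_bot(x) for x in opponents)

# toy_udt, by contrast, differs from BOT on some opponents.
udt_differs = [x.__name__ for x in opponents if toy_udt(x) != bot(x)]

print(same_as_cooperate_bot, udt_differs)
```

Under these stipulations BOT never looks at its argument, so it is extensionally the same function as CooperateBot, while the stand-in for UDT is not.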
Rational agents cannot be successfully blackmailed by other agents that simulate them accurately, and especially not by figments of their own imagination.
I think you mean that rational agents cannot be successfully blackmailed by other agents when it is common knowledge that those agents can simulate them accurately and will only use blackmail if they predict it to be successful. All of this, of course, holds only in the absence of mitigating circumstances (including, for example, the theoretical likelihood of other agents that reward you for counterfactually giving in to blackmail under these circumstances).
Humans are not perfect deceivers.
I suppose. On the other hand, is that because other people can read your mind, or because you have emotional responses that you cannot suppress and that are correlated with what you are thinking? This is actually critical to what counterfactuals you want to construct.
Consider for example the terrorist who would try to bring down an airplane that he is on, given the opportunity. Unfortunately for him, he's an open book, and airport security would figure out that he's up to something and prevent him from flying. This is actually inconvenient, since it also means he can't use air travel. He would like to be able to precommit to not trying to take down particular flights so that he would be allowed on. On the other hand, whether or not this would work depends on what exactly airport security is picking up on. Are they actually able to discern his intent to cause harm, or are they merely picking up on his nervousness at being questioned by airport security? If it's the latter, would an internal precommitment to not bring down a particular flight actually solve his problem?
Put another way, is the TSA detecting the fact that the terrorist would down the plane if given the opportunity, or simply that he would like to do so (in the sense of getting extra utils from doing so)?
Are there circumstances where the universe does not read your mind where CDT fails?
I'm sure we could think of some, but I want to address the question of "universe reads your mind". Social agents (i.e. real, live people) reason about each other's minds all the time. There is absolutely nothing weird or unusual about this, and there really oughtn't to be anything weird about trying to formalize how it ought to be done.
I'm sure we could think of some
OK. Name one.
The game is PD against CooperateBot, BUT before the game starts Omega announces "your opponent will make the same decision that UDT would if I told them this." This announcement causes UDT to cooperate against CooperateBot.
No. There is no obligation to do something just because Omega claims that you will.
First, if I know that my opponent is CooperateBot, then:
- It is known that Omega doesn't lie.
- Therefore Omega has simulated this situation and predicted that I (UDT) cooperate.
- Hence, I can either cooperate, and collect the standard reward for CC.
- Or I can defect, in order to access an alternative branch of the problem (where Omega finds that UDT defects and does "something else").
- This alternative branch is unspecified, so the problem is incomplete.
UDT cooperates or defects depending on the contents of the alternative branch. If the alternative branch is unknown then it must guess, and most likely cooperates to be on the safe side.
Now, the problem is different if a CDT agent is put in my place, because that CDT agent does not control (or only weakly controls) the action of the UDT simulation that Omega ran in order to make the assertion about UDT's decision.
Fine. Your opponent actually simulates what UDT would do if Omega had told it that and returns the appropriate response (i.e. it is CooperateBot, although perhaps your finite prover is unable to verify that).
Or how about this example, that simplifies things even further. The game is PD against CooperateBot, BUT before the game starts Omega announces "your opponent will make the same decision that UDT would if I told them this." This announcement causes UDT to cooperate against CooperateBot. CDT on the other hand, correctly deduces that the opponent will cooperate no matter what it does (actually UDT comes to this conclusion too) and therefore decides to defect.
Actually, this is a somewhat general phenomenon. Consider for example, the version of Newcomb's problem where the box is full "if and only if UDT one-boxes in this scenario".
UDT's optimality theorem requires that, in the counterfactual where it is replaced by a different decision theory, all of the "you"s referenced in the scenario remain "you" rather than becoming "UDT". In the latter counterfactual, CDT provably wins. The fact that UDT wins these scenarios is an artifact of how you are constructing your scenarios.
That shouldn't be surprising. The CDT agents here are equivalent to DefectBot, and if they come into existence spontaneously, are no different than natural phenomena like rocks. Notice that the UDT agents in this situation do better than the alternative (if they defected, they would get 100% DD which is a way worse result). They don't care that some DefectBots get to freeload.
Of course, if the DefectBots are here because someone calculated that UDT agents would cooperate, and that being a DefectBot is therefore a good way to get free utilons, then the UDT agents are incentivized to defect, because this is now an ultimatum game.
And in the variant where bots do know each other's identities, the UDT bots all get 99% CC / 1% DD and the CDT bots suck it.
The CDT agents here are equivalent to DefectBot
And the UDT agents are equivalent to CooperateBot. What's your point?
The CDT agents here win because they do not believe that altering their strategy will change the way that their opponents behave. This is actually true in this case, and even true for the UDT agents, depending on how you choose to construct your counterfactuals. If a UDT agent suffered a malfunction and defected, it too would do better. In any case, the theorem that UDT agents perform optimally in universes that can only read your mind by knowing what you would do in hypothetical situations is false, as this example shows.
UDT bots win in some scenarios where the initial conditions of the scenario favor agents that behave sub-optimally in certain scenarios (and by sub-optimally, I mean where counterfactuals are constructed in the way implicit to CDT). The example above shows that sometimes they are punished for acting suboptimally.
What is your point, exactly?
It's clear that UDT can't do better vs "BOT" than by cooperating, because if UDT defects against BOT then BOT defects against UDT. Given that dependency, you clearly can't call it CooperateBot, and it's clear that UDT makes the right decision by cooperating with it because CC is better than DD.
No. BOT(X) is cooperate for all X. It behaves in exactly the same way that CooperateBot does; it just runs different, though equivalent, code.
And my point was that CDT does better against BOT than UDT does. I was asked for an example where CDT does better than UDT and where the universe cannot read your mind except through your actions in counterfactuals. This is such an example. In fact, in this example, the universe doesn't read your mind at all.
Also, your argument that UDT cannot possibly do better against BOT than it does is analogous to the argument that CDT cannot do better in the mirror matchup than it does, namely that CDT's outcome against CDT is at least as good as anything else's outcome against CDT. You aren't defining your counterfactuals correctly. You can do better against BOT than UDT does. You just have to not be UDT.
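To put numbers on the BOT example, here is a minimal sketch. The payoff values (3, 0, 5, 1) are an assumption, chosen only because they satisfy the usual prisoner's dilemma ordering; nothing in this thread fixes them. It takes as given that BOT always plays C, that UDT cooperates against it, and that CDT defects.

```python
# ASSUMPTION: illustrative prisoner's dilemma payoffs for the row player,
# keyed by (my_move, their_move). Only the ordering DC > CC > DD > CD matters.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def score(my_move, their_move):
    # Payoff to the player making my_move.
    return PAYOFF[(my_move, their_move)]

bot_move = "C"  # BOT(X) = C for every X, as argued above

udt_score = score("C", bot_move)  # UDT cooperates against BOT
cdt_score = score("D", bot_move)  # CDT defects against BOT

print(udt_score, cdt_score)
```

With any payoffs that respect the prisoner's dilemma ordering, the defector's score against an unconditional cooperator is strictly higher, which is all the argument needs.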