I think the assumption that multiple actions have nonzero probability in the context of a deterministic decision theory is a pretty big problem. If you come up with a model for where these nonzero probabilities are coming from, I don't think your argument is going to work.
For instance, your argument fails if these nonzero probabilities come from epsilon exploration. If the agent is forced to take every action with probability epsilon, and merely chooses which action to assign the remaining probability to, then the agent will indeed purchase the contract for some sufficiently small price if cdt(a)≠edt(a), even if a is not the optimal action (let's say b is the optimal action). When the time comes to take an action, the agent's best bet is b′ (prime meaning sell the contract for price d). The way I described the set-up, the agent doesn't choose between a and a′, because actions other than the top choice all happen with probability epsilon. The fact that the agent sells the contract back in its top choice isn't a Dutch book, because the case where the agent's top choice goes through is the case in which the contract is worthless, and the contract's value is derived from other cases.
We could modify the epsilon exploration assumption so that the agent also chooses between a and a′ even while its top choice is b′. That is, there's a lower bound on the probability with which the agent takes an action in {a, a′}, but even if that bound is achieved, the agent still has some flexibility in distributing probability between a and a′. In this case, contrary to your argument, the agent will prefer a rather than a′, i.e., it will not get Dutch booked. This is because the agent is still choosing b′ as the only action with high probability, and cdt(a) refers to the expected consequence of the agent choosing a as its intended action, so the agent cannot use cdt(a) when calculating which of a or a′ is better to pick as its next choice if its attempt to implement the intended action b′ fails.
Another source of uncertainty that the agent could have about its actions is if it believes it could gain information in the future, but before it has to make a decision, and this information could be relevant to which decision it makes. Say that cdt_t(a) and edt_t(a) are the agent's expectations at time t of the utility that taking action a would cause it to get, and the utility it would get conditional on taking action a, respectively. Suppose the bookie offers the deal at time 1, and the agent must act at time 2. If the possibility of gaining future knowledge is the only source of the agent's uncertainty about its own decisions, then at time 2, it knows what action it is taking, and edt_2 is undefined on actions not taken. cdt_1(a) and cdt_2(a) should both be well-defined, but they could be different. The problem description should disambiguate between them. Suppose that every time you say cdt(a) and edt(a) in the description of the contract, this means cdt_1(a) and edt_1(a), respectively. The agent purchases the contract, and then, when it comes time to act, it evaluates consequences by cdt_2, not cdt_1, so the argument for why the agent will inevitably resell the contract fails. If the cdt(a) appearing in the description of the contract instead means cdt_2(a) (since the agent doesn't know what that is yet, this means the contract references what the agent will believe in the future, rather than stating numerical payoffs), then the agent won't purchase it in the first place because it will know that the contract will only have value if a seems to be suboptimal at time 2 and it takes action a anyway, which it knows won't happen, and hence the contract is worthless.
OK, here's my position.
As I said in the post, the real answer is that this argument simply does not apply if the agent knows its action. More generally: the argument applies precisely to those actions to which the agent ascribes positive probability (directly before deciding). So, it is possible for agents to maintain a difference between counterfactual and evidential expectations. However, I think it's rarely normatively correct for an agent to be in such a position.
Even though the decision procedure of CDT is deterministic, this does not mean that agents described by CDT know what they will do in the future. We can think of this in terms of logical induction: the market is not 100% certain of its own beliefs, and in particular, doesn't typically know precisely what the maximum-expectation-action is.
One way of seeing the importance of this is to point out that CDT is a normative theory, not a descriptive one. CDT is supposed to tell you what arbitrary agents should do. The recommendations are supposed to apply even to, say, epsilon-exploring agents (who are not described by CDT, strictly speaking). But here we see that CDT recommends being dutch-booked! Therefore, CDT is not a very good normative theory, at least for epsilon-explorers. (So I'm addressing your epsilon-exploration example by differentiating between the agent's algorithm and the CDT decision theory. The agent isn't dutch-booked, but CDT recommends a dutch book.)
Granted, we could argue via dutch book that agents should know their own actions, if those actions are deterministic consequences of a known agent-architecture. However, theories of logical uncertainty tell us that this is not (always) realistic. In particular, we can adapt the bounded-resource-dutch-book idea from logical induction. According to this idea, some dutch-book-ability is OK, but agents should not be boundlessly exploitable by resource-bounded bookies.
This idea leads me to think that efficiently computable sequences of actions, which continue to have probability bounded away from zero (just before the decision), should have CDT expectations which converge to EDT expectations.
(Probably there's a stronger version, based on density-zero exploration type intuitions, where we can reach this conclusion even if the probability is not bounded away from zero, because the total probability is still unbounded.)
One conjecture which was supposed to be communicated by my more recent post was: in learnable environments, this will amount to: all counterfactual expectations converge to evidential expectations (provided the agent is sufficiently farsighted). For example, if the agent knows the environment is trap-free, then when counterfactual and evidential hypotheses continue to severely differ for some (efficiently enumerable) sequence of actions, there will be a hypothesis which says "the evidential expectations are actually correct". The agent will want to check that hypothesis, because the VOI of significantly updating its counterfactual expectations is high. Therefore, these actions will not become sufficiently rare (unless the evidential and counterfactual expectations do indeed converge).
In other words, the divergence between evidential and counterfactual expectations is itself a reason why the action probability should be high, provided that the agent is not shortsighted and doesn't expect the action to be a trap.
If the agent is shortsighted and/or expects traps, then it normatively should not learn anyway (at least, not by deliberate exploration steps). In that case, counterfactual and evidential expectations may forever differ. OTOH, in that case, there's no reason to expect evidential expectations to be well-informed, so it kind of makes sense that the agent has little motive to adjust its counterfactual expectations towards them.
(But I'll still give the agent a skeptical look when it asserts that the two differ, since I know that highly informed positions never look like this. The belief that the two differ seems "potentially rational but never defensible", if that makes sense. I'm tempted to bake the counterfactual/evidential equivalence into the prior, on the general principle that priors should not contain possibilities which we know will be eliminated if sufficient evidence comes in. Yet, doing so might make us vulnerable to Troll Bridge.)
I thought about these things in writing this, but I'll have to think about them again before making a full reply.
We could modify the epsilon exploration assumption so that the agent also chooses between a and a′ even while its top choice is b′. That is, there's a lower bound on the probability with which the agent takes an action in {a, a′}, but even if that bound is achieved, the agent still has some flexibility in distributing probability between a and a′.
Another similar scenario would be: we assume the probability of an action is small if it's sub-optimal, but smaller the worse it is.
So the overall expectation is P(Act=a)⋅(cdt(a)−edt(a)).
Wouldn't it be P(Act=a|do(buy B)) rather than P(Act=a)? Like my thought would be that the logical thing for CDT would be to buy the contract and then as a result its expected utilities change, which leads to its probabilities changing, and as a result it doesn't want to sell the contract. I'd think this argument only puts a bound on how much cdt and edt can differ, rather than on whether they can differ at all. Very possible I'm missing something though.
I agree with this, but I was assuming the CDT agent doesn't think buying B will influence the later decision. This, again, seems plausible if the payoff is made sufficiently small. I believe that there are some other points in my proof which make similar assumptions, which would ideally be made clearer in a more formal write-up.
However, I think CDT advocates will not generally take this to be a sticking point. The structure of my argument is to take a pre-existing scenario, and then add bets. For my argument to work, the bets need to be "independent" of critical things (causally and/or evidentially independent) -- in the example you point out, the action taken later needs to be causally independent of the bet made earlier (more specifically, causal-conditioning on the bet should not change beliefs about what action will be taken).
This is actually very similar to traditional Dutch-book arguments, which treat the bets as totally independent of everything. I could argue that it's just part of the thought experiment; if you concede that there could be a scenario like that, then you concede that CDT gets dutch-booked.
If you don't buy that, but you do buy Dutch Books as a methodology more generally, then I think you have to claim there's some rule which forbids "situations like this" (so CDT has to think the bets are not independent of everything else, in such a way as to spoil my argument). I would be very interested if you could propose a sensible view like this. However, I think not: there doesn't seem to be anything about the scenario which violates some principle of causality or rationality. If you forbid scenarios like this, you seem to be forbidding a very reasonable scenario, for no good reason (other than to save CDT).
This, again, seems plausible if the payoff is made sufficiently small.
How do you make the payoff small?
This is actually very similar to traditional Dutch-book arguments, which treat the bets as totally independent of everything.
Isn't your Dutch-book argument more recursive than standard ones? Your contract only pays out if you act, so the value of the dutch book causally depends on the action you choose.
Isn't your Dutch-book argument more recursive than standard ones? Your contract only pays out if you act, so the value of the dutch book causally depends on the action you choose.
Sure, do you think that's a concern? I was noting the similarity in this particular respect (pretending that bets are independent of everything), not in all respects.
Note, in particular, that traditional dutch book arguments make no explicit assumption one way or the other about whether the propositions have to do with actions under the agent's control. So I see two possible interpretations of traditional Dutch books:
How do you make the payoff small?
For example, multiply the contract payoff by 0.001.
Think of it this way. Making bets about your actions (or things influenced by your actions) can change your behavior. But if you keep the bets small enough, then you shouldn't change your behavior; the bets are less important than other issues. (Unless two actions are exactly tied, in terms of other issues.)
I will concede that this isn't 100% convincing. Perhaps different laws of probability should apply to actions we can influence. OTOH, I'm not sure what laws those would be.
So I see two possible interpretations of traditional Dutch books:
I disagree, I don't think it's a simple binary thing. I don't think Dutch book arguments in general never apply to recursive things, but it's more just that the recursion needs to be modelled in some way, and since your OP didn't do that, I ended up finding the argument confusing.
The standard dutch book arguments would apply to the imp. Why should you be in such a different position from the imp?
I don't think your argument goes through for the imp, since it never needs to decide its action, and therefore the second part of selling the contract back never comes up?
For example, multiply the contract payoff by 0.001.
Hmm, on further reflection, I had an effect in mind which doesn't necessarily break your argument, but which increases the degree to which other counterarguments such as AlexMennen's break your argument. This effect isn't necessarily solved by multiplying the contract payoff (since decisions aren't necessarily continuous as a function of utilities), but it may under many circumstances be approximately solved by it. So maybe it doesn't matter so much, at least until AlexMennen's points are addressed so I can see where it fits in with that.
Hmm, on further reflection, I had an effect in mind which doesn't necessarily break your argument, but which increases the degree to which other counterarguments such as AlexMennen's break your argument. This effect isn't necessarily solved by multiplying the contract payoff (since decisions aren't necessarily continuous as a function of utilities), but it may under many circumstances be approximately solved by it. So maybe it doesn't matter so much, at least until AlexMennen's points are addressed so I can see where it fits in with that.
I disagree, I don't think it's a simple binary thing. I don't think Dutch book arguments in general never apply to recursive things, but it's more just that the recursion needs to be modelled in some way, and since your OP didn't do that, I ended up finding the argument confusing.
But what does that look like? How should it make a difference? (This isn't a rhetorical question; I would be interested in a positive position. My lack of interest is, significantly, due to a lack of positive positions in this direction.)
I don't think your argument goes through for the imp, since it never needs to decide its action, and therefore the second part of selling the contract back never comes up?
Ah, true, but the imp will necessarily just make EDT-type predictions anyway. So the imp argument reaches a similar conclusion.
But I'm not claiming the imp argument is very strong in any case, it's just an intuition pump.
So, a silly question that doesn't really address the point of this post (this may very well just be a point-of-clarity thing, but it would be useful for me to have an answer, for earning-to-give related reasons that are off-topic for this post) --
Here you claim that CDT is a generalization of decision-theories that includes TDT (fair enough!):
Here, "CDT" refers -- very broadly -- to using counterfactuals to evaluate expected value of actions. It need not mean physical-causal counterfactuals. In particular, TDT counts as "a CDT" in this sense.
But here you describe CDT as two-boxing in Newcomb, which conflicts with my understanding that TDT one-boxes coupled with your claim that TDT counts as a CDT:
For example, in Newcomb, CDT two-boxes, and agrees with EDT about the consequences of two-boxing. The disagreement is only about the value of the other action.
So is this conflict a matter of using the colloquial definition of CDT in the second quote but a broader one in the first, having a more general framework for what two-boxing is than my own, or knowing something about TDT that I don't?
Ah, yeah, I'll think about how to clear this up. The short answer is that, yes, I slipped up and used CDT in the usual way rather than the broader definition I had set up for the purpose of this post.
On the other hand, I also want to emphasize that EDT two-boxes (and defects in twin PD) much more easily than is commonly supposed. And thus, to the extent one wants to apply the arguments of this post to TDT, TDT would too. Specifically, an EDT agent can only see something as correlated with its action if that thing has more information about the action than the EDT agent itself. Otherwise, the EDT agent's own knowledge about its action screens off any correlation.
This means that in Newcomb with a perfect predictor, EDT one-boxes. But in Newcomb where the predictor is only moderately good, in particular knows as much or less than the agent, EDT two-boxes. So, similarly, TDT must two-box in these situations, or be vulnerable to the Dutch Book argument of this post.
I've noticed that one could read the argument and say: "Ok, an agent evaluates a parameter U differently at different times. Thus, a bookmaker exploits the agent with a bet/certificate whose value depends on U. What's special about this?"
Of course the answer lies in the difference between cdt(a) and edt(a), specifically you wrote:
The key point here is that because the agent is betting ahead of time, it will evaluate the value of this bet according to the conditional expectation E(U|Act=a).
and
Now, since the agent is reasoning during its action, it is evaluating possible actions according to cdt(a); so its evaluation of the bet will be different.
I think developing these two points would be useful to readers since, usually, the pivotal concepts behind EDT and CDT are considered to be "conditional probabilities" and "(physical) causation" respectively, while here you seem to point at something different about the times at which decisions are made.
***
Unrelated to what I just wrote:
XXX insert the little bit about free will and stuff that I want to remove from the main argument... no reason to spend time justifying it there if I have a whole section for it here
I guess here you wanted to say something interesting about free will, but it was probably lost from the draft to the final version of the post.
I guess here you wanted to say something interesting about free will, but it was probably lost from the draft to the final version of the post.
Ah whoops. Fixed.
I think developing these two points would be useful to readers since, usually, the pivotal concepts behind EDT and CDT are considered to be "conditional probabilities" and "(physical) causation" respectively, while here you seem to point at something different about the times at which decisions are made.
I'm not sure what you mean here. The "two different times" are (1) just before CDT makes the decision, and (2) right when CDT makes the decision. So the two times aren't about differentiating CDT and EDT.
The part that I don't get is why "the agent is betting ahead of time" implies evaluation according to edt, while "the agent is reasoning during its action" implies evaluation according to cdt. Sorry if I'm missing something trivial, but I'd like an explanation, because this seems like a fundamental part of the argument.
Oh right, OK. That's because of the general assumption that rational agents bet according to their beliefs. If a CDT agent doesn't think of a bet as intervening on a situation, then when betting ahead of time, it'll just bet according to its probabilities. But during the decision, it is using the modified (interventional) probabilities. That's how CDT makes decisions. So any bets which have to be made simultaneously, as part of the decision, will be evaluated according to those modified beliefs.
This is an old post, but my idea of CDT is that it's a rule for making decisions, not for setting beliefs. Thus the agent never believes in the outcome given by CDT, just that it should choose according to the payoffs it calculates. This is a seemingly weird way to do things, but apart from that is there a reason I should think about CDT as a prescription for forming beliefs while I am acting?
My confusion was: even "when the agent is acting", I think it would still be appropriate to describe its beliefs according to EDT. However, I was confused by thinking about "...and then offering a bet". As far as I can tell, this is just an unnecessary bit of storytelling set around a two step decision problem, and a CDT agent has to evaluate the prospects of each decision according to CDT.
Minor bibliographical note: A related academic paper is Arif Ahmed's unpublished paper, "Sequential Choice and the Agent's Perspective". (This is from memory -- I read that paper a few years ago.)
Pausing reading at "Paul's simple argument" to jot this down: The expected values are identical when you're conditioning on all the parent nodes of the action (i.e. you have full knowledge of your own decision-making process, in your decision-making process). But if you can't do that, then it seems like EDT goes nuts - e.g. if there's a button you won't want to press, and you're not conditioning on your own brain activity, then EDT might evaluate the expected utility of pressing the button by assuming you have a harmful seizure that makes you hit the button, since it just looks for the most likely route, irrespective of utility. This seems like it might be related to problems of embodied cognition - not conditioning on all parent nodes of the action leads the EDT algorithm to treat the seizure as "just part of the plan," when in fact it's a breakdown of the cognition implementing EDT in the first place.
After reading more of the post, what seems to be going on is exploiting the difference between the output of the decision algorithm and what gets counted as an action. Sticking with the seizure example - in the Dutch book scenario, the CDT agent happily buys "seizure insurance conditional on A=a." Then when it's making its choice to not press the button, it notices that it gets the highest utility from "sell back the seizure insurance and don't press the button," so it tries to do that - but some small percentage of the time it still has the seizure before accidentally pressing the button. I'm not sure that a decision theory that models the distinction between its choice and the "actual action" sells back the seizure insurance there. In fact, we could probably fix the problem entirely within CDT by modeling the decision algorithm output as the "actual action" and the intervening seizure as a stochastic part of the environment.
I... think this post was impacted by a bug in the LW API that GreaterWrong ran into, that made it so that it wasn't visible on the frontpage when it was published. It nevertheless appears to have gotten some amount of engagement, but maybe that was all from direct links?
Given the substantial chance that a number of people have never seen this post, I reposted it. Its original publishing date was the 11th of June.
Ah, now the fact that I forgot to include an illustration (which I had drawn while writing the post) until months later feels like less of a waste! :)
This post has benefited greatly from discussion with Sam Eisenstat, Caspar Oesterheld, and Daniel Kokotajlo.
Last year, I wrote a post claiming there was a Dutch Book against CDTs whose counterfactual expectations differ from EDT. However, the argument was a bit fuzzy.
I recently came up with a variation on the argument which gets around some problems; I present this more rigorous version here.
Here, "CDT" refers -- very broadly -- to using counterfactuals to evaluate expected value of actions. It need not mean physical-causal counterfactuals. In particular, TDT counts as "a CDT" in this sense.
"EDT", on the other hand, refers to the use of conditional probability to evaluate expected value of actions.
Put more mathematically, for action a∈A, EDT uses E(U|Act=a), and CDT uses E(U|do(Act=a)). I'll write edt(a) and cdt(a) to keep things short.
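As a concrete illustration (my own toy example, not part of the argument below): the two expectations come apart whenever something outside the agent's control is correlated with its action. Here a hidden variable C influences both the action and the utility, so conditioning on the action shifts beliefs about C while intervening on it does not:

```python
# Toy sketch: edt(a) = E(U | Act=a) vs cdt(a) = E(U | do(Act=a)).
# All numbers are made up purely for illustration.
P_C = {0: 0.5, 1: 0.5}                      # hidden common cause
P_Act_given_C = {0: {'a': 0.9, 'b': 0.1},   # C biases which action gets taken
                 1: {'a': 0.1, 'b': 0.9}}
U = {('a', 0): 0.0, ('a', 1): 10.0,         # utility depends on the action and on C
     ('b', 0): 1.0, ('b', 1): 11.0}

def edt(act):
    # Conditioning on the action updates beliefs about C.
    joint = {c: P_C[c] * P_Act_given_C[c][act] for c in P_C}
    z = sum(joint.values())
    return sum((joint[c] / z) * U[(act, c)] for c in P_C)

def cdt(act):
    # Intervening on the action leaves P(C) untouched.
    return sum(P_C[c] * U[(act, c)] for c in P_C)

print(edt('a'), cdt('a'))   # 1.0 vs 5.0: the conditional and causal expectations differ
```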
My argument could be viewed as using Dutch Books to formalize Paul Christiano's "simple argument" for EDT:
However, I do not currently view the argument as favoring EDT over CDT! Instead it supports the weaker claim that the two had better agree. Indeed, the Troll Bridge problem strongly favors a CDT whose expectations agree with EDT's over EDT itself. So, this is intended to provide a strong constraint on a theory of (logical) counterfactuals, not necessarily abolish the need for them. (However, the constraint is a strong one, and it's worth considering the possibility that this constraint is all we need for a theory of counterfactuals.)
The Basic Argument
Consider any one action a∈A for which edt(a)≠cdt(a), in some decision problem. We wish to construct a modified decision problem which Dutch-books the CDT.
My argument requires an assumption that the action a is assigned nonzero probability. This is required to ensure edt(a) is defined at all (since otherwise we would be conditioning on a probability zero event), but also for other reasons, which we'll see later on.
Anyway, as I was saying, we wish to take the decision problem which produces the disagreement between edt(a) and cdt(a), and from it, produce a new decision problem which is a Dutch book.
For clarity: U is the random variable which represents utility (ie, what agents want by definition) in the original decision problem. This random variable will still exist in the modified decision problem, but the agents will now desire a modified random variable, V. The value of V is just U modified by whatever the agent gains or loses by interacting with side-bets.
The new decision problem will be a two-step sequential decision problem. Immediately before the original decision, the bookie offers to sell the agent the following bet B, for a price of 2d utilons. B is a bet conditional on a, in which the buyer is betting against cdt(a)'s expectation and in favor of edt(a)'s expectation. For example:
B: In the case that Act=a, the seller of this certificate owes the purchaser of this certificate (cdt(a)−U)⋅s, where s is the signum |cdt(a)−edt(a)|/(cdt(a)−edt(a)).
The key point here is that because the agent is betting ahead of time, it will evaluate the value of this bet according to the conditional expectation E(U|Act=a).
If cdt(a)>edt(a), so s=1, then the value of B in the case that Act=a is cdt(a)−U. The expectation of this is cdt(a)−edt(a), which again, we have supposed is positive. So the overall expectation is P(Act=a)⋅(cdt(a)−edt(a)). Setting d low enough ensures that the agent will be happy to take this bet. Similarly, if edt(a)>cdt(a), the value of the bet ends up being P(Act=a)⋅(edt(a)−cdt(a)) and the agent still takes it for the right price.
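To make the purchase-time arithmetic concrete, here is a worked instance with toy numbers of my own (nothing in the argument depends on them):

```python
# Illustrative numbers only: cdt(a)=5, edt(a)=1, P(Act=a)=0.5.
cdt_a, edt_a, P_a = 5.0, 1.0, 0.5
s = 1.0 if cdt_a > edt_a else -1.0            # the signum from B's definition
value_of_B = P_a * s * (cdt_a - edt_a)        # = 0.5 * 4.0 = 2.0 utilons
d = 0.1                                       # any 2d below 2.0 works
assert value_of_B > 2 * d                     # so the agent buys B ahead of time
```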
Now, the second stage of our argument. As the agent is making the decision the bookie again makes an offer. (In other words, we extend the original set of actions A to contain twice as many valid actions; half in which we accept, half in which we don't accept.) The new offer is this: "I will buy the bet B from you for d utilons."
Now, since the agent is reasoning during its action, it is evaluating possible actions according to cdt(a); so its evaluation of the bet will be different. Here, the argument splits into two cases:
- If the action under consideration is a itself: counterfacting on a, the expected value of U is cdt(a), so the expected payout of B is (cdt(a)−cdt(a))⋅s=0. The bet is worthless, and selling it back for d is a pure gain.
- If the action under consideration is any other action: then Act=a does not hold, so the conditional bet B pays nothing. Again the bet is worthless, and selling it back for d is a pure gain.
So the result is the same in either case -- CDT recommends selling B back to the bookie no matter what.
The agent has paid 2d to buy B, and gotten only d when selling it back. Buying the contract and then selling it back means its conditional payoffs cancel out. So the agent is down d utilons for no gain!
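Putting the two steps together with the same toy numbers, here is a sketch of the bookkeeping (my own illustration, not the post's diagram):

```python
# Ledger for the two-step book, with illustrative numbers cdt(a)=5, edt(a)=1, P(a)=0.5, d=0.1.
cdt_a, edt_a, P_a, d = 5.0, 1.0, 0.5, 0.1

cash = 0.0
# Step 1 (before the decision): the agent values B at P(a)*(cdt(a)-edt(a)) = 2.0 > 2d,
# so it pays 2d for the contract.
cash -= 2 * d
# Step 2 (during the decision): whichever action is being considered, B looks worthless --
# either the expectation of U is cdt(a), making the payout (cdt(a)-U) come to 0 in
# expectation, or the action isn't a and the conditional bet never pays. So the agent
# sells B back for d, and the contract's conditional payoffs cancel out.
cash += d

print(cash)   # -0.1 = -d: a sure loss, whatever the agent ends up doing
```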
Here is an illustration of the entire Dutch Book:
Assumptions
Non-Zero Probability of a
A really significant assumption of this argument is that actions are given nonzero probability -- particularly, that the target action a has a nonzero probability. This assumption is important, since the initial evaluation of B is P(Act=a)⋅|cdt(a)−edt(a)|. If the probability of action a were zero, there would be no price the agent would be willing to pay for the bet.
The assumption is also required in order to guarantee that edt(a) is well-defined -- although we could possibly use tricks to get around that, specifying a variant of EDT which defines some expectation in all cases.
Many of my arguments for CDT=EDT rest on this assumption, though, so it isn't anything new. It seems to be a truly important requirement, rather than an artefact of the argument.
There are many justifications of the assumption which one might try to give. I have often invoked epsilon-exploration; that is, the idea that some randomness needs to be injected into an agent's actions in order to ensure that it can try all options. I don't like invoking that as much as I used to. I might make the weaker argument that agents should use the chicken rule, IE, refuse to take any action which they can prove they take. (This can be understood as weaker than epsilon-exploration, because epsilon-exploration can be implemented by the epsilon-chicken rule: take any action which you assign probability less than epsilon to.) This rule ensures that agents can never prove what they do (so long as they use a sound logic). We can then invoke the non-dogmatism principle, which says that we should never assign probability 0 to a possibility unless we've logically refuted it.
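For concreteness, here is a minimal sketch (my own rendering, not from the post) of the epsilon-chicken rule just described: force any action whose self-assigned probability has dropped below epsilon, and otherwise act on the counterfactual expectations.

```python
# Sketch of the epsilon-chicken rule: take any action you assign probability < epsilon to.
def epsilon_chicken_choice(actions, own_action_prob, cdt_value, epsilon=0.01):
    # own_action_prob: the agent's current credence that it takes each action
    # cdt_value: the agent's counterfactual expectation for each action
    too_unlikely = [a for a in actions if own_action_prob[a] < epsilon]
    if too_unlikely:
        return too_unlikely[0]                  # exploration forced by the rule
    return max(actions, key=lambda a: cdt_value[a])
```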
Or, we could invoke a free-will principle, claiming that agents should have the subjective illusion of freedom.
In the end, though, what we have is an argument that applies if and only if a has nonzero probability. All the rest is just speculation about how broadly this argument can be applied.
An interesting feature of the argument is that the less probable the action a is according to the agent, the less money we can get by Dutch-booking them on discrepancies between cdt(a) and edt(a). This doesn't matter for traditional Dutch Book arguments -- any sure loss is considered a failure of rationality. However, if we take a logical-induction type approach to rationality, smaller Dutch Books are less important -- boundedly rational agents are expected to lose some money to Dutch Books, and are only trying to avoid losing too much.
So, one might consider this to be a sign that, in some hypothetical bounded-rationality approach to decision theory, lower-probability actions would be allowed to maintain larger discrepancies between edt(a) and cdt(a), and maintain them for longer.
Probabilities of Actions in the Modified Problem
A trickier point is the way probabilities carry over from the original decision problem to the modified problem. In particular, I assume the underlying action probabilities do not change. Yet, I split each action in two!
One justification of this might be that, for agents who choose according to CDT, it shouldn't change anything -- at the moment of decision, the bet B is worth nothing, so it doesn't bias actions in one direction or another.
Ultimately, though, I think this is just part of the problem setup. Much like money-pump arguments posit magical genies who can switch anything for anything else, I'm positing a bookie who can offer these bets without changing anything. The argument -- if you choose to accept it -- is that the result is disturbing in any case. It does not seem likely that an appealing theory of counterfactuals is going to wriggle out of this specifically by denying the premise that action probabilities remain the same.
Note, however, that my argument does not require every one of the new actions to get nonzero probability. It's only important that the sum of P(a) and P(a′) in the new problem equals the original decision problem's P(a).
Counterfactual Evaluation of Bets
Another assumption I didn't spell out yet is the interaction of bet contracts with counterfactual evaluations.
I assume that counterfacting on accepting the bet does not change probabilities of other things, such as the probability of the actions. This could be a large concern in general -- taking a conditional bet on Act=a might make us want to choose a on purpose, in order to cash in on the bet. This isn't a problem in this case, since the agent later evaluates the bet to be worth nothing. However, that doesn't necessarily mean it's not an issue according to the counterfactual evaluation, which would change the perceived value of B. Or, even more problematic, the agent's counterfactual expectations might say that taking the bet would result in some very negative event -- making the agent simply refuse. So the argument definitely assumes "reasonable counterfactual evaluations" in some sense.
On the other hand, this kind of reasoning is very typical for Dutch Book arguments. The bets are grafted onto the situation without touching any of the underlying probabilities -- so, e.g., you do not normally ask "is accepting the bet against X going to make X more probable?".
Handling Some Possible Objections
Does the Bookie Cheat?
You might look at my assumptions and be concerned that the bookie is cheating by using knowledge which the agent does not have. If a bookie has insider information, and uses that to get a sure profit, it doesn't count as a Dutch Book! For example, if a bookie knows all logical facts, it can money-pump any agent who does not know all logical facts (ie, money-pump any fixed computable probability distribution). But that isn't fair.
In this case, one might be concerned about the agent not knowing its own action. Perhaps I'm sneaking in an assumption that agents are uncertain of their own actions, and then Dutch-booking them by taking advantage of that fact, via a bookie who can easily predict the agent's action.
To this I have a couple of responses.
The bookie does not know the agent's choice of action. The bookie's strategy doesn't depend on this. In particular, note the disjunctive form of the argument: either the agent prefers a, in which case B is worthless for one reason, or the agent prefers a different action, in which case B is worthless for a different reason. The bookie is setting things up so that it's safe no matter what.
The agent knows everything the bookie knows, from the beginning. All the bookie needs in order to implement its strategy is the values of cdt(a) and edt(a), and, in order to set the price d, the probability of the action a. These are things which the agent also knows.
The Agent Justifiably Revises Its Position
Another critique I have received is that it makes perfect sense that the agent takes the bet at the first choice point and later decides against it at the second choice point. The agent has gained information -- namely, when considering an action, the agent knows it will take that action. This extra information is being used to reject the bet. So it's perfectly reasonable.
Again I have a couple of responses.
The agent does not learn anything between the two steps of the game. There is no new observation, or additional information of any kind, between the step when the bookie offers B and the step when the bookie offers to buy B back. As the agent is evaluating a particular action, it does not "know" it will carry out that action -- it is only considering what would happen if it carried out that action!
Even if the agent did learn something, it would not justify being Dutch-booked. Consider two-stage games in which an agent is offered a bet, then learns some information, and then given a choice to sell the bet back for a fee. It is perfectly reasonable for an agent to in some cases sell the bet back. What makes a Dutch book, however, is if the agent always sells the bet back. It should never be the case that an agent predictably won't want the bet later, no matter what it observes. If that were the case (as it is in my scenario), the agent should not have accepted the bet in the first place. It's critical here to again note that the agent prefers to sell back the bet for every possible action -- that is, the original actions are always judged worse than their modified copies in which the sell-back deal is taken. So, even if we think of the agent as "learning" which action it selects when it evaluates selecting an action, we can see that it decides to sell back the bet no matter what it learns.
But Agents WILL Know What They'll Do
One might say that the argument doesn't mean very much at all in practice because from the information it knows, the agent should be able to derive its action. It knows how it evaluates all the actions, so, it should just know that it takes the argmax. This means the probability of the action actually taken is 1, and the probability of the rest of the actions is zero. As a result, my argument would only apply to the action actually taken -- and a CDT advocate can easily concede that cdt(a)=edt(a) when a is the action actually taken. It's other actions that one might disagree about. For example, in Newcomb, classical physical-causality CDT two-boxes, and agrees with EDT about the consequences of two-boxing. The disagreement is only about the value of the other action.
(Note, however, that the CDT advocate is still making a significant concession here; in particular, this rules out the classic CDT behavior in Death and Damascus and many variations on that problem. I don't know exactly how a classic physical-causality CDT advocate would maintain such a position.)
There are all kinds of problems with the agent knowing its own action, but a CDT advocate can very naturally reply that these should be solved with the right counterfactuals, not by ensuring that the agent is unsure of its actions (through, e.g., epsilon-exploration).
I'll have more to say about this objection later, but for now, a couple of remarks.
First and foremost, yeah, my argument doesn't apply to actions which the agent knows it won't take. I think the best view of the phenomenon here is that, if the agent really does know exactly what it will do, then yeah, the argument really does collapse to saying its evidential expectations should equal its counterfactual expectations for that one action. Which is like saying that, if P(A)=1, then we had better have P(X|do(A))=P(X) -- counterfacting on something true should never change anything.
Certainly it's quite common to think of agents as knowing exactly what they'll do; for example, that's how backwards-induction in game theory works. And at MIRI we like to talk about problems where the agent can know exactly what it will do, because these stretch the limits of decision theory.
On the other hand, realistic agents probably mostly don't know with certainty what they'll do -- meaning my argument will usually apply in practice.
The agent might not follow the recommendations of CDT. Just because a CDT-respecting agent would definitely do a specific thing given all the information, does not mean that we have to imagine the agent in my argument knowing exactly what it will do. The agent in the argument might not be CDT-respecting.
Here on LessWrong, and at MIRI, there is often a tendency to think of CDT or EDT as the agent -- that is, think of agents as instances of decision theories. This is a point of friction between MIRI's way of thinking and that of academic philosophy. In academic philosophy, the decision theory need not be an algorithm the agent is actually running (or indeed, could ever run). A decision theory is a normative theory about what an agent should do. This means that CDT, as a normative theory, can produce recommendations for non-CDT agents; and, we can judge CDT on the correctness or incorrectness of those recommendations.
Now, I think there are some advantages to the MIRI tendency -- for example, thinking in this way brings logical uncertainty to the forefront. However, I agree nonetheless with making a firm distinction between the decision theory -- a normative requirement -- and the decision procedure -- a real algorithm you can run, which obeys the normative requirement. The logical induction algorithm vs the logical induction criterion illustrates a similar idea.
Academic decision theorists extend this idea to the criticism of normative principles -- such as CDT and EDT -- for their behavior in scenarios which an agent would never get into, if it were following the advice of the respective decision theory. This is what's going on in the bomb example Will MacAskill uses. (Nate Soares argues against this way of reasoning, saying "decisions are for making bad outcomes inconsistent".)
If we do endorse the idea, this offers further support for the argument I'm making. It means we get to judge CDT for recommending that an agent accept a Dutch-book, even if the scenario depends on uncertainty over actions which a CDT advocate claims a CDT-compliant agent does not have.
This is particularly concerning for CDT, because this kind of argument is used especially to defend CDT. For example, it's hard to justify smoking lesion as a situation which a CDT or EDT agent could actually find itself in; but, a CDTer might reply, a decision theory needs to offer the right advice to a broad variety of agents. So CDT is already in the business of defending normative claims about non-CDT agents.
Dynamic Consistency
One might object: the argument simply illustrates a dynamic inconsistency in CDT. We already know that both CDT and EDT are dynamically inconsistent. What's the big deal?
Let me make some buckshot remarks before I dive into my main response here:
This third point is the one I want to expand on.
In decision problems where the payoff depends only on actions actually taken, not on your policy, there is a powerful argument for the dynamic consistency of EDT:
Think of the entire observation/action history as a tree. Dynamic consistency means that at earlier points in the tree, the agent does not prefer for the decisions of later selves to be different from what they will be. The restriction to actions (not policies) mattering for payoffs means this: selecting one action rather than another changes which branch we go down in the tree, but does not change the payoffs of other branches in the tree. This means that even from a perspective beforehand, an action can only make a difference down the branch where it is taken -- no spooky interactions across possible worlds. As a result, thinking about possible choices ahead of time, the contribution to early expected utility is exactly the expected utility that action will be assigned later at the point of decision, times the probability the agent ends up in that situation in the first place. Therefore, the preference about the decision must stay the same.
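As a sketch of the decomposition this argument relies on (my own notation, not the post's): write h for an observation/action history reaching a later decision point and a for an action available there. Then the early expected utility decomposes as E(U) = ∑_h P(h) ∑_a P(a|h)⋅E(U|h,a), and because payoffs depend only on the actions actually taken, changing the later self's choice at h changes only the E(U|h,a) term on that branch. So the earlier self ranks the later self's options by exactly the later self's own conditional expectations, weighted by P(h), and cannot prefer that the later decision come out differently.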
So, EDT is a dynamically consistent choice when actions matter but policy does not.
Importantly, this is not a no-Newcomblike-problems condition. It rules out problems such as counterfactual mugging, transparent Newcomb, Parfit's hitchhiker, and XOR Blackmail. However, it does not rule out the original Newcomb problem. In particular, we are highlighting the inconsistency where CDT wishes to be a 1-boxer, and similar cases.
Now, you can make a very similar argument for the dynamic consistency of CDT, if you define dynamic consistency based on counterfactuals: would you prefer to counterfact on your future self doing X? For Newcomb's problem, this gets us back to consistency -- for all that the CDT agent wishes it could be EDT, it would have no interest in the point-intervention that makes its future self one-box, for the usual reason: that would not cause Omega to change its mind.
However, this definition seems not to capture the most useful notion of dynamic consistency, since the same causal CDT agent would happily precommit to one-box. So I find the EDT version of the argument more convincing. I'm not presently aware of a similar example for EDT -- it seems Omega needs to consider the policy, not just the action really taken, in order to make EDT favor changing its actions via precommitments.
More on Probability Zero Actions
As I've said, my argument depends on the action a having nonzero probability. EDT isn't well-defined otherwise; how do you condition on a probability-zero event? However, there are some things we could try in order to get around the problem of division by zero - filling in the undefined values with sensible numbers. For example, we can use the Conditional Oracle EDT which Jessica Taylor defined, to "fill in" the otherwise-undefined conditionals.
However, recall I said that the argument was blocked for two reasons:
- edt(a) is not defined when P(a)=0, since we would be conditioning on a probability-zero event;
- the value of the bet at purchase time is P(Act=a)⋅|cdt(a)−edt(a)|, which is zero when P(a)=0, so there is no price the agent would be willing to pay for it.
So, if we've somehow made edt(a) well-defined for probability-zero a, can we patch the second problem?
We can try to flip the argument I made around: for probability zero actions, we pay the agent to take on a bet (so that it will do it even though it's worthless). Then later, we charge the agent to offload the bet which it now thinks is unfavorable.
The problem with this argument is, if the agent doesn't take action a anyway, then the conditional bet will be nullified regardless; we can't force the agent into a corner where it prefers to nullify the bet, so, we don't get a full Dutch Book (because we can't guarantee that we make money off the agent -- indeed, we would only make money if the agent ends up taking the action which it previously assigned probability zero to taking).
However, we do get a moderately damning result: limiting attention to just the action a and the new alternative a′, which both does a and also pays to cancel the bet, CDT strictly prefers that the agent be Dutch Booked rather than just do a. This seems pretty bad: CDT isn't actually recommending taking a Dutch Book, BUT, it would rather take a Dutch Book than take an alternative which is otherwise the same but which does not get Dutch Booked.
So, we can still make a moderately strong argument against divergence between counterfactuals and conditionals, even if actions have probability zero. But not a proper Dutch Book.
A Few Words on Troll Bridge
At the beginning of this, I said that I didn't necessarily take this to be an argument for EDT over CDT. In the past, I've argued this way:
However, this argument has at least two points against it. First, the arguments for CDT=EDT generally have some assumptions, such as nonzero probability for actions. CDT is a strictly more general framework when those conditions are not met. Theories of rational agency should be as inclusive as possible, when rationality does not demand exclusivity. So one might still prefer CDT.
Second, as I mentioned at the beginning of this post, the Troll Bridge problem strongly favors CDT over EDT. Counterintuitively, it's perfectly possible for a CDT agent to keep its counterfactual expectations exactly in agreement with its conditional expectations, and yet get Troll Bridge right -- even though we are doomed to get Troll Bridge wrong if we directly use our conditional expectations. Insisting on a distinction "protects" us from spurious counterfactual reasoning. (I may go over this phenomenon in more detail in a future post. But perhaps you can see why by reviewing the Troll Bridge argument.)
So, my current take on the CDT=EDT hypothesis is this:
This is not a strongly held view, but it is the view that has made the most sense of counterfactual reasoning for me.
As I've mentioned in the past, the CDT=EDT hypothesis is almost the most boring possible answer to the question "how do (logical) counterfactuals work?" -- it doesn't do very much to help us solve interesting decision problems. If we factor decision theory into the two parts (1) "What are the (logical) counterfactuals?" (2) "How do we use counterfactuals to make decisions?" then I see the CDT=EDT hypothesis as a solution to (1) which shoves an awful lot of the interesting work of decision theory into (2). IE, to solve the really interesting problems, we would need logically-updateless UDT or even more exotic approaches.
In particular, for variants of Newcomb's problem where the predictor is quite strong but doesn't know as much as the agent does about what the agent will choose, this post implies that TDT either two-boxes, or, is vulnerable to the Dutch Book I construct. This is unfortunate.
Conclusion
Frankly, I find it somewhat embarrassing that I'm still going on about CDT vs EDT. After all, Paul Christiano said, partially in response to my own writing which he cited:
I wish I could say this will be my final word on the subject. The contents of this post do feel quite definitive in the sense of giving a settled, complete view. However, the truth is that it only represents my view as of November or early December of 2019. Late December and early January saw some developments which I'm excited to work out further and post about.