Are causal decision theorists trying to outsmart conditional probabilities?

Caspar Oesterheld

Presumably, this has been discussed somewhere in the past, but I wonder to what extent causal decision theorists (and many other non-evidential decision theorists, too) are trying to make better predictions than (what they think to be) their own conditional probabilities.

To state this question more clearly, let’s look at the generic Newcomb-like problem with two actions a1 and a2 (e.g., one-boxing and two-boxing, cooperating or defecting, not smoking or smoking) and two states s1 and s2 (specifying, e.g., whether there is money in both boxes, whether the other agent cooperates, whether one has cancer). The Newcomb-ness is the result of two properties:

No matter the state, it is better to take action a2, i.e. u(a2,s1)>u(a1,s1) and u(a2,s2)>u(a1,s2). (There are also problems without dominance where CDT and EDT nonetheless disagree. For simplicity, I will assume dominance here.)
The action cannot causally affect the state, but somehow taking a1 gives us evidence that we’re in the preferable state s1. That is, P(s1|a1)>P(s1|a2) and u(a1,s1)>u(a2,s2).

Then, if the latter two differences are large enough, it may be that

E[u|a1] > E[u|a2].

I.e.

P(s1|a1) * u(a1,s1) + P(s2|a1) * u(a1,s2) > P(s1|a2) * u(a2,s1) + P(s2|a2) * u(a2,s2),

despite the dominance.
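To make the inequality concrete, here is a minimal sketch using the standard Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) and an assumed predictor accuracy of 0.99, so that P(s1|a1) = 0.99 and P(s1|a2) = 0.01. The numbers and the helper names `dominates` and `edt_value` are illustrative assumptions, not part of the problem as stated above.

```python
# Hypothetical Newcomb payoffs u(a, s): a1 = one-box, a2 = two-box,
# s1 = the million is in the opaque box, s2 = the opaque box is empty.
U = {
    ("a1", "s1"): 1_000_000, ("a1", "s2"): 0,
    ("a2", "s1"): 1_001_000, ("a2", "s2"): 1_000,
}
# Assumed 99%-accurate predictor: P(s1|a1) = 0.99, P(s1|a2) = 0.01.
P_S1_GIVEN = {"a1": 0.99, "a2": 0.01}


def dominates(u, better, worse, states=("s1", "s2")):
    """True if `better` has strictly higher utility than `worse` in every state."""
    return all(u[(better, s)] > u[(worse, s)] for s in states)


def edt_value(u, p_s1_given, a):
    """E[u|a] = P(s1|a) * u(a,s1) + P(s2|a) * u(a,s2)."""
    p = p_s1_given[a]
    return p * u[(a, "s1")] + (1 - p) * u[(a, "s2")]


if __name__ == "__main__":
    print("a2 dominates a1:", dominates(U, "a2", "a1"))
    print("u(a1,s1) > u(a2,s2):", U[("a1", "s1")] > U[("a2", "s2")])
    for a in ("a1", "a2"):
        print(a, edt_value(U, P_S1_GIVEN, a))
```

With these numbers a2 dominates state by state, yet E[u|a1] = 990,000 exceeds E[u|a2] = 11,000.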

Now, my question is: After having taken one of the two actions, say a1, but before having observed the state, do causal decision theorists really assign the probability P(s1|a1) (specified in the problem description) to being in state s1?

I used to think that this was the case. E.g., the way I learned about Newcomb’s problem is that causal decision theorists understand that, once they have said the words “both boxes for me, please”, they assign very low probability to getting the million. So, if there were a period between saying those words and receiving the payoff, they would bet at odds that reveal that they assign a low probability (namely P(s1|a2)) to money being under both boxes.
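The betting behaviour described above can be sketched as follows, again assuming a hypothetical 99%-accurate predictor. An agent who trusts the stated conditional probabilities and has already chosen a2 should pay at most $1 for a ticket that pays $100 if the million is there, i.e. it should regard 99 to 1 against s1 as fair odds. `fair_price` and `odds_against` are illustrative helpers; exact fractions are used only to keep the arithmetic clean.

```python
from fractions import Fraction

# Assumed conditional probabilities from a 99%-accurate predictor.
P_S1_GIVEN = {"a1": Fraction(99, 100), "a2": Fraction(1, 100)}


def fair_price(p_event, payout=1):
    """Highest price an agent with credence p_event should pay for a ticket
    that pays `payout` if the event occurs (zero expected profit)."""
    return p_event * payout


def odds_against(p_event):
    """Odds against the event, (1 - p) : p, expressed as a single ratio."""
    return (1 - p_event) / p_event


# After saying "both boxes for me, please" (a2):
print(fair_price(P_S1_GIVEN["a2"], payout=100))  # 1
print(odds_against(P_S1_GIVEN["a2"]))            # 99
```

If, in the waiting period, such an agent refuses these odds, its betting behaviour does not match the conditional probabilities in the problem description, which is the possibility raised below.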

But now I think that some of the disagreement might implicitly be based on a belief that the conditional probabilities stated in the problem description are wrong, i.e. that you shouldn’t bet on them.

The first data point was the discussion of CDT in Pearl’s Causality. In sections 1.3.1 and 4.1.1 he emphasizes that he thinks his do-calculus is the correct way of predicting what happens upon taking some actions. (Note that in non-Newcomb-like situations, P(s|do(a)) and P(s|a) yield the same result, see ch. 3.2.2 of Pearl’s Causality.)

The second data point is that the smoking intuition in smoking lesion-type problems may often be based on the intuition that the conditional probabilities get it wrong. (This point is also inspired by Pearl’s discussion, but also by the discussion of an FB post by Johannes Treutlein. Also see the paragraph starting with “Then the above formula for deciding whether to pet the cat suggests...” in the computer scientist intro to logical decision theory on Arbital.)

Let’s take a specific version of the smoking lesion as an example. Some have argued that an evidential decision theorist shouldn’t go to the doctor because people who go to the doctor are more likely to be sick. If a1 denotes staying at home (or, rather, going anywhere but a doctor) and s1 denotes being healthy, then, so the argument goes, P(s1|a1) > P(s1|a2). I believe that in all practically relevant versions of this problem this difference in probabilities disappears once we take into account all the evidence we already have. This is known as the tickle defense. A version of it that I agree with is given in section 4.3 of Arif Ahmed’s Evidence, Decision and Causality. Anyway, let’s assume that the tickle defense somehow doesn’t apply, such that even after taking into account our entire knowledge base K, P(s1|a1,K) > P(s1|a2,K).
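A minimal sketch of why the tickle defense removes the difference, assuming a hypothetical model (all numbers made up) in which sickness influences the decision to see a doctor only through a symptom, the "tickle", that the agent already observes and that is therefore part of K. In the post's notation, doctor = 1 is a2 and sick = 0 is s1.

```python
from itertools import product

# Hypothetical model: sick -> symptom -> goes_to_doctor.
# The action depends on sickness only via the observable symptom.
P_SICK = 0.3
P_SYMPTOM_GIVEN_SICK = {1: 0.8, 0: 0.2}
P_DOCTOR_GIVEN_SYMPTOM = {1: 0.9, 0: 0.1}


def joint(sick, symptom, doctor):
    """P(sick, symptom, doctor) under the factorization above."""
    p = P_SICK if sick else 1 - P_SICK
    q = P_SYMPTOM_GIVEN_SICK[sick]
    p *= q if symptom else 1 - q
    r = P_DOCTOR_GIVEN_SYMPTOM[symptom]
    p *= r if doctor else 1 - r
    return p


def p_sick(doctor, symptom=None):
    """P(sick | doctor) or, if symptom is given, P(sick | doctor, symptom)."""
    symptoms = (0, 1) if symptom is None else (symptom,)
    num = sum(joint(1, t, doctor) for t in symptoms)
    den = sum(joint(s, t, doctor) for s, t in product((0, 1), symptoms))
    return num / den


if __name__ == "__main__":
    print("ignoring the symptom:", p_sick(1), p_sick(0))
    print("given symptom = 1:  ", p_sick(1, 1), p_sick(0, 1))
```

Without the symptom, going to the doctor is evidence of sickness; once the symptom is conditioned on, the action carries no further evidence. The post's assumption that the tickle defense "doesn't apply" amounts to denying that such a screening-off variable is in K.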

I think the reason why many people think one should go to the doctor might be that while asserting P(s1|a1,K) > P(s1|a2,K), they don’t upshift the probability of being sick when they sit in the waiting room. That is, when offered a bet in the waiting room, they wouldn’t accept exactly the betting odds that P(s1|a1,K) and P(s1|a2,K) suggest they should accept.

Maybe what is going on here is that people have some intuitive knowledge that they don’t propagate into their stated conditional probability distribution. E.g., their stated probability distribution may represent observed frequencies among people who make their decision without thinking about CDT vs. EDT. However, intuitively they realize that the correlation in the data doesn’t hold up in this naive way.

This would also explain why people are more open to EDT’s recommendation in cases where the causal structure is analogous to that in the smoking lesion, but tickle defenses (or, more generally, ways in which a stated probability distribution could differ from the real/intuitive one) don’t apply, e.g. the psychopath button, betting on the past, or the coin flip creation problem.

I’d be interested in your opinions. I also wonder whether this has already been discussed elsewhere.

Acknowledgment

Discussions with Johannes Treutlein informed my view on this topic.

I am not actually here, but

"Note that in non-Newcomb-like situations, P(s|do(a)) and P(s|a) yield the same result, see ch. 3.2.2 of Pearl’s Causality."

This is trivially not true.

CDT solves Newcomb just fine with the right graph: https://www.youtube.com/watch?v=rGNINCggokM.

Ok, disappearing again.

(Sorry again for being slow to reply to this one.)

"Note that in non-Newcomb-like situations, P(s|do(a)) and P(s|a) yield the same result, see ch. 3.2.2 of Pearl’s Causality."

This is trivially not true.

Is this because I define "Newcomb-ness" via disagreement about the best action between EDT and CDT in the second paragraph? Of course, the distance d(P(s|do(a)), P(s|a)) could be so small that EDT and CDT agree on what action to take. They could even differ in such a way that CDT-EV and EDT-EV are the same.

But it seems that instead of comparing the argmaxes or the EVs, one could also use the term Newcomb-ness on the basis of the probabilities themselves. Or is there some deeper reason why the sentence is false?
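As a rough illustration of how the two distributions can differ while EDT and CDT still pick the same action, the following sketch assumes Newcomb payoffs, takes the prior 0.5 as the interventional P(s1|do(a)) for both actions, and uses hypothetical evidential probabilities. Total variation distance stands in for the unspecified d.

```python
# Assumed Newcomb payoffs (s1 = the million is in the opaque box).
U = {("a1", "s1"): 1_000_000, ("a1", "s2"): 0,
     ("a2", "s1"): 1_001_000, ("a2", "s2"): 1_000}


def expected_utility(a, p_s1):
    return p_s1 * U[(a, "s1")] + (1 - p_s1) * U[(a, "s2")]


def tv_distance(p, q):
    """Total variation distance between two distributions over {s1, s2},
    each given by its probability of s1."""
    return abs(p - q)


def best_action(p_s1_by_action):
    """argmax over actions, using P(s1 | a) or P(s1 | do(a)) as supplied."""
    return max(("a1", "a2"), key=lambda a: expected_utility(a, p_s1_by_action[a]))


CAUSAL = {"a1": 0.5, "a2": 0.5}        # hypothetical P(s1 | do(a))
WEAK = {"a1": 0.5004, "a2": 0.4996}    # hypothetical, weakly correlated P(s1 | a)
STRONG = {"a1": 0.99, "a2": 0.01}      # hypothetical, strongly correlated P(s1 | a)

if __name__ == "__main__":
    for name, dist in (("weak", WEAK), ("strong", STRONG)):
        d = tv_distance(dist["a1"], CAUSAL["a1"])
        print(name, d, best_action(CAUSAL), best_action(dist))
```

With these payoffs, EDT switches to a1 only once P(s1|a1) - P(s1|a2) exceeds 0.001, so the weak case has nonzero distance but the same argmax, while the strong case does not.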

I guess:

(a) p(s | do(a)) is in general not equal to p(s | a). The entire point of causal inference is characterizing that difference.

(b) I looked at section 3.2.2 and did not see anything there supporting the claim.

(c) We have known since the 90s that p(s | do(a)) and p(s | a) disagree on classical decision theory problems, the standard smoking lesion being one, and more generally on any problem where you shouldn't "manage the news."

So I got super confused and stopped reading.

As cousin_it said somewhere at some point (and I say in my youtube talk), the confusing part of Newcomb is representing the situation correctly, and that is something you can solve by playing with graphs, essentially.

So, the class of situations in which p(s | do(a)) = p(s | a) that I was alluding to is the one in which A has only outgoing arrows (or all the values of A’s predecessors are known). (I guess this could be generalized to: p(s | do(a)) = p(s | a) if A d-separates its predecessors from S?) (Presumably this stuff follows from Rule 2 of Theorem 3.4.1 in Causality.)
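A minimal sketch of both conditions, assuming a hypothetical three-variable model with a confounder L (L -> A, L -> S) and a small direct effect A -> S; all numbers are made up. The interventional probability is computed with the truncated factorization, i.e. by deleting the mechanism for A while keeping P(L) and P(S | A, L).

```python
# Hypothetical model: L -> A, L -> S, A -> S (all numbers assumed).
P_L1 = 0.3
P_S1_GIVEN_AL = {(0, 0): 0.10, (1, 0): 0.15, (0, 1): 0.60, (1, 1): 0.65}


def bern(p1, x):
    """Probability that a binary variable with P(x=1) = p1 takes value x."""
    return p1 if x == 1 else 1 - p1


def p_s1_given_a(a, p_a1_given_l, l=None):
    """Observational P(S=1 | A=a) or, if l is given, P(S=1 | A=a, L=l)."""
    ls = (0, 1) if l is None else (l,)
    num = sum(bern(P_L1, x) * bern(p_a1_given_l[x], a) * P_S1_GIVEN_AL[(a, x)] for x in ls)
    den = sum(bern(P_L1, x) * bern(p_a1_given_l[x], a) for x in ls)
    return num / den


def p_s1_do_a(a, l=None):
    """Interventional P(S=1 | do(A=a)) (or with L=l also observed),
    via the truncated factorization that drops P(A | L)."""
    if l is not None:
        return P_S1_GIVEN_AL[(a, l)]
    return sum(bern(P_L1, x) * P_S1_GIVEN_AL[(a, x)] for x in (0, 1))


CONFOUNDED = {0: 0.2, 1: 0.8}  # P(A=1 | L) depends on L: incoming arrow L -> A
NO_INCOMING = {0: 0.5, 1: 0.5}  # P(A=1 | L) ignores L: A has only outgoing arrows

if __name__ == "__main__":
    for a in (0, 1):
        print(a, p_s1_given_a(a, CONFOUNDED), p_s1_given_a(a, NO_INCOMING), p_s1_do_a(a))
```

When A has no incoming arrow, conditioning and intervening coincide; with the incoming arrow they differ unless L is also observed.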

All problems in which you intervene in an isolated system from the outside are of this kind and so EDT and CDT make the same recommendations for intervening in a system from the outside. (That’s similar to the point that Pearl makes in section 3.2.2 of Causality: You can model the do-interventions by adding action nodes without predecessors and conditioning on these action nodes.)
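The intervention-node construction mentioned here can be sketched in the same style. In the hypothetical model below (L -> A, L -> S, A -> S, all numbers assumed), an extra parentless node F either leaves A to its usual mechanism ("idle") or forces it to a value. Conditioning on F = a then reproduces the adjustment sum over l of P(l) P(S=1 | a, l), i.e. P(S=1 | do(A=a)).

```python
# Hypothetical model (numbers assumed): L -> A, L -> S, A -> S.
P_L1 = 0.3
P_A1_GIVEN_L = {0: 0.2, 1: 0.8}
P_S1_GIVEN_AL = {(0, 0): 0.10, (1, 0): 0.15, (0, 1): 0.60, (1, 1): 0.65}


def bern(p1, x):
    return p1 if x == 1 else 1 - p1


def p_a_given_l_f(a, l, f):
    """Mechanism for A in the augmented graph. f is the parentless action node:
    'idle' leaves A to P(A | L); 0 or 1 forces A to that value."""
    if f == "idle":
        return bern(P_A1_GIVEN_L[l], a)
    return 1.0 if a == f else 0.0


def p_s1_given_f(f):
    """P(S=1 | F=f) by summing the augmented joint over L and A.
    F has no parents, so its prior cancels and is omitted."""
    num = den = 0.0
    for l in (0, 1):
        for a in (0, 1):
            w = bern(P_L1, l) * p_a_given_l_f(a, l, f)
            num += w * P_S1_GIVEN_AL[(a, l)]
            den += w
    return num / den


def adjustment(a):
    """sum_l P(l) P(S=1 | a, l), for comparison."""
    return sum(bern(P_L1, l) * P_S1_GIVEN_AL[(a, l)] for l in (0, 1))


if __name__ == "__main__":
    for a in (0, 1):
        print(a, p_s1_given_f(a), adjustment(a))
    print("idle:", p_s1_given_f("idle"))
```

Plain conditioning on the new node yields the interventional quantity, which is why, for an agent modelled as intervening from outside, EDT and CDT agree.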

The Smoking lesion is an example of a Newcomb-like problem where A has an inbound arrow that leads p(s | do(a)) and p(s | a) to differ. (That said, I think the smoking lesion does not actually work as a Newcomb-like problem, see e.g. chapter 4 of Arif Ahmed’s Evidence, Decision and Causality.)

Similarly, you could model Newcomb’s problem by introducing a logical node as a predecessor of your decision and the result of the prediction. (If you locate “yourself” in the logical node and the logical node does not have any predecessors, then CDT and EDT agree again.)
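A minimal sketch of this logical-node model, assuming hypothetical payoffs, an assumed prediction accuracy of 0.99, and a uniform prior over the logical node C; the action is taken to equal C deterministically. Intervening on the action node leaves the prediction untouched, whereas intervening on the parentless node C changes both the action and the prediction and agrees with conditioning.

```python
# Hypothetical payoffs: (action, prediction) -> dollars.
PAYOFF = {("one", "one"): 1_000_000, ("one", "two"): 0,
          ("two", "one"): 1_001_000, ("two", "two"): 1_000}
ACCURACY = 0.99                      # assumed P(prediction = C)
PRIOR_C = {"one": 0.5, "two": 0.5}   # assumed prior over the logical node C
PREDICTIONS = ("one", "two")


def p_pred_given_c(pred, c):
    return ACCURACY if pred == c else 1 - ACCURACY


def eu_do_action(a):
    """CDT locating 'yourself' at A: do(A=a) cuts C -> A, so the
    prediction still follows the prior over C."""
    return sum(PRIOR_C[c] * p_pred_given_c(pred, c) * PAYOFF[(a, pred)]
               for c in PRIOR_C for pred in PREDICTIONS)


def eu_do_logical(c):
    """CDT locating 'yourself' at the parentless node C: do(C=c) sets
    the action (A = C) and, through C, the prediction."""
    return sum(p_pred_given_c(pred, c) * PAYOFF[(c, pred)] for pred in PREDICTIONS)


def eu_condition_action(a):
    """EDT: condition on A=a in the joint P(c) P(pred | c) [A = c]."""
    num = sum(PRIOR_C[c] * p_pred_given_c(pred, c) * PAYOFF[(a, pred)]
              for c in PRIOR_C if c == a for pred in PREDICTIONS)
    den = sum(PRIOR_C[c] for c in PRIOR_C if c == a)
    return num / den


if __name__ == "__main__":
    for a in ("one", "two"):
        print(a, eu_do_action(a), eu_do_logical(a), eu_condition_action(a))
```

Under do(A), two-boxing wins by exactly $1,000 regardless of the prior; under do(C) and under conditioning, one-boxing wins.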

Of course, in the real world, all problems are in theory Newcomb-like because there are always some incoming arrows into your decision. But in practice, most problems are nearly non-Newcomb-like because, although there may be an unblocked path from my action to the value of my utility function, that path is usually too long/complicated to be useful. E.g., if I raise my hand now, that would mean that the state of the world 1 year ago was such that I raise my hand now. And the world state 1 year ago causes how much utility I have. But unless I’m in Arif Ahmed’s “Betting on the Past”, I don’t know which class of world states 1 year ago (the ones that lead to me raising my hand or the ones that cause me not to raise my hand) causes me to have more utility. So EDT has no way to exploit this route of “changing the past”.

I agree that in situations where A only has outgoing arrows, p(s | do(a)) = p(s | a), but this class of situations is not the "Newcomb-like" situations. In particular, classical smoking lesion has a confounder with an incoming arrow into a.

Maybe we just disagree on what "Newcomb-like" means? To me what makes a situation "Newcomb-like" is your decision algorithm influencing the world through something other than your decision (as happens in the Newcomb problem via Omega's prediction). In smoking lesion, this does not happen, your decision algorithm only influences the world via your action, so it's not "Newcomb-like" to me.

I agree that in situations where A only has outgoing arrows, p(s | do(a)) = p(s | a), but this class of situations is not the "Newcomb-like" situations.

What I meant to say is that the situations where A only has outgoing arrows are all not Newcomb-like.

Maybe we just disagree on what "Newcomb-like" means? To me what makes a situation "Newcomb-like" is your decision algorithm influencing the world through something other than your decision (as happens in the Newcomb problem via Omega's prediction). In smoking lesion, this does not happen, your decision algorithm only influences the world via your action, so it's not "Newcomb-like" to me.

Ah, okay. Yes, in that case, it seems to be only a terminological dispute. As I say in the post, I would define Newcomb-like-ness via a disagreement between EDT and CDT which can mean either that they disagree about what the right decision is, or, more naturally, that their probabilities diverge. (In the latter case, the statement you commented on is true by definition and in the former case it is false for the reason I mentioned in my first reply.) So, I would view the Smoking lesion as a Newcomb-like problem (ignoring the tickle defense).

I agree that this is part of what confuses the discussion. This is why I have pointed out in previous discussions that in order to be really considering Newcomb / Smoking Lesion, you have to be honestly more convinced the million is in the box after choosing to take one, than you would have been if you had chosen both. Likewise, you have to be honestly more convinced that you have the lesion, after you choose to smoke, than you would have been if you did not. In practice people would tend not to change their minds about that, and therefore they should smoke and take both boxes.

Some relevant discussion here.

Another piece of evidence is this minor error in section 9.2 of Peterson's An Introduction to Decision Theory:

According to causal decision theory, the probability that you have the gene given that you read Section 9.2 is equal to the probability that you have the gene given that you stop at Section 9.1. (That is, the probability is independent of your decision to read this section.) Hence, it would be a mistake to think that your chances of leading a normal life would have been any higher had you stopped reading at Section 9.1.

With regards to this part:

The action cannot causally affect the state, but somehow taking a1 gives us evidence that we’re in the preferable state s1. That is, P(s1|a1)>P(s1|a2) and u(a1,s1)>u(a2,s2).

I'm actually unsure whether CDT theorists take this as true. If you're only looking at the causal links from your actions, P(s1|a1) and P(s1|a2) are actually unknown to you. In that case, you're deciding under uncertainty about the probabilities, so you simply strive to maximize payoff. (I think this is roughly correct?)

I think the reason why many people think one should go to the doctor might be that while asserting P(s1|a1,K) > P(s1|a2,K), they don’t upshift the probability of being sick when they sit in the waiting room.

Does s1 refer to the state of being sick, a1 to going to the doctor, and a2 to not going to the doctor? Also, I think most people are not afraid of going to the doctor? (Unless this is from another decision theory's point of view?)
