An Intuitive Introduction to Causal Decision Theory

Like any decision theory, Causal Decision Theory (CDT) aims to maximize expected utility; it does this by looking at the causal effects each available action in a problem has. For example, in Problem 1, taking box A has the causal effect of earning $100, whereas taking box B causes you to earn $500. $500 is more than $100, so CDT says to take box B (like any decision theory worth anything should). Similarly, CDT advises taking box A in Problem 2.

CDT's rule of looking at an action's causal effects makes sense: if you're deciding which action to take, you want to know how your actions change the environment. And as we will see later, CDT correctly solves the problem of the Smoking Lesion. But first, we have to ask ourselves: what is causality?

What is causality?

A formal description of causality is beyond the scope of this post (and sequence), but intuitively speaking, causality is about stuff that makes stuff happen. If I throw a glass vase on concrete, it will break; my action of throwing the vase caused it to break.

You may have heard that correlation doesn't necessarily imply causality, which is true. For example, I'd bet hand size and foot size in humans strongly correlate: if we measured the hands and feet of a million people, those with larger hands would - on average - have larger feet as well, and vice versa. But hopefully we can agree hand size doesn't have a causal effect on foot size, or vice versa: your hands aren't large or small because your feet are large or small, even though we might be able to predict your foot size quite accurately from your hand size. Rather, hand size and foot size have common causes, such as genetics (determining how large a person can grow) and the quality and quantity of food eaten.
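A small simulation can make the common-cause picture concrete. The sketch below assumes a made-up model in which a hidden `growth` factor (standing in for genetics and nutrition) drives both hand and foot size, each with its own independent noise. All names and numbers are illustrative assumptions, not real measurements.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n, forced_hand=None, seed=0):
    """Sample (hand, foot) sizes in cm from a hypothetical common-cause model.

    If forced_hand is given, hand size is set by intervention instead of
    being produced by the growth factor; foot size is generated as before.
    """
    rng = random.Random(seed)
    hands, feet = [], []
    for _ in range(n):
        growth = rng.gauss(0, 1)                      # hidden common cause
        hand = 18 + 1.0 * growth + rng.gauss(0, 0.3)
        foot = 25 + 1.5 * growth + rng.gauss(0, 0.5)  # never reads `hand`
        hands.append(hand if forced_hand is None else forced_hand)
        feet.append(foot)
    return hands, feet

# Observation: hand and foot size are strongly correlated.
hands, feet = simulate(10_000)
print("correlation:", round(pearson(hands, feet), 2))

# Intervention: forcing hand size to a value does nothing to foot size.
_, feet_small = simulate(10_000, forced_hand=15)
_, feet_large = simulate(10_000, forced_hand=22)
print("feet unchanged by intervention:", feet_small == feet_large)
```

Observing a large hand is good evidence of a large foot, yet setting hand size by hand (so to speak) leaves the feet exactly as they were. That gap between predicting and causing is the one CDT cares about.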

Eliezer Yudkowsky describes causality in the following very neat way:

There's causality anywhere there's a noun, a verb, and a subject.

"I broke the vase" and "John kicks the ball" are both examples of this.

With the hope the reader now has an intuitive notion of causality, we can move on to see how CDT handles Smoking Lesion.

Smoking Lesion

An agent is debating whether or not to smoke. She knows that smoking is correlated with an invariably fatal variety of lung cancer, but the correlation is (in this imaginary world) entirely due to a common cause: an arterial lesion that causes those afflicted with it to love smoking and also (99% of the time) causes them to develop lung cancer. There is no direct causal link between smoking and lung cancer. Agents without this lesion contract lung cancer only 1% of the time, and an agent can neither directly observe nor control whether she suffers from the lesion. The agent gains utility equivalent to $1,000 by smoking (regardless of whether she dies soon), and gains utility equivalent to $1,000,000 if she doesn’t die of cancer. Should she smoke, or refrain?

CDT says she should smoke. The agent either gets lung cancer or not; having the lesion certainly increases the risk, but smoking doesn't causally affect whether or not the agent has the lesion and has no direct causal effect on her probability of getting lung cancer either. CDT therefore reasons that whether you get the $1,000,000 in utility is beyond your control, but smoking simply gets you $1,000 more than not smoking. While smokers in this hypothetical world get lung cancer more often than non-smokers, this is because there are relatively more smokers in the part of the population that has the lesion, which is the cause of lung cancer. Smoking or not doesn't change whether the agent is in that part of the population; CDT therefore (correctly) says the agent should smoke. The Smoking Lesion situation is actually similar to the hands and feet example above: where e.g. genetics cause people to have larger hands and feet, the lesion causes people to get cancer and to enjoy smoking.
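CDT's reasoning here can be written out as a short calculation. The sketch below is one way to do it, and it rests on an assumption: the problem statement never says how likely the agent thinks it is that she has the lesion, so `p_lesion` is a hypothetical parameter introduced for illustration. The function name is likewise made up.

```python
def cdt_expected_utility(smoke, p_lesion):
    """Expected utility (in dollars) of an action under CDT.

    p_lesion is the agent's credence that she has the lesion (an assumed
    input; the problem does not specify it). CDT holds it fixed across
    actions, because smoking has no causal effect on the lesion.
    """
    p_cancer = p_lesion * 0.99 + (1 - p_lesion) * 0.01
    utility_from_smoking = 1_000 if smoke else 0
    return utility_from_smoking + (1 - p_cancer) * 1_000_000

# Whatever the credence in the lesion, smoking comes out $1,000 ahead.
for p_lesion in (0.0, 0.1, 0.5, 0.9, 1.0):
    gain = cdt_expected_utility(True, p_lesion) - cdt_expected_utility(False, p_lesion)
    print(p_lesion, round(gain, 2))
```

Because the cancer term is identical for both actions, it cancels in the comparison, which is the formal version of "the $1,000,000 is beyond your control".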

CDT makes intuitive sense, and seems to solve problems correctly so far. However, it does have a major flaw, which will become apparent in Newcomb's Problem.

Newcomb's Problem

A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.

Omega drops two boxes on the ground in front of you and flies off.

Do you take both boxes, or only box B?

(Note that "iff" means "if and only if".)

How does CDT approach this problem? Well, let's look at the causal effects of taking both boxes ("two-boxing") and taking one box ("one-boxing").

First of all, note that Omega has already made its prediction. Your action now doesn't causally affect this, as you can't cause the past. Omega made its prediction and based upon it either filled box B or not. If box B isn't filled, one-boxing gives you nothing; two-boxing, however, would give you the contents of box A, earning you $1,000. If box B is filled, one-boxing gets you $1,000,000. That's pretty sweet, but two-boxing gets you $1,000,000 + $1,000 = $1,001,000. In both cases, two-boxing beats one-boxing by $1,000. CDT therefore two-boxes.
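The case-by-case comparison above can be laid out as a small payoff table. This is only a sketch of the dominance argument; the function and constant names are made up for illustration.

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000

def payoff(action, box_b_filled):
    """Dollars received for an action, given the already fixed contents of box B."""
    box_b = BOX_B_FULL if box_b_filled else 0
    return box_b + (BOX_A if action == "two-box" else 0)

# CDT treats the contents of box B as fixed and compares actions within each case.
for filled in (False, True):
    one = payoff("one-box", filled)
    two = payoff("two-box", filled)
    print(f"box B filled={filled}: one-box ${one:,}, two-box ${two:,}, difference ${two - one:,}")
```

In both rows two-boxing is ahead by exactly the contents of box A, which is why CDT two-boxes.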

John, who is convinced by CDT-style reasoning, takes both boxes. Omega predicted he would, so John only gets $1,000. Had he one-boxed, Omega would have predicted that, giving him $1,000,000. If only he hadn't followed CDT's advice!

Is Omega even possible?

At this point, you may be wondering whether Newcomb's Problem is relevant: is it even possible to make such accurate predictions of someone's decision? There are two important points to note here.

First, yes, such accurate predictions might actually be possible, especially if you're a robot: Omega could then have a copy - a model - of your decision-making software, which it feeds Newcomb's Problem to see whether the model will one-box or two-box. Based on that, Omega predicts whether you will one-box or two-box, and fixes the contents of box B accordingly. Now, you're not a robot, but future brain-scanning techniques might still make it possible to form an accurate model of your decision procedure.

The second point to make here is that predictions need not be this accurate in order to have a problem like Newcomb's. If Omega could predict your action with only 60% accuracy (meaning its prediction is wrong 40% of the time), e.g. by giving you some tests first and examining the answers, the problem doesn't fundamentally change. CDT would still two-box: given Omega's prediction (whatever its accuracy is), two-boxing still earns you $1,000 more than one-boxing. But, of course, Omega's prediction is connected to your decision: two-boxing gives you a 0.6 probability of earning $1,000 (because Omega predicts a two-boxer's choice correctly with probability 0.6) and a 0.4 probability of getting $1,001,000 (the case where Omega is wrong in its prediction), whereas one-boxing gives you a 0.6 probability of getting $1,000,000 and a 0.4 probability of $0. This means two-boxing has an expected utility of 0.6 x $1,000 + 0.4 x $1,001,000 = $401,000, whereas the expected utility of one-boxing is 0.6 x $1,000,000 + 0.4 x $0 = $600,000. One-boxing still wins, and CDT still goes wrong.
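The arithmetic in the previous paragraph generalizes to any prediction accuracy. The sketch below, with an illustrative function name, simply repeats that calculation with the accuracy as a parameter. It assumes, as the paragraph does, that Omega is equally accurate for one-boxers and two-boxers.

```python
def expected_utilities(accuracy):
    """Return (EU of one-boxing, EU of two-boxing) in dollars when Omega's
    prediction matches the agent's actual choice with probability `accuracy`."""
    eu_one_box = accuracy * 1_000_000 + (1 - accuracy) * 0
    eu_two_box = accuracy * 1_000 + (1 - accuracy) * 1_001_000
    return eu_one_box, eu_two_box

one, two = expected_utilities(0.6)
print(round(one), round(two))  # 600000 401000

# Setting the two expressions equal and solving for the accuracy gives the
# break-even point: 1_001_000 / 2_000_000 = 0.5005.
break_even = 1_001_000 / 2_000_000
print(break_even)
```

So under these payoffs, one-boxing has the higher expected utility as soon as Omega is right a little more than half the time.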

In fact, people's facial microexpressions can give clues about what they will decide, which makes many real-life problems Newcomblike.

Newcomb's Problem vs. Smoking Lesion

You might be wondering about the exact difference between Newcomb's Problem and Smoking Lesion: why does the author recommend smoking in Smoking Lesion, while also saying one-boxing in Newcomb's Problem is the better choice? After all, two-boxers indeed often find an empty box in Newcomb's Problem - but isn't it also true that smokers often get lung cancer in Smoking Lesion?

Yes. But the latter has nothing to do with the decision to smoke, whereas the former has everything to do with the decision to two-box. Let's indeed assume Omega has a model of your decision procedure in order to make its prediction. Then whatever you decide, the model decided as well (with perhaps a small error rate). This is no different from two calculators both returning "4" on "2 + 2": if your calculator outputs "4" on "2 + 2", you know that, when Fiona entered "2 + 2" on her calculator a day earlier, hers must have output "4" as well. It's the same in Newcomb's Problem: if you decide to two-box, so did Omega's model of your decision procedure; similarly, if you decide to one-box, so did the model. Two-boxing then systematically leads to earning only $1,000, while one-boxing gets you $1,000,000. Your decision procedure is instantiated in two places, in your head and in Omega's, and you can't act as if your decision has no impact on Omega's prediction.
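The "two calculators" point can be sketched in code. The example assumes a deterministic decision procedure and an Omega that predicts by running an exact copy of it; both are simplifying assumptions, and all names are hypothetical.

```python
def play_newcomb(decision_procedure):
    """Play Newcomb's Problem against an Omega that predicts by running a
    copy of the agent's own (deterministic) decision procedure."""
    prediction = decision_procedure()   # Omega's model, run in advance
    box_b = 1_000_000 if prediction == "one-box" else 0
    choice = decision_procedure()       # the agent's actual choice, made later
    return box_b + (1_000 if choice == "two-box" else 0)

def one_boxer():
    return "one-box"

def two_boxer():
    return "two-box"

print(play_newcomb(one_boxer))  # 1000000
print(play_newcomb(two_boxer))  # 1000
```

The same function is called twice, once in Omega's head and once in the agent's, so there is no run of this game in which the two calls disagree. The $1,001,000 outcome that the dominance argument counts on never occurs.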

In Smoking Lesion, smokers do often get lung cancer, but that's "just" a statistical relation. Your decision procedure has no effect on the presence of the lesion or on whether you get lung cancer; the lesion does give people a fondness for smoking, but the decision to smoke is still theirs and has no effect on getting lung cancer.

Note that, if we assume Omega doesn't have a model of your decision procedure, two-boxing would be the better choice. For example, if, historically, people wearing brown shoes always one-boxed, Omega might base its prediction on that instead of on a model of your decision procedure. In that case, your decision doesn't have an effect on Omega's prediction, in which case two-boxing simply makes you $1,000 more than one-boxing.
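For contrast, here is a sketch of the shoe-based predictor described above. The rule "brown shoes means one-boxer" and the function name are assumptions made up for illustration.

```python
def play_with_shoe_predictor(shoe_color, choice):
    """Omega predicts from shoe colour alone (a hypothetical rule), so the
    prediction is the same whatever the agent ends up choosing."""
    prediction = "one-box" if shoe_color == "brown" else "two-box"
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b + (1_000 if choice == "two-box" else 0)

for shoes in ("brown", "black"):
    one = play_with_shoe_predictor(shoes, "one-box")
    two = play_with_shoe_predictor(shoes, "two-box")
    print(shoes, one, two, two - one)
```

Here the agent's choice never feeds into the prediction, so for any given pair of shoes two-boxing really is worth $1,000 more.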

Conclusion

So it turns out CDT doesn't solve every problem correctly. In the next post, we will take a look at another decision theory: Evidential Decision Theory, and how it approaches Newcomb's Problem.




TLW (3y ago)

Newcomb's Problem

Several issues, all of which I have highlighted elsewhere and I'll highlight here too:

Omega cannot exist without being super-Turing, assuming agents are Turing-complete.
1. If you have an algorithm for producing an Omega, then I run said algorithm and run Omega on myself and do whatever Omega predicts I won't do^[1]^[2].
  1. This is a contradiction, hence either:
    1. A: If Omega cannot exist, there's no sense spending time on this^[3].
    2. B: If Omega is super-Turing, then all bets are off.
    3. C: If the agent is not Turing-complete, see below.
2. If agents aren't Turing-complete, then I don't think CDT applies.
  1. (You may be able to come up with agents for specific problems, but you cannot come up with an agent that applies CDT to all encountered problems^[4].)
"reasonably accurate predictor" hides some subtlety here:
1. Does this mean that "Omega-lite" predicts all agents correctly >50% of the time?
2. Or does this mean it can e.g. predict 80% of agents 80% of the time?
3. If the former, 1 still holds. "Omega-lite" cannot have a prediction accuracy of better than a coinflip for my "run Omega and do the opposite of whatever it says" counterexample without falling into options A-C.
4. If the latter, there's no guarantee that "Omega-lite" has any accuracy greater than a coinflip for any specific agent, and there's no paradox as a result.
5. (There are also a few other subtleties. Omega-lite cannot predict itself (and do computation on the result), or else you get the same contradiction with Omega-lite predicting itself and doing whatever it thinks it won't do. Ditto, you end up with similar issues if >1 Omega-lite can exist and predict other Omega-lites.)

...I should really write this up and turn it into a proper post at some point, as this isn't the first post I've seen that ignores these issues.

^[1] I'm using first-person here to make the distinction a little clearer, as 'itself' is ambiguous as to whether it refers to the agent or Omega. Alternatively: I submit an agent A that runs Omega on A, and then agent A does whatever Omega said agent A wouldn't do.
^[2] Or, in the probabilistic case, I do X if and only if Omega predicts a <50% probability of me doing X.
^[3] ...mostly. E.g. halting oracles don't exist, but are sometimes useful as steppingstones for proofs.
^[4] Among other issues, parsing the question itself may be impossible...

Heighn (3y ago)

Thanks for your comment!

If you have an algorithm for producing an Omega, then I run said algorithm and run Omega on myself and do whatever Omega predicts I won't do

So then my decision procedure simulates Omega, who simulates my decision procedure. "My decision procedure" is the same decision procedure in both cases, so I think it's impossible for me to do what Omega predicts I won't do. Or is this where Omega becomes super-Turing? My knowledge on that is limited.

TLW (3y ago)

So then my decision procedure simulates Omega, who simulates my decision procedure.

Yep. The problem is the unbounded recursion here, give or take. A reasons about B reasoning about A reasoning about B reasoning about A ad infinitum.

I think it's impossible for me to do what Omega predicts I won't do.

Hence, there's a contradiction. Assuming the Oracle side is correct, it's fairly straightforward to show that the agent is also correct... and that this leads to a contradiction. Hence, the Oracle side cannot be correct^[1].

It's easiest to see via analogy to the Halting problem^[2].

~~Omega~~ Halting Problem Solver -> a decidable algorithm that predicts ~~what an arbitrary agent does~~ if an arbitrary Turing Machine halts
If you have a decidable algorithm for producing ~~an Omega~~ a Halting Problem Solver, then I run said algorithm and run said resulting ~~Omega~~ Halting Problem Solver on myself and do whatever said ~~Omega~~ Halting Problem Solver predicts I won't do (~~take both boxes~~ go into an infinite loop if and only if said ~~Omega~~ Halting Problem Solver predicts I won't)
This is a contradiction, therefore an algorithm for producing an Omega ~~a Halting Problem Solver~~ cannot exist.

Or, in the slightly more straightforward 'standard' proof form:

~~Omega~~ Halting Problem Solver -> a decidable algorithm that predicts ~~what an arbitrary agent does~~ if an arbitrary Turing Machine halts
If you have ~~an Omega~~ a Halting Problem Solver, then I run said ~~Omega~~ Halting Problem Solver on myself and do whatever said ~~Omega~~ Halting Problem Solver predicts I won't do (~~take both boxes~~ go into an infinite loop if and only if said ~~Omega~~ Halting Problem Solver predicts I won't)
This is a contradiction, therefore an Omega ~~a Halting Problem Solver~~ cannot exist.

Or is this where Omega becomes super-Turing?

This contradiction only applies if the agent can simulate Omega. (The premise requires that Omega can simulate the agent.)

One way of avoiding this contradiction is if the agent is not Turing-complete, and Omega can simulate the agent but not vice versa.

If the agent is Turing-complete, this implies that Omega must be Turing-complete in order for Omega to simulate the agent. By the Church-Turing thesis, if both are computable, this in turn implies that Omega and the agent can simulate each other, and we arrive at this contradiction. This means that if the agent is Turing-complete, Omega must not be computable. So the other possibility is that the agent is Turing-complete and Omega isn't computable (and is e.g. an Oracle Machine). This is where Omega becomes super-Turing.

(A third possibility is that Omega is allowed to be semidecidable... but in this case if I'm an agent that will result in Omega infinite-looping it shouldn't have been able to ask the question in the first place.)
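The "do whatever the predictor says I won't do" construction discussed in this thread can be sketched as follows. This is only a toy illustration with made-up names: both predictors are hypothetical stand-ins, and Python's recursion limit stands in for a simulation that never terminates.

```python
def make_contrarian(predictor):
    """Build an agent that asks `predictor` what it will do, then does the opposite."""
    def contrarian():
        predicted = predictor(contrarian)
        return "two-box" if predicted == "one-box" else "one-box"
    return contrarian

def always_one_box_predictor(agent):
    """A predictor that ignores the agent and always predicts one-boxing."""
    return "one-box"

def simulating_predictor(agent):
    """A predictor that predicts by running the agent itself."""
    return agent()

# A predictor that answers without simulating is simply wrong about this agent.
agent = make_contrarian(always_one_box_predictor)
print(agent() == always_one_box_predictor(agent))  # False

# A predictor that simulates the agent triggers unbounded mutual simulation.
looping_agent = make_contrarian(simulating_predictor)
try:
    looping_agent()
except RecursionError:
    print("simulation never bottoms out")
```

Either the predictor terminates and is wrong about this particular agent, or the agent and predictor simulate each other without end.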

^[1] In particular, I suspect the failure mode of many Omegas would be to go into an infinite loop.
^[2] This is a slightly different proof than the 'standard' proof of the Halting Problem^[3], but this proof also works.
^[3] The standard proof starts with 'assume you have a TM that solves the Halting Problem', and directly shows that said TM is a contradiction. This proof starts with 'assume you have an algorithm that produces a TM that solves the Halting Problem', and shows that said algorithm is a contradiction.
