I suspect that CDT seems unsuitable for Newcomb-like problems because it tends to be applied to non-existent outcomes. If the outcome is not in the domain, you should not be calculating its utility. In the PD example, CD and DC are not valid outcomes for clones. Similarly, two-boxing and getting $1001k is not a valid outcome in Newcomb's problem. If you prune the decision tree of imaginable but non-existent branches before applying a decision theory, many differences between CDT and EDT tend to go away.
Moreover, if you prune the decision tree of all branches bar one then all decision algorithms will give the same (correct) answer!
It's totally OK to add a notion of pruning in, but you can't really say that your decision algorithm of "CDT with pruning" makes sense unless you can specify which branches ought to be pruned, and which ones should not. Also, outright pruning will often not work; you may only be able to rule out a branch as highly improbable rather than altogether impossible.
In other words, "pruning" as you put it is simply the same thing as "recognizing logical connections" in the sense that So8res used in the above post.
This discussion and a previous conversation with Nate have helped me crystallize my thoughts on why I prefer CDT to any of the attempts to "fix" it using timelessness. Most of the material on TDT/UDT is too technical for me, so it is entirely possible that I am wrong; if there are errors in my reasoning, I would be very grateful if someone could point them out:
Any decision theory depends on the concept of choice: If there is no choice, there is no need for a decision theory. I have seen a quote attributed to Pearl to the effect that we can only ...
This account of decision theory/game theory seems like unnecessary formalism. The single play-through token-trade prisoner's dilemma has 4 outcomes to consider, and among those 4 outcomes, no matter what your opponent does, your outcome is better if you betray, so you betray.
The difference with the mirror token trade is that you set it up so that there are explicitly only two outcomes to consider: you betray and get 100 dollars, or you cooperate and get 200 dollars. Obviously you choose the one that has the higher expected value, so you get 200 dollars. ...
Note that both CDT and EDT have the different problem that they fail on the Transparent Newcomb's problem, due to updating on their observations. This is sort of the converse problem to the one described in the post - falsely eliminating seemingly impossible outcomes that are actually reachable depending on your strategy.
Yet CDT would have you evaluate an action by considering what happens if you replace the node Give? with a function that always returns that action. But this intervention does not affect the opponent, which reasons the same way!
This doesn't seem obvious to me. It looks to me like the graph is faulty; there should be "TheirDecision" and "YourDecision" nodes, each a successor of the Give? node, such that changing the Give? node causally affects both. The Give? node is a function; the decision nodes are its outputs. With the graph structured like that, you get sane behavior.
Yes. I agree that CDT fails to achieve optimal results in circumstances where the program that you are running directly affects the outside universe. For example, in clone PD, where running a program causes the opponent to run the same program, or in Newcomb's problem, where running a program that 2-boxes causes the second box to be empty. On the other hand, ANY decision theory can be made to fail in such circumstances. You could merely face a universe that determines whether you are running program X and charges you $100 if you are.
Are there circumstances where the universe does not read your mind where CDT fails?
I'm sure we could think of some, but I want to address the question of "universe reads your mind". Social agents (i.e., real, live people) reason about each other's minds all the time. There is absolutely nothing weird or unusual about this, and there really oughtn't be anything weird about trying to formalize how it ought to be done.
On this topic, I'd like to suggest a variant of Newcomb's problem that I don't recall seeing anywhere in LessWrong (or anywhere else). As usual, Omega presents you with two boxes, box A and box B. She says "You may take either box A or both boxes. Box B contains $1,000. Box A either contains $1,000,000 or is empty. Here is how I decided what to put in box A: I consider a perfectly rational agent being put in an identical situation to the one you're in. If I predict she takes one box I put the money in box A, otherwise I put nothing." Suppose furt...
CDT is the academic standard decision theory. Economics, statistics, and philosophy all assume (or, indeed, define) that rational reasoners use causal decision theory to choose between available actions.
Reference?
I'm under the impression that Expected (Von Neumann–Morgenstern) Utility Maximization, aka Evidential Decision Theory, is generally considered the ideal theory, while CDT was originally considered an approximation used to make the computation tractable.
In 1981 Gibbard, Harper and Lewis started to argue that CDT was superior to Expected Utilit...
However, CDT does fail on a very similar problem where it seems insane to fail. CDT fails at the token trade even when it knows it is playing against a perfect copy of itself.
I don't see this as a failure. As I'm sure you know, if the game still is a PD then regardless of what my clone does I'm still always better off defecting. If either my actions influence what my clone does, or I care about my clone's wealth then the game might no longer be a PD and as simple game theory would predict I might not defect.
Imagine the following game: You give me a program that must either output "give" or "keep". I play it in a token trade (as defined above) against itself. I give you the money from only the first instance (you don't get the wealth that goes to the second instance).
Would you be willing to pay me $150 to play this game? (I'd be happy to pay you $150 to give me this opportunity, for the same reason that I cooperate with myself on a one-shot PD.)
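To spell out the arithmetic behind that offer, here is a minimal sketch of my own (using the token-trade payoffs defined in the post): the submitted program plays against an identical copy of itself, so only the symmetric outcomes are reachable.

```python
# The submitted program plays a token trade against a copy of itself,
# and only the first instance's winnings are collected.

def token_trade_payoff(my_move: str, their_move: str) -> int:
    """First player's payoff: $200 for its own token, $100 for the other's."""
    table = {("give", "give"): 200, ("give", "keep"): 0,
             ("keep", "give"): 300, ("keep", "keep"): 100}
    return table[(my_move, their_move)]

# Against an identical copy, only (give, give) and (keep, keep) can occur.
for program_output in ("give", "keep"):
    winnings = token_trade_payoff(program_output, program_output)
    print(program_output, winnings, "worth paying $150?", winnings > 150)
# give -> $200 (worth paying $150 to play), keep -> $100 (not worth it)
```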
regardless of what my clone does I'm still always better off defecting.
This is broken counterfactual reasoning. It assumes that your action is independent of your clone's just because your action does not influence your clone's. According to the definition of the game, the clone will happen to defect if you defect and will happen to cooperate if you cooperate. If you respect this logical dependence when constructing your counterfactuals, you'll realize that reasoning like "regardless of what my clone does I..." neglects the fact that you can't defect while your clone cooperates.
This is crossposted from my new blog. I was planning to write a short post explaining how Newcomblike problems are the norm and why any sufficiently powerful intelligence built to use causal decision theory would self-modify to stop using causal decision theory in short order. Turns out it's not such a short topic, and it's turning into a short intro to decision theory.
I've been motivating MIRI's technical agenda (decision theory and otherwise) to outsiders quite frequently recently, and I received a few comments of the form "Oh cool, I've seen lots of decision theory type stuff on LessWrong, but I hadn't understood the connection." While the intended audience of my blog is wider than the readerbase of LW (and thus, the tone might seem off and the content a bit basic), I've updated towards these posts being useful here. I also hope that some of you will correct my mistakes!
This sequence will probably run for four or five posts, during which I'll motivate the use of decision theory, the problems with the modern standard of decision theory (CDT), and some of the reasons why these problems are an FAI concern.
I'll be giving a talk on the material from this sequence at Purdue next week.
1
Choice is a crucial component of reasoning. Given a set of available actions, which action do you take? Do you go out to the movies or stay in with a book? Do you capture the bishop or fork the king? Somehow, we must reason about our options and choose the best one.
Of course, we humans don't consciously weigh all of our actions. Many of our choices are made subconsciously. (Which letter will I type next? When will I get a drink of water?) Yet even if the choices are made by subconscious heuristics, they must be made somehow.
In practice, decisions are often made on autopilot. We don't weigh every available alternative when it's time to prepare for work in the morning, we just pattern-match the situation and carry out some routine. This is a shortcut that saves time and cognitive energy. Yet, no matter how much we stick to routines, we still spend some of our time making hard choices, weighing alternatives, and predicting which available action will serve us best.
The study of how to make these sorts of decisions is known as Decision Theory. This field of research is closely intertwined with Economics, Philosophy, Mathematics, and (of course) Game Theory. It will be the subject of today's post.
2
Decisions about what action to choose necessarily involve counterfactual reasoning, in the sense that we reason about what would happen if we took actions which we (in fact) will not take.
We all have some way of performing this counterfactual reasoning. Most of us can visualize what would happen if we did something that we aren't going to do. For example, consider shouting "PUPPIES!" at the top of your lungs right now. I bet you won't do it, but I also bet that you can picture the results.
One of the major goals of decision theory is to formalize this counterfactual reasoning: if we had unlimited resources then how would we compute alternatives so as to ensure that we always pick the best possible action? This question is harder than it looks, for reasons explored below: counterfactual reasoning can encounter many pitfalls.
A second major goal of decision theory is this: human counterfactual reasoning sometimes runs afoul of those pitfalls, and a formal understanding of decision theory can help humans make better decisions. It's no coincidence that Game Theory was developed during the Cold War!
(My major goal in decision theory is to understand it as part of the process of learning how to construct a machine intelligence that reliably reasons well. This provides the motivation to nitpick existing decision theories. If they're broken then we had better learn that sooner rather than later.)
3
Sometimes, it's easy to choose the best available action. You consider each action in turn, and predict the outcome, and then pick the action that leads to the best outcome. This can be difficult when accurate predictions are unavailable, but that's not the problem that we address with decision theory. The problem we address is that sometimes it is difficult to reason about what would happen if you took a given action.
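As a toy illustration (entirely my own, with made-up outcomes and utilities), the easy case amounts to predicting an outcome for each action and taking the one with the highest utility:

```python
# Made-up predictions and utilities, purely for illustration.
predicted_outcome = {"stay in with a book": "quiet evening",
                     "go out to the movies": "fun evening"}
utility = {"quiet evening": 6, "fun evening": 8}

def best_action(actions):
    # Predict the outcome of each action and pick the one with highest utility.
    return max(actions, key=lambda a: utility[predicted_outcome[a]])

print(best_action(predicted_outcome.keys()))  # -> "go out to the movies"
```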
For example, imagine that you know of a fortune teller who can reliably read palms to divine destinies. Most people who get a good fortune wind up happy, while most people who get a bad fortune wind up sad. It's been experimentally verified that she can use information on palms to reliably make inferences about the palm-owner's destiny.
So... should you get palm surgery to change your fate?
If you're bad at reasoning about counterfactuals, you might reason as follows: "Most people who get a good fortune wind up happy, and I want to be happy, so I should get palm surgery to give myself a palm that reads as a good fortune."
Now admittedly, if palm reading is shown to work, the first thing you should do is check whether you can alter destiny by altering your palms. However, assuming that changing your palm doesn't drastically affect your fate, this sort of reasoning is quite silly.
The above reasoning process conflates correlation with causal control. The above reasoner gets palm surgery because they want a good destiny. But while their palm may give information about their destiny, it does not control their fate.
If we find out that we've been using this sort of reasoning, we can usually do better by considering actions only on the basis of what they cause.
4
This idea leads us to causal decision theory (CDT), which demands that we consider actions based only upon the causal effects of those actions.
Actions are considered using causal counterfactual reasoning. Though causal counterfactual reasoning can be formalized in many ways, we will consider graphical models specifically. Roughly, a causal graph is a graph where the world model is divided into a series of nodes, with arrows signifying the causal connections between the nodes. For a more formal introduction, you'll need to consult a textbook. As an example, here's a causal graph for the palm-reading scenario above:
The choice is denoted by the dotted Surgery? node. Your payoff is the $ diamond. Each node is specified as a function of all nodes causing it.

For example, in a very simple deterministic version of the palm-reading scenario, the nodes could be specified as follows:

- Surgery? is a program implementing the agent, and must output either yes or no.
- Destiny is either good or bad.
- Fortune is always good if Surgery? is yes, and is the same as Destiny otherwise.
- $ is $100 if Destiny is good and $10 otherwise, minus $10 if Surgery? is yes. Surgery is expensive!

Now let's say that you expect even odds on whether or not your destiny is good or bad, i.e. the probability that Destiny=good is 50%.

If the Surgery? node is a program that implements causal decision theory, then that program will choose between yes and no using the following reasoning:

- Replace Surgery? with a function that always returns yes. Then $ is $90 if Destiny=good and $0 if Destiny=bad, for an expected payoff of $45.
- Replace Surgery? with a function that always returns no. Then $ is $100 if Destiny=good and $10 if Destiny=bad, for an expected payoff of $55.
- Take the action with the higher expected payoff: skip the surgery.

More generally, the CDT reasoning procedure works as follows: for each available action, overwrite the action node with a constant function returning that action, compute the expected value of the payoff node in the resulting graph, and then take the action whose graph yields the highest expected payoff.
Notice how CDT evaluates counterfactuals by setting the value of its action node in a causal graph, and then calculating its payoff accordingly. Done correctly, this allows a reasoner to figure out the causal implications of taking a specific action without getting confused by additional variables.
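To make the procedure concrete, here is a minimal sketch of my own, assuming the node definitions above (the Fortune node is omitted because it doesn't feed into the payoff), of a CDT agent evaluating the palm-surgery decision:

```python
# A sketch of CDT on the palm-reading graph: intervene on Surgery?,
# propagate only downstream effects, and take the expectation over Destiny.

def payoff(destiny: str, surgery: str) -> int:
    base = 100 if destiny == "good" else 10
    return base - (10 if surgery == "yes" else 0)  # surgery costs $10

p_good = 0.5  # even odds on a good destiny

def cdt_expected_value(surgery: str) -> float:
    # Intervening on Surgery? does not change Destiny, so Destiny keeps
    # its prior distribution regardless of the chosen action.
    return p_good * payoff("good", surgery) + (1 - p_good) * payoff("bad", surgery)

for action in ("yes", "no"):
    print(action, cdt_expected_value(action))
# yes -> 45.0, no -> 55.0, so CDT (sensibly) declines the surgery.
```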
CDT is the academic standard decision theory. Economics, statistics, and philosophy all assume (or, indeed, define) that rational reasoners use causal decision theory to choose between available actions.
Furthermore, narrow AI systems which consider their options using this sort of causal counterfactual reasoning are implicitly acting like they use causal decision theory.
Unfortunately, causal decision theory is broken.
5
Before we dive into the problems with CDT, let's flesh it out a bit more. Game Theorists often talk about scenarios in terms of tables that list the payoffs associated with each action. This might seem a little bit like cheating, because it often takes a lot of hard work to determine what the payoff of any given action is. However, these tables will allow us to explore some simple examples of how causal reasoning works.
I will describe a variant of the classic Prisoner's Dilemma which I refer to as the token trade. There are two players in two separate rooms, one red and one green. The red player starts with the green token, and vice versa. Each must decide (in isolation, without communication) whether or not to give their token to me, in which case I will give it to the other player.
Afterwards, they may cash their token out. The red player gets $200 for cashing out the red token and $100 for the green token, and vice versa. The payoff table looks like this (the red player's payoff listed first in each cell):

| | Green gives | Green keeps |
| --- | --- | --- |
| Red gives | $200 / $200 | $0 / $300 |
| Red keeps | $300 / $0 | $100 / $100 |
For example, if the green player gives the red token away, and the red player keeps the green token, then the red player gets $300 while the green player gets nothing.
Now imagine a causal decision theorist facing this scenario. Their causal graph might look something like this:
Let's evaluate this using CDT. The action node is Give?, the payoff node is $. We must evaluate the expectation of $ given Give?=yes and Give?=no. This, of course, depends upon the expected value of TheirDecision.

In Game Theory, we usually assume that the opponent is reasoning using something like CDT. Then we can reason about TheirDecision given that they are doing similar reasoning about our decision, and so on. This threatens to lead to infinite regress, but in fact there are some tricks you can use to guarantee at least one equilibrium. (These are the famous Nash equilibria.) This sort of reasoning requires some modifications to the simple CDT procedure given above, which we're going to ignore today: while most scenarios with multiple agents require more complicated reasoning, the token trade is an especially simple scenario where we can brush all that under the rug.

In the token trade, the expected value of TheirDecision doesn't matter to a CDT agent. No matter what the probability p of TheirDecision=give happens to be, the CDT agent will do the following reasoning:

- Set Give? to be a constant function returning yes. If TheirDecision=give then we get $200; if TheirDecision=keep then we get $0. Expected payoff: 200p.
- Set Give? to be a constant function returning no. If TheirDecision=give then we get $300; if TheirDecision=keep then we get $100. Expected payoff: 300p + 100(1−p).

Obviously, 300p + 100(1−p) will be larger than 200p, no matter what probability p is.

A CDT agent in the token trade must have an expectation about TheirDecision, captured by a probability p that they will give their token, and we have just shown that no matter what that p is, the CDT agent will keep their token. When something like this occurs (where Give?=no is better regardless of the value of TheirDecision) we say that Give?=no is a "dominant strategy". CDT executes this dominant strategy, and keeps its token.
6

Of course, this means that each player will get $100, when they could have both received $200. This may seem unsatisfactory. Both players would agree that they could do better by trading tokens. Why don't they coordinate?
The classic response is that the token trade (better known as the Prisoner's Dilemma) is a game that explicitly disallows coordination. If players do have an opportunity to coordinate (or even if they expect to play the game multiple times) then they can (and do!) do better than this.
I won't object much here, except to note that this answer is still unsatisfactory. CDT agents fail to cooperate on a one-shot Prisoner's Dilemma. That's a bullet that causal decision theorists willingly bite, but don't forget that it's still a bullet.
7
Failure to cooperate on the one-shot Prisoner's Dilemma is not necessarily a problem. Indeed, if you ever find yourself playing a token trade against an opponent using CDT, then you had better hold on to your token, because they surely aren't going to give you theirs.
However, CDT does fail on a very similar problem where it seems insane to fail. CDT fails at the token trade even when it knows it is playing against a perfect copy of itself.
I call this the "mirror token trade", and it works as follows: first, I clone you. Then, I make you play a token trade against yourself.
In this case, your opponent is guaranteed to pick exactly the same action that you pick. (Well, mostly: the game isn't completely symmetric. If you want to nitpick, consider that instead of playing against a copy of yourself, you must write a red/green colorblind deterministic computer program which will play against a copy of itself.)
The causal graph for this game looks like this:
Because I've swept questions of determinism and asymmetry under the rug, both decisions will be identical. The red copy should trade its token, because that's guaranteed to get it the red token (and it's the only way to do so).
Yet CDT would have you evaluate an action by considering what happens if you replace the node Give? with a function that always returns that action. But this intervention does not affect the opponent, which reasons the same way! Just as before, a CDT agent treats TheirDecision as if it has some probability of being give that is independent from the agent's action, and reasons that "I always keep my token while they act independently" dominates "I always give my token while they act independently".

Do you see the problem here? CDT is evaluating its action by changing the value of its action node Give?, assuming that this only affects things that are caused by Give?. The agent reasons counterfactually by considering "what if Give? were a constant function that always returned yes?" while failing to note that overwriting Give? in this way neglects the fact that Give? and TheirDecision are necessarily equal.

Or, to put it another way, CDT evaluates counterfactuals assuming that all nodes uncaused by its action are independent of its action. It thinks it can change its action and only look at the downstream effects. This can break down when there are acausal connections between the nodes.
After the red agent has been created from the template, its decision no longer causally affects the decision of the green agent. But both agents will do the same thing! There is a logical connection, even though there is no causal connection. It is these logical connections that are ignored by causal counterfactual reasoning.
This is a subtle point, but an important one: the values of Give? and TheirDecision are logically connected, but CDT's method of reasoning about counterfactuals neglects this connection.
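To make the contrast explicit, here is a small sketch of my own (using the token-trade payoffs above) comparing CDT's intervention-based counterfactuals with the outcomes that are actually reachable when both players run the same program:

```python
# CDT intervenes on Give? alone, holding TheirDecision at some fixed
# probability; but against a perfect copy, only the diagonal outcomes exist.

def payoff(my_move: str, their_move: str) -> int:
    table = {("give", "give"): 200, ("give", "keep"): 0,
             ("keep", "give"): 300, ("keep", "keep"): 100}
    return table[(my_move, their_move)]

# CDT's counterfactuals (with an arbitrary p = 0.5 on TheirDecision = give):
p = 0.5
cdt_give = p * payoff("give", "give") + (1 - p) * payoff("give", "keep")  # 100.0
cdt_keep = p * payoff("keep", "give") + (1 - p) * payoff("keep", "keep")  # 200.0
# CDT keeps, since 200 > 100.

# But Give? and TheirDecision are logically equal, so the only reachable
# outcomes are the symmetric ones:
reachable = {move: payoff(move, move) for move in ("give", "keep")}
print(reachable)  # {'give': 200, 'keep': 100} -- giving is the better policy.
```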
8

This is a known failure mode for causal decision theory. The mirror token trade is an example of what's known as a "Newcomblike problem".
Decision theorists occasionally dismiss Newcomblike problems as edge cases, or as scenarios specifically designed to punish agents for being "rational". I disagree.
And finally, eight sections in, I'm ready to articulate the original point: Newcomblike problems aren't a special case. They're the norm.
But this post has already run on for far too long, so that discussion will have to wait until next time.