A fungibility theorem
Restatement of: If you don't know the name of the game, just tell me what I mean to you. Alternative to: Why you must maximize expected utility. Related to: Harsanyi's Social Aggregation Theorem.
Summary: This article describes a theorem, previously described by Stuart Armstrong, that tells you to maximize the expectation of a linear aggregation of your values. Unlike the von Neumann-Morgenstern theorem, this theorem gives you a reason to behave rationally.1
Why you must maximize expected utility
This post explains von Neumann-Morgenstern (VNM) axioms for decision theory, and what follows from them: that if you have a consistent direction in which you are trying to steer the future, you must be an expected utility maximizer. I'm writing this post in preparation for a sequence on updateless anthropics, but I'm hoping that it will also be independently useful.
The theorems of decision theory say that if you follow certain axioms, then your behavior is described by a utility function. (If you don't know what that means, I'll explain below.) So you should have a utility function! Except, why should you want to follow these axioms in the first place?
A couple of years ago, Eliezer explained how violating one of them can turn you into a money pump — how, at time 11:59, you will want to pay a penny to get option B instead of option A, and then at 12:01, you will want to pay a penny to switch back. Either that, or the game will have ended and the option won't have made a difference.
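To make the money-pump dynamic concrete, here is a minimal sketch (my own toy example, not Eliezer's) of an agent with cyclic preferences being drained a penny per trade; the specific preference cycle and the one-cent fee are assumptions made purely for illustration.

```python
# A toy money pump: an agent whose preferences cycle A < B < C < A
# accepts every "upgrade" for a penny and ends up holding its original
# option, three cents poorer.

def prefers(x, y):
    """Cyclic preferences: B over A, C over B, A over C."""
    cycle = {("B", "A"), ("C", "B"), ("A", "C")}
    return (x, y) in cycle

holding, money = "A", 1.00
offers = [("B", 0.01), ("C", 0.01), ("A", 0.01)]  # each upgrade costs one cent

for option, fee in offers:
    if prefers(option, holding):   # the agent gladly accepts every trade
        holding, money = option, money - fee

print(holding, round(money, 2))    # "A 0.97": back where it started, minus three cents
```

Since the same three offers can be repeated indefinitely, the agent can be drained of arbitrarily much money without ever, by its own lights, making a bad trade.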
When I read that post, I was suitably impressed, but not completely convinced: I would certainly not want to behave one way if behaving differently always gave better results. But couldn't you avoid the problem by violating the axiom only in situations where it doesn't give anyone an opportunity to money-pump you? I'm not saying that would be elegant, but is there a reason it would be irrational?
It took me a while, but I have since come around to the view that you really must have a utility function, and really must behave in a way that maximizes the expectation of this function, on pain of stupidity (or at least that there are strong arguments in this direction). But I don't know any source that comes close to explaining the reason, the way I see it; hence, this post.
I'll use the von Neumann-Morgenstern axioms, which assume probability theory as a foundation (unlike the Savage axioms, which actually imply that anyone following them has not only a utility function but also a probability distribution). I will assume that you already accept Bayesianism.
*
Epistemic rationality is about figuring out what's true; instrumental rationality is about steering the future where you want it to go. The way I see it, the axioms of decision theory tell you how to have a consistent direction in which you are trying to steer the future. If my choice at 12:01 depends on whether at 11:59 I had a chance to decide differently, then perhaps I won't ever be money-pumped; but if I want to save as many human lives as possible, and I must decide between different plans that have different probabilities of saving different numbers of people, then it starts to at least seem doubtful that which plan is better at 12:01 could genuinely depend on my opportunity to choose at 11:59.
So how do we formalize the notion of a coherent direction in which you can steer the future?
Problematic Problems for TDT
A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents which two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave in the way that a CDT agent will wish it had pre-committed itself to behave before being faced with the problem.
However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:
Discrimination: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.
So what are some fair "problematic problems"?
Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."
Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.
However, any CDT agent can just take both boxes and win $1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to reconstruct the chain of logic, reason that the simulation one-boxed, and conclude that Box B contains the $1 million. So any other agent can safely two-box as well.
Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
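For concreteness, here is a rough sketch of Problem 1's payoffs as described above; the simulated agent's choice is hard-coded to one-boxing, per the analysis.

```python
# A rough sketch of Problem 1. Box B's contents depend only on what a
# *simulated* TDT agent chose, not on a prediction of the actual chooser.

def simulated_tdt_choice():
    # Per the analysis, the simulated TDT agent reasons as in standard
    # Newcomb and one-boxes.
    return "one-box"

box_a = 1_000
box_b = 1_000_000 if simulated_tdt_choice() == "one-box" else 0

tdt_winnings = box_b          # a real TDT agent treats its choice as linked to the simulation's
cdt_winnings = box_a + box_b  # a CDT (or EDT) agent simply takes both boxes

print(tdt_winnings, cdt_winnings)  # 1000000 vs. 1001000
```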
Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."
Analysis: A TDT agent will reason that whatever it does, it cannot have more than 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million under box number 1 and the TDT agent has exactly 10% chance of winning it.
But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning $1 million. By increasing the number of boxes, we can ensure that TDT has arbitrarily low chance of winning, compared to CDT which always wins.
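A quick sketch of Problem 2, assuming (as argued above) that TDT's best reply is the uniform mixed strategy; box labels are zero-indexed in the code, so index 0 corresponds to the box labelled 1.

```python
import random

# Problem 2 with ten boxes. Omega puts the prize in the box the simulated TDT
# agent is least likely to take, ties broken by lowest label.

NUM_BOXES = 10
tdt_probs = [1.0 / NUM_BOXES] * NUM_BOXES  # TDT's assumed mixed strategy

prize_box = min(range(NUM_BOXES), key=lambda i: (tdt_probs[i], i))

tdt_pick = random.choices(range(NUM_BOXES), weights=tdt_probs)[0]
cdt_pick = 0  # CDT reasons out Omega's rule and just takes the first box

print("TDT wins:", tdt_pick == prize_box)  # True with probability 1/10
print("CDT wins:", cdt_pick == prize_box)  # always True
```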
Some questions:
1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.
2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from what is described?
3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.
4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)
5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.
6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?
Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).
Decision Theories: A Semi-Formal Analysis, Part III
Or: Formalizing Timeless Decision Theory
Previously:
0. Decision Theories: A Less Wrong Primer
1. The Problem with Naive Decision Theory
2. Causal Decision Theory and Substitution
WARNING: The main result of this post, as it's written here, is flawed. I at first thought it was a fatal flaw, but later found a fix. I'm going to try and repair this post, either by including the tricky bits, or by handwaving and pointing you to the actual proofs if you're curious. Carry on!
Summary of Post: Have you ever wanted to know how (and whether) Timeless Decision Theory works? Using the framework from the last two posts, this post shows you explicitly how TDT can be implemented in the context of our tournament, what it does, how it strictly beats CDT on fair problems, and a bit about why this is a Big Deal. But you're seriously going to want to read the previous posts in the sequence before this one.
We've reached the frontier of decision theories, and we're ready at last to write algorithms that achieve mutual cooperation in Prisoner's Dilemma (without risk of being defected on, and without giving up the ability to defect against players who always cooperate)! After two substantial preparatory posts, it feels like it's been a long time, hasn't it?

But look at me, here, talking when there's Science to do...
Decision Theories: A Semi-Formal Analysis, Part II
Or: Causal Decision Theory and Substitution
Previously:
0. Decision Theories: A Less Wrong Primer
1. The Problem with Naive Decision Theory
Summary of Post: We explore the role of substitution in avoiding spurious counterfactuals, introduce an implementation of Causal Decision Theory and a CliqueBot, and set off in the direction of Timeless Decision Theory.
In the last post, we showed the problem with what we termed Naive Decision Theory, which attempts to prove counterfactuals directly and pick the best action: there's a possibility of spurious counterfactuals which lead to terrible decisions. We'll want to implement a decision theory that does better; one that is, by any practical definition of the words, foolproof and incapable of error...
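As a taste of what the tournament's programs look like, here is a toy CliqueBot of the sort mentioned above: it cooperates only with exact copies of itself, as judged by source-code comparison. This is an illustrative sketch, not the implementation used later in the sequence.

```python
import inspect

# A toy CliqueBot: cooperate only with exact copies of yourself, defect
# against everything else.
# (inspect.getsource requires these functions to be defined in a file.)

def clique_bot(opponent_source):
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source):
    return "D"

print(clique_bot(inspect.getsource(clique_bot)))  # C: recognizes a clique member
print(clique_bot(inspect.getsource(defect_bot)))  # D: defects against outsiders
```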

I know you're eager to get to Timeless Decision Theory and the others. I'm sorry, but I'm afraid I can't do that just yet. This background is too important for me to allow you to skip it...
Over the next few posts, we'll create a sequence of decision theories, each of which will outperform the previous ones (the new ones will do better in some games, without doing worse in others) in a wide range of plausible games.
Decision Theories: A Semi-Formal Analysis, Part I
Or: The Problem with Naive Decision Theory
Previously: Decision Theories: A Less Wrong Primer
Summary of Sequence: In the context of a tournament for computer programs, I give almost-explicit versions of causal, timeless, ambient, updateless, and several other decision theories. I explain the mathematical considerations that make decision theories tricky in general, and end with a bunch of links to the relevant recent research. This sequence is heavier on the math than the primer was, but is meant to be accessible to a fairly general audience. Understanding the basics of game theory (and Nash equilibria) will be essential. Knowing about things like Gödel numbering, quining and Löb's Theorem will help, but won't be required.
Summary of Post: I introduce a context in which we can avoid most of the usual tricky philosophical problems and formalize the decision theories of interest. Then I show the chief issue with what might be called "naive decision theory": the problem of spurious counterfactual reasoning. In future posts, we'll see how other decision theories get around that problem.
In my Decision Theory Primer, I gave an intuitive explanation of decision theories; now I'd like to give a technical explanation. The main difficulty is that in the real world, there are all sorts of complications that are extraneous to the core of decision theory. (I'll mention more of these in the last post, but an obvious one is that we can't be sure that our perception and memory match reality.)
In order to avoid such difficulties, I'll need to demonstrate decision theory in a completely artificial setting: a tournament among computer programs.

Decision Theories: A Less Wrong Primer

Summary: If you've been wondering why people keep going on about decision theory on Less Wrong, I wrote you this post as an answer. I explain what decision theories are, show how Causal Decision Theory works and where it seems to give the wrong answers, introduce (very briefly) some candidates for a more advanced decision theory, and touch on the (possible) connection between decision theory and ethics.
Decision Theory Paradox: PD with Three Implies Chaos?
Prerequisites: Familiarity with decision theories (in particular, Eliezer's Timeless Decision Theory) and of course the Prisoner's Dilemma.
Summary: I show an apparent paradox in a three-agent variant of the Prisoner's Dilemma: despite full knowledge of each others' source codes, TDT agents allow themselves to be exploited by CDT, and lose completely to another simple decision theory. Please read the post and think for yourself about the Exercises and the Problem below before reading the comments; this is an opportunity to become a stronger expert at and on decision theory!
We all know that in a world of one-shot Prisoner's Dilemmas with read-access to the other player's source code, it's good to be Timeless Decision Theory. A TDT agent in a one-shot Prisoner's Dilemma will correctly defect against an agent that always cooperates (call this CooperateBot) or always defects (call this DefectBot, and note that CDT trivially reduces to this agent), and it will cooperate against another TDT agent (or any other type of agent whose decision depends on TDT's decision in the appropriate way). In fact, if we run an evolutionary contest as Robert Axelrod famously did for the Iterated Prisoner's Dilemma, and again allow players to read the other players' source codes, TDT will annihilate both DefectBot and CooperateBot over the long run, and it beats or ties any other decision theory.1 But something interesting happens when we take players in threes...
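Here is a sketch of that two-player baseline, using assumed standard PD payoffs (temptation 5, reward 3, punishment 1, sucker's payoff 0) and a deliberately crude stand-in for TDT that cooperates only when the opponent's decision is linked to its own; the real logic is more subtle, but the payoff pattern matches the one described above.

```python
# Pairwise one-shot PD payoffs for TDT, CooperateBot, and DefectBot.
# Payoff values are assumptions chosen for illustration.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tdt(opponent_name):
    # Simplified: cooperate only when the opponent's move depends on ours.
    return "C" if opponent_name == "TDT" else "D"

def cooperate_bot(opponent_name):
    return "C"

def defect_bot(opponent_name):
    return "D"

agents = {"TDT": tdt, "CooperateBot": cooperate_bot, "DefectBot": defect_bot}

for name_a, play_a in agents.items():
    for name_b, play_b in agents.items():
        moves = (play_a(name_b), play_b(name_a))
        print(name_a, "vs", name_b, "->", PAYOFF[moves])
```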
Towards a New Decision Theory for Parallel Agents
A recent post, Consistently Inconsistent, raises some problems with the unitary view of the mind/brain, and presents the modular view of the mind as an alternative hypothesis. The parallel/modular view of the brain not only deals better with the apparently hypocritical and contradictory ways our desires, behaviors, and beliefs seem to work, but also makes many successful empirical predictions, as well as postdictions. Much of that work can be found in Dennett's 1991 book, "Consciousness Explained", which details both the empirical evidence against the unitary view and the intuition failures involved in retaining a unitary view after being presented with that evidence.
The aim of this post is not to present further evidence in favor of the parallel view, nor to hammer any more nails into the unitary view's coffin; the scientific and philosophical communities have done well enough in both departments to discard the intuitive hypothesis that there is some executive of the mind keeping things orderly. The question I wish to raise is: "How should we update our decision theories to deal with independent, and sometimes inconsistent, desires and beliefs being had by one agent?"
If we model one agent's desires by using one utility function, and this function orders the outcomes the agent can reach on one real axis, then it seems like we might be falling back into the intuitive view that there is some me in there with one definitive list of preferences. The picture given to us by Marvin Minsky and Dennett involves a bunch of individually dumb agents, each with a unique set of specialized abilities and desires, interacting in such a way as to produce one smart agent with a diverse set of abilities and desires; but the smart agent only appears when viewed from the right level of description. For convenience, we will call those dumb, specialized agents "subagents", and the smart, diverse agent that emerges from their interaction "the smart agent". When one considers what it would be useful for a seeing neural unit to want to do, and contrasts it with what it would be useful for a get-that-food neural unit to want to do (e.g., examine that prey longer vs. charge that prey, turn head vs. keep running forward, stay attentive vs. eat that food), it becomes clear that cleverly managing which unit gets to have how much control, and when, is an essential part of the decision-making process of the whole. Decision theory, as far as I can tell, does not model any part of that managing process; instead we treat the smart agent as having its own set of desires, and don't discuss how the subagents' goals are being managed to produce that global set of desires.
It is possible that the many subagents in a brain, when they operate in concert, act isomorphically to an agent with one utility function and a unique problem space. A trivial example of such an agent might have only two subagents, "A" and "B", and possible outcomes O1 through On. We can plot the utilities that each subagent assigns to these outcomes on a two-dimensional positive Cartesian graph, with A's assigned utilities represented by position in X, and B's utilities by position in Y. The method by which these subagents are managed to produce behavior might just be: go for the possible outcome furthest from (0,0); in which case the utility function of the whole agent, U(Ox), would just be the distance from (0,0) to (A's U(Ox), B's U(Ox)).
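A minimal sketch of that toy aggregation; the outcome names and utility numbers are invented purely for illustration.

```python
import math

# Two-subagent aggregation: each outcome gets a pair (A's utility, B's utility),
# and the smart agent picks the outcome whose pair lies furthest from the origin.

outcomes = {
    "O1": (2.0, 1.0),
    "O2": (1.5, 2.5),
    "O3": (0.5, 3.0),
}

def whole_agent_utility(pair):
    a_util, b_util = pair
    return math.hypot(a_util, b_util)  # distance from (0, 0)

best = max(outcomes, key=lambda o: whole_agent_utility(outcomes[o]))
print(best, round(whole_agent_utility(outcomes[best]), 3))
```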
An agent which manages its subagents so as to be isomorphic to one utility function on one problem space is certainly mathematically describable, but also implausible. It is unlikely that the actual physical-neural subagents in a brain deal with the same problem space; rather, each has its own unique set of O1 through On. It is not as if all the subagents are playing the same game but each has a unique goal within that game; they each have their own unique set of legal moves too. This makes it problematic to model the global utility function of the smart agent as assigning one real number to every member of a set of possible outcomes, since there is no one set of possible outcomes for the smart agent as a whole. Each subagent has its own search space, with its own format of representation for that problem space. The problem space and utility function of the smart agent are implicit in the interactions of the subagents; they emerge from the interactions of agents on a lower level; the smart agent's utility function and problem space are never explicitly written down.
A useful example is smokers who are quitting. Some part of their brains that can do complicated predictions doesn't want its body to smoke. This part of the brain wants to avoid death, i.e., will avoid death if it can, and knows that choosing the possible outcome of smoking puts its body at high risk of death. Another part of their brains wants nicotine, and knows that choosing the move of smoking gets it nicotine. The nicotine-craving subagent doesn't want to die, but it also doesn't want to stay alive; these outcomes aren't in the domain of the nicotine subagent's utility function at all. The part of the brain responsible for predicting its body's death if it continues to smoke probably isn't significantly rewarded by nicotine in a parallel manner. If a cigarette is around and offered to the smart agent, these subagents must compete for control of the relevant parts of the body, e.g., the nicotine subagent might set off a global craving, while the predict-the-future subagent might set off a vocal response saying "no thanks, I'm quitting." The smart agent's overall desire to smoke or not smoke is just the result of this competition. Similar examples can be made with different desires, like a desire to overeat and a desire to look slim, or the desire to stay seated and the desire to eat a warm meal.
We may call the algorithm which settles these internal power struggles the "managing algorithm", and we may call a decision theory which models managing algorithms a "parallel decision theory". It's not the business of decision theorists to discover the specifics of the human managing process; that's the business of empirical science. But certain features of the human managing algorithm can reasonably be decided on. It is very unlikely that our managing algorithm is utilitarian, for example, i.e., the smart agent doesn't simply do whatever gets the highest net utility for its subagents. Some subagents are more powerful than others; they have a higher prior chance of success than their competitors; others are correspondingly weak. The question of what counts as one subagent in the brain is another empirical question which is not the business of decision theorists either, but anything that we do consider a subagent in a parallel theory must solve its problem in the form of a CSA, i.e., it must internally represent its outcomes, know what outcomes it can get to from whatever outcome it is at, and assign a utility to each outcome. There are likely many neural units in the brain that fit that description. Many of them probably contain as parts sub-subagents which also fit this description, but eventually, if you divide the parts enough, you get to neurons which are not CSAs, and thus not subagents.
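To make the target of a parallel decision theory more concrete, here is a rough sketch of subagents-as-CSAs together with one possible (deliberately non-utilitarian) managing algorithm in which fixed "power" weights decide who gets control; every specific here is an assumption, not a claim about how brains actually manage their subagents.

```python
from dataclasses import dataclass, field

# Each subagent is a CSA: it represents its own outcomes, knows which outcomes
# it can reach from where it is, and assigns utilities to them. The managing
# algorithm below is not utilitarian: a fixed "power" weight decides which
# subagent's preferred move wins.

@dataclass
class SubagentCSA:
    name: str
    power: float                                    # prior chance of winning control
    utilities: dict = field(default_factory=dict)   # outcome -> utility
    reachable: dict = field(default_factory=dict)   # outcome -> set of reachable outcomes

    def preferred_move(self, current):
        options = [o for o in self.reachable.get(current, set()) if o in self.utilities]
        return max(options, key=lambda o: self.utilities[o], default=None)

def manage(subagents, current):
    """Toy managing algorithm: the most powerful subagent with an opinion wins."""
    bids = [(sa.power, sa.preferred_move(current)) for sa in subagents]
    bids = [(power, move) for power, move in bids if move is not None]
    return max(bids, key=lambda b: b[0])[1] if bids else None

# The quitting smoker from the earlier example, reduced to two subagents.
craving = SubagentCSA("nicotine", power=0.7,
                      utilities={"smoke": 10},
                      reachable={"offered": {"smoke", "decline"}})
planner = SubagentCSA("long-term planner", power=0.5,
                      utilities={"decline": 5, "smoke": -100},
                      reachable={"offered": {"smoke", "decline"}})

print(manage([craving, planner], "offered"))  # "smoke": the stronger subagent wins here
```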
If we want to understand how we make decisions, we should try to model a CSA which is made out of more specialized sub-CSAs competing and agreeing, which are made out of further specialized sub-sub-CSAs competing and agreeing, and so on, until at the bottom we reach non-CSA algorithms. If we don't understand that, we don't understand how brains make decisions.
I hope that the considerations above are enough to convince reductionists that we should develop a parallel decision theory if we want to reduce decision making to computing. I would like to add an axiomatic parallel decision theory to the LW arsenal, but I know that that is not a one man/woman job. So, if you think you might be of help in that endeavor, and are willing to devote yourself to some degree, please contact me at hastwoarms@gmail.com. Any team we assemble will likely not meet in person often, and will hopefully frequently meet on some private forum. We will need decision theorists, general mathematicians, people intimately familiar with the modular theory of mind, and people familiar with neural modeling. What follows are some suggestions for any team or individual that might pursue that goal independently:
- The specifics of the managing algorithm used in brains are mostly unknown. As such, any parallel decision theory should be built to handle as diverse a range of managing algorithms as possible.
- No composite agent should have any property that is not reducible to the interactions of the agents it is made out of. If you have a complete description of the subagents, and a complete description of the managing algorithm, you have a complete description of the smart agent.
- There is nothing wrong with treating the lowest level of CSAs as black boxes. The specifics of the non-CSA algorithms, which the lowest level CSAs are made out of are not relevant to parallel decision theory.
- Make sure that the theory can handle each subagent having its own unique set of possible outcomes, and its own unique method of representing those outcomes.
- Make sure that each CSA above the lowest level actually has "could", "should", and "would" labels on the nodes in its problem space, and make sure that those labels, their values, and the problem space itself can be reduced to the managing of the CSAs on the level below.
- Each level above the lowest should have CSAs dealing with a more diverse range of problems than the ones on the level below. The lowest level should have the most specialized CSAs.
- If you've achieved the six goals above, try comparing your parallel decision theory to other decision theories; see how much predictive accuracy is gained by using a parallel decision theory instead of the classical theories.
Preference For (Many) Future Worlds
Followup to: Quantum Russian Roulette; The Domain of Your Utility Function
The only way to win is cheat
And lay it down before I'm beat
and to another give my seat
for that's the only painless feat.
Suicide is painless
It brings on many changes
and I can take or leave it if I please.
-- M.A.S.H.
Let us pretend, for the moment, that we are rational Expected Utility Maximisers. We make our decisions with the intention of achieving outcomes that we judge to have high utility; outcomes that satisfy our preferences. Since developments in physics have led us to abandon the notion of a simple single future world, our decision-making process must now grapple with the notion that some of our decisions will result in more than one future outcome. Not simply the possibility of more than one future outcome, but multiple worlds, each with different events occurring. In extreme examples we can consider the possibility of staking our very lives on the toss of a quantum die, figuring that we are going to live in one world anyway!
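As a purely illustrative calculation (with made-up utilities, and not a statement of this post's conclusion), consider a quantum gamble that kills you in five of six branches; whether it looks attractive depends entirely on how your preferences weigh the branches, which is exactly the question at issue.

```python
# An illustrative quantum gamble: you die in five of six branches and win a
# prize in the sixth. The utility numbers are invented; the point is only that
# the verdict flips depending on how the branches are weighed.

P_SURVIVE = 1 / 6
U_ALIVE_WITH_PRIZE = 100
U_ALIVE_STATUS_QUO = 50
U_DEAD = 0

# Weighting every branch by its measure:
eu_all_branches = P_SURVIVE * U_ALIVE_WITH_PRIZE + (1 - P_SURVIVE) * U_DEAD

# Ignoring the branches in which you don't survive ("I'll live in one world anyway"):
eu_surviving_branch_only = U_ALIVE_WITH_PRIZE

print(eu_all_branches, eu_surviving_branch_only, U_ALIVE_STATUS_QUO)
# Roughly 16.7 vs. 100 vs. 50: the two weightings disagree about whether
# the gamble beats the status quo.
```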
How do preferences apply when making decisions with Many Worlds? The description I'm giving here will be obvious to the point of triviality to some, confusing to others, and, I expect, considered outright wrong by still others. But it is the post that I want to be able to link to whenever the question "Do you believe in quantum immortality?" comes up. Because it is a wrong question!