Decision Theories: A Semi-Formal Analysis, Part III
Or: Formalizing Timeless Decision Theory
Previously:
0. Decision Theories: A Less Wrong Primer
1. The Problem with Naive Decision Theory
2. Causal Decision Theory and Substitution
WARNING: The main result of this post, as it's written here, is flawed. I at first thought the flaw was fatal, but later found a fix. I'm going to try to repair this post, either by including the tricky bits or by handwaving and pointing you to the actual proofs if you're curious. Carry on!
Summary of Post: Have you ever wanted to know how (and whether) Timeless Decision Theory works? Using the framework from the last two posts, this post shows you explicitly how TDT can be implemented in the context of our tournament, what it does, how it strictly beats CDT on fair problems, and a bit about why this is a Big Deal. But you're seriously going to want to read the previous posts in the sequence before this one.
We've reached the frontier of decision theories, and we're ready at last to write algorithms that achieve mutual cooperation in the Prisoner's Dilemma (without risk of being defected on, and without giving up the ability to defect against players who always cooperate)! After two substantial preparatory posts, it feels like it's been a long time coming, hasn't it?

But look at me, here, talking when there's Science to do...
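(As a taste of what's coming: in the source-swapping tournament from the earlier posts, even a deliberately crude agent can hit all three of those marks at once. The sketch below, with names of my own invention, cooperates only with exact textual copies of itself, so it achieves mutual cooperation with its clones, can't be defected on, and still exploits unconditional cooperators. The construction in this post uses logical dependence rather than textual equality, which is what makes it strictly stronger.)

```python
import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate only with exact copies of itself; defect otherwise.

    A crude stand-in for the agents built in this post: two copies
    recognize each other and cooperate, while CooperateBot and
    DefectBot alike get defected against.
    """
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

# Mutual cooperation with a copy; defection against everything else:
assert clique_bot(inspect.getsource(clique_bot)) == "C"
assert clique_bot("def cooperate_bot(src): return 'C'") == "D"
```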
Decision Theories: A Less Wrong Primer

Summary: If you've been wondering why people keep going on about decision theory on Less Wrong, I wrote you this post as an answer. I explain what decision theories are, show how Causal Decision Theory works and where it seems to give the wrong answers, introduce (very briefly) some candidates for a more advanced decision theory, and touch on the (possible) connection between decision theory and ethics.
Decision Theory Paradox: PD with Three Implies Chaos?
Prerequisites: Familiarity with decision theories (in particular, Eliezer's Timeless Decision Theory) and of course the Prisoner's Dilemma.
Summary: I show an apparent paradox in a three-agent variant of the Prisoner's Dilemma: despite full knowledge of each other's source codes, TDT agents allow themselves to be exploited by CDT, and lose completely to another simple decision theory. Please read the post and think for yourself about the Exercises and the Problem below before reading the comments; this is an opportunity to become stronger at, and on, decision theory!
We all know that in a world of one-shot Prisoner's Dilemmas with read-access to the other player's source code, it's good to be Timeless Decision Theory. A TDT agent in a one-shot Prisoner's Dilemma will correctly defect against an agent that always cooperates (call this CooperateBot) or always defects (call this DefectBot, and note that CDT trivially reduces to this agent), and it will cooperate against another TDT agent (or any other type of agent whose decision depends on TDT's decision in the appropriate way). In fact, if we run an evolutionary contest as Robert Axelrod famously did for the Iterated Prisoner's Dilemma, and again allow players to read the other players' source codes, TDT will annihilate both DefectBot and CooperateBot over the long run, and it beats or ties any other decision theory.1 But something interesting happens when we take players in threes...
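(Before moving to threes, here is a quick sanity check of that two-player claim, as a toy replicator simulation of my own. "TDT-like" here just means: cooperate with other TDT agents, defect against both bots. Standard PD payoffs of 3/1 for mutual cooperation/defection, 5 for temptation, and 0 for the sucker are assumed.)

```python
import numpy as np

# Payoffs to the row strategy; rows/cols: 0 = TDT-like, 1 = CooperateBot,
# 2 = DefectBot. TDT cooperates with TDT and defects against both bots.
A = np.array([[3, 5, 1],
              [0, 3, 0],
              [1, 5, 1]], dtype=float)

x = np.array([0.10, 0.45, 0.45])   # initial population shares (my choice)
for _ in range(200):                # discrete-time replicator dynamics
    fitness = A @ x
    x = x * fitness / (x @ fitness)

print(x.round(3))  # -> [1. 0. 0.]: the TDT-like players take over
```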
How I Lost 100 Pounds Using TDT
Background Information: Ingredients of Timeless Decision Theory
Alternate Approaches Include: Self-empathy as a source of “willpower”, Applied Picoeconomics, Akrasia, hyperbolic discounting, and picoeconomics, Akrasia Tactics Review
Standard Disclaimer: Beware of Other-Optimizing
Timeless Decision Theory (or TDT) allowed me to succeed in gaining control over when and how much I ate in a way that previous attempts at precommitment had repeatedly failed to do. I did so well before I was formally exposed to the concept of TDT, but once I clicked on TDT I understood that I had effectively been using it. That click came from reading Eliezer’s shortest summary of TDT, which was:
The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.
You can find more here, but my recommendation, at least at first, is to stick with the one-sentence version. It is as simple as it can be, but no simpler.
Utilizing TDT gave me several key abilities that I previously lacked. The most important was realizing that whatever I chose now would be the same choice I would make at other times under the same circumstances. This allowed me to compare having the benefits now to paying the costs now, as opposed to paying costs now for future benefits later, and so to overcome hyperbolic discounting. The other key ability was that it freed me from the need to explicitly stop in advance and make precommitments each time I wanted to alter my instinctive behavior. Instead, it became automatic to make decisions in terms of which rules would be best to follow.
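(A toy illustration of that first ability, with numbers entirely of my own invention: under hyperbolic discounting, an immediate treat beats a delayed cost case by case, but once you're choosing a standing rule, benefits and costs recur on the same schedule and the discounting cancels.)

```python
# Illustrative numbers (mine, purely for the sketch): a snack is worth
# 1 utilon now; its health cost is 3 utilons felt 30 days later.
# Hyperbolic discounting values a payoff v delayed by t days at v / (1 + k*t).
k = 0.2

def discounted(v, t):
    return v / (1 + k * t)

# Deciding case by case: the benefit is immediate, the cost far away.
print(discounted(1, 0), round(discounted(3, 30), 2))  # 1.0 vs 0.43 -> snack wins

# Deciding a rule once and for all: every future occasion carries the same
# benefit/cost pair, arriving together, so compare them undiscounted.
print(1 - 3)  # -2 per occasion -> the rule "don't snack" wins
```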
You're in Newcomb's Box
Part 1: Transparent Newcomb with your existence at stake
Related: Newcomb's Problem and Regret of Rationality
Omega, a wise and trustworthy being, presents you with a one-time-only game and a surprising revelation.
"I have here two boxes, each containing $100," he says. "You may choose to take both Box A and Box B, or just Box B. You get all the money in the box or boxes you take, and there will be no other consequences of any kind. But before you choose, there is something I must tell you."
Omega pauses portentously.
"You were created by a god: a being called Prometheus. Prometheus was neither omniscient nor particularly benevolent. He was given a large set of blueprints for possible human embryos, and for each blueprint that pleased him he created that embryo and implanted it in a human woman. Here was how he judged the blueprints: any that he guessed would grow into a person who would choose only Box B in this situation, he created. If he judged that the embryo would grow into a person who chose both boxes, he filed that blueprint away unused. Prometheus's predictive ability was not perfect, but it was very strong; he was the god, after all, of Foresight."
Do you take both boxes, or only Box B?
Newcomb's problem happened to me
Okay, maybe not me, but someone I know, and that's what the title would be if he wrote it. Newcomb's problem and Kavka's toxin puzzle are more than just curiosities relevant to artificial intelligence theory. Like a lot of thought experiments, they approximately happen. They illustrate robust issues with causal decision theory that can deeply affect our everyday lives.
Yet somehow it isn't mainstream knowledge that these are more than merely abstract linguistic issues, as evidenced by this comment thread (please no Karma sniping of the comments; they are a valuable record). Scenarios involving brain scanning, decision simulation, etc., can establish their validity and future relevance, but not that they are already commonplace. For the record, I want to provide an already-happened, real-life account that captures the Newcomb essence and explicitly describes how.
So let's say my friend is named Joe. In his account, Joe is very much in love with this girl named Omega… er… Kate, and he wants to get married. Kate is somewhat traditional, and won't marry him unless he proposes, not only in the sense of explicitly asking her, but also expressing certainty that he will never try to leave her if they do marry.
Now, I don't want to make up the ending here. I want to convey the actual account, in which Joe's beliefs are roughly schematized as follows:
- if he proposes sincerely, she is effectively certain to believe it.
- if he proposes insincerely, she is 50% likely to believe it.
- if she believes his proposal, she is 80% likely to say yes.
- if she doesn't believe his proposal, she will surely say no, but will not be significantly upset in comparison to the significance of marriage.
- if they marry, Joe is 90% likely to be happy, and 10% likely to be unhappy.
He roughly values the happy and unhappy outcomes oppositely:
- being happily married to Kate: 125 megautilons
- being unhappily married to Kate: -125 megautilons.
So what should he do? What should this real person have actually done?1 Well, as in Newcomb, these beliefs and utilities present an interesting and quantifiable problem…
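(Here is the one-shot expected-utility arithmetic under those stated beliefs, a sketch of mine. Note that it deliberately ignores the escape option an insincere proposer keeps in reserve, which is exactly where the Newcomb/Kavka flavor enters.)

```python
# Joe's beliefs and utilities, as stated above.
p_believe = {"sincere": 1.0, "insincere": 0.5}
p_yes_given_believed = 0.8      # she says yes if she believes him
p_happy = 0.9                   # the marriage turns out happy
u_happy, u_unhappy = 125, -125  # megautilons

ev_marriage = p_happy * u_happy + (1 - p_happy) * u_unhappy  # = 100

for proposal, pb in p_believe.items():
    p_marry = pb * p_yes_given_believed
    print(proposal, p_marry * ev_marriage)
# sincere:   0.8 * 100 = 80 megautilons
# insincere: 0.4 * 100 = 40 megautilons
```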
A problem with Timeless Decision Theory (TDT)
According to Ingredients of Timeless Decision Theory, when you set up a factored causal graph for TDT, "You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation", where "the logical computation" refers to the TDT-prescribed argmax computation (call it C) that takes all your observations of the world (from which you can construct the factored causal graph) as input, and outputs an action in the present situation.
I asked Eliezer to clarify what it means for another logical computation D to be either the same as C, or "dependent on" C, for purposes of the TDT algorithm. Eliezer answered:
For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D's logical output in at least some cases, relative to our current state of non-omniscient logical knowledge. A nice form of this is when supposing that C has a given exact logical output (not yet known to be impossible) enables us to infer D's exact logical output, and this is true for every possible logical output of C. Non-nice forms would be harder to handle in the decision theory but we might perhaps fall back on probability distributions over D.
I replied as follows (which Eliezer suggested I post here).
If that's what TDT means by the logical dependency between Platonic computations, then TDT may have a serious flaw.
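(To make the "nice form" concrete before the objection: D depends on C when fixing C's output lets us deduce D's output, and this works for every output C might have. A toy rendering of mine, with logical inference stood in for by a lookup table:)

```python
# The possible logical outputs of C, and what supposing each one would let
# us infer about D's output. If the table covers *every* possible output
# of C, then D depends on C in the "nice form" described above.
possible_outputs_of_C = ["cooperate", "defect"]
inferred_output_of_D = {"cooperate": "cooperate", "defect": "defect"}

nice_form = all(c in inferred_output_of_D for c in possible_outputs_of_C)
print(nice_form)  # True: fixing C's output fixes D's output
```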
Why (and why not) Bayesian Updating?
The use of Bayesian belief updating with expected utility maximization may be just an approximation, one that is only relevant in special situations meeting certain independence assumptions about the agent's actions.
For those who aren't sure of the need for an updateless decision theory, the paper Revisiting Savage in a conditional world by Paolo Ghirardato might help convince you. (Although that's probably not the intention of the author!) The paper gives a set of 7 axioms, based on Savage's axioms, which is necessary and sufficient for an agent's preferences in a dynamic decision problem to be represented as expected utility maximization with Bayesian belief updating. This helps us see in exactly which situations Bayesian updating works and why. (In many other axiomatizations of decision theory, the updating part is left out, and only expected utility maximization is derived in a static setting.)
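(For reference, the representation in question has the familiar shape below, written in standard notation rather than Ghirardato's: after observing an event $E$, the agent chooses the act $a$ that maximizes expected utility under the Bayesian posterior.)

$$a^* = \arg\max_a \sum_s U(a, s)\, P(s \mid E), \qquad P(s \mid E) = \frac{P(E \mid s)\, P(s)}{\sum_{s'} P(E \mid s')\, P(s')}$$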
Circular Altruism vs. Personal Preference
Suppose there is a diagnostic procedure that allows one to catch a relatively rare disease with absolute precision. If left untreated, the disease is fatal, but once diagnosed it's easily treatable (I suppose there are some real-world approximations). The diagnostic involves an uncomfortable procedure and an inevitable loss of time. At what a priori probability of having the disease would you not bother taking the test, leaving the outcome to chance? Say, you decide it's 0.0001%.
Enter timeless decision theory. Your decision to take or not take the test may just as well be considered a decision for the whole population (let's also assume you are typical, and that everyone decides this similarly). By deciding not to personally take the test, you've decided that most people won't take the test, and thus, for example, with 0.00005% of the population having the condition, about 3000 people will die. While the personal tradeoff is fixed, this number obviously depends on the size of the population.
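(Spelling out the arithmetic, using the post's own numbers:)

```python
population = 6e9           # rough world population
p_disease = 0.0000005      # 0.00005% incidence
p_indifference = 0.000001  # 0.0001%: the prior at which you'd skip the test

# p_disease < p_indifference, so each individual prefers to skip the
# test... and the same collective decision leaves the sick untreated:
print(int(population * p_disease))  # -> 3000 deaths
```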
It seems like a horrible thing to do, making a decision that results in 3000 deaths. Thus, taking the test seems like a small personal sacrifice for this gift to others. Yet this is circular: everyone would be thinking the same thing, each reversing their decision solely to help others, not to benefit personally. Nobody benefits.
Obviously, together with the 3000 lives saved, there is the cost of 6 billion people accepting the test, and that harm is also part of the outcome chosen by the decision. If everyone personally prefers not to take the test, then inflicting the opposite on the whole population is only so much worse.
Or is it?
Timeless Identity Crisis
Followup/summary/extension to this conversation with SilasBarta
So, you're going along, cheerfully deciding things, doing counterfactual surgery on the output of decision algorithm A1 to calculate the results of your decisions, but it turns out that a dark secret is undermining your efforts...
You are not running/being decision algorithm A1, but instead decision algorithm A2, an algorithm that happens to have the property of believing (erroneously) that it actually is A1.
Ruh-roh.
Now, it is _NOT_ my intent here to try to solve the problem of "how can you know which one you really are?", but instead to deal with the problem of "how can TDT take into account this possibility?"
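(One natural way to fold the possibility in, sketched here with hypothetical numbers of my own rather than as a settled answer: keep a credence over which algorithm you actually are, and take expectations over the counterfactuals each self-model would compute.)

```python
# Hypothetical payoffs: what each action yields if the agent really is
# A1, versus if it is A2 merely believing itself to be A1.
payoff_if_A1 = {"act_x": 10, "act_y": 4}
payoff_if_A2 = {"act_x": 1,  "act_y": 6}
p_am_A1 = 0.7  # credence in being A1 (an assumption for illustration)

def expected(action):
    return (p_am_A1 * payoff_if_A1[action]
            + (1 - p_am_A1) * payoff_if_A2[action])

best = max(payoff_if_A1, key=expected)
print(best, expected(best))  # act_x at 7.3 (vs. act_y at 4.6)
```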