Timeless Decision Theory: Problems I Can't Solve

25 Eliezer_Yudkowsky 20 July 2009 12:02AM

Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you.  Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions.  The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"

Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver.  "Yes," you say.  "You're lying," says the driver, and drives off leaving you to die.

If only you weren't so rational!

This is the dilemma of Parfit's Hitchhiker, and the above is the standard resolution according to mainstream philosophy's causal decision theory, which also two-boxes on Newcomb's Problem and defects in the Prisoner's Dilemma.  Of course, any self-modifying agent who expects to face such problems - in general, or in particular - will soon self-modify into an agent that doesn't regret its "rationality" so much.  So from the perspective of a self-modifying-AI-theorist, classical causal decision theory is a wash.  And indeed I've worked out what seems like an elegant theory, tentatively labeled "timeless decision theory", which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without need to explicitly consider such notions as "precommitment".  Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.
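The structure of the dilemma can be made concrete with a toy model. This is only a sketch under stated assumptions: the utility numbers and the agent labels `"causal"` and `"committed"` are hypothetical, and the driver is assumed to read the hitchhiker perfectly. Only the ordering of the utilities matters.

```python
# Hypothetical utilities for illustration; only their ordering matters.
U_DIE = -1_000_000   # left in the desert
U_RIDE_PAY = -100    # rescued, then pays the $100
U_RIDE_FREE = 0      # rescued, keeps the $100


def pays_in_town(agent: str) -> bool:
    """What the agent actually does once it is safely in town."""
    if agent == "causal":
        # In town the ride is already in the past, so the only live
        # comparison is paying versus keeping the money.
        return U_RIDE_PAY > U_RIDE_FREE
    if agent == "committed":
        # Follows through on the policy it would have wanted in the desert.
        return True
    raise ValueError(f"unknown agent: {agent}")


def outcome(agent: str) -> int:
    """The driver (assumed a perfect reader) gives a ride only to payers."""
    if pays_in_town(agent):
        return U_RIDE_PAY
    return U_DIE


results = {agent: outcome(agent) for agent in ("causal", "committed")}
print(results)  # {'causal': -1000000, 'committed': -100}
```

The agent that reasons only about consequences from the town onward ends up with the worst outcome, which is the regret described above.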

However, there are some other timeless decision problems for which I do not possess a general theory.

For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher's book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin.  I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000.  And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads.  The coin came up heads - can I have $1000?"


Counterfactual Mugging v. Subjective Probability

1 MBlume 20 July 2009 04:31PM

This has been in my drafts folder for ages, but in light of Eliezer's post yesterday, I thought I'd see if I could get some comment on it:

 

A couple weeks ago, Vladimir Nesov stirred up the biggest hornet's nest I've ever seen on LW by introducing us to the Counterfactual Mugging scenario.

If you didn't read it the first time, please do -- I don't plan to attempt to summarize.  Further, if you don't think you would give Omega the $100 in that situation, I'm afraid this article will mean next to nothing to you.

So, those still reading, you would give Omega the $100.  You would do so because if someone told you about the problem now, you could do the expected utility calculation 0.5*U(-$100)+0.5*U(+$10000)>0.  Ah, but where did the 0.5s come from in your calculation?  Well, Omega told you he flipped a fair coin.  Until he did, there existed a 0.5 probability of either outcome.  Thus, for you, hearing about the problem, there is a 0.5 probability of your encountering the problem as stated, and a 0.5 probability of your encountering the corresponding situation, in which Omega either hands you $10000 or doesn't, based on his prediction.  This is all very fine and rational.  

So, new problem.  Let's leave money out of it, and assume Omega hands you 1000 utilons in one case, and asks for them in the other -- exactly equal utility.  What if there is an urn, and it contains either a red or a blue marble, and Omega looks, maybe gives you the utility if the marble is red, and asks for it if the marble is blue?  What if you have devoted considerable time to determining whether the marble is red or blue, and your subjective probability has fluctuated over the course of your life? What if, unbeknownst to you, a rationalist community has been tracking evidence of the marble's color (including your own probability estimates), and running a prediction market, and Omega now shows you a plot of the prices over the past few years?

In short, what information do you use to calculate the probability you plug into the EU calculation?
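To see why the choice of probability matters so much in the marble version, here is a minimal sketch of the calculation. It assumes utility linear in the stated amounts and an Omega whose prediction is always right; the function names and the parameter `p_ask` (the probability of landing in the branch where Omega asks you to pay) are hypothetical.

```python
def eu_of_paying(p_ask: float, loss: float, gain: float) -> float:
    """Expected utility, judged before the uncertainty resolves, of the
    policy 'pay when asked'. Refusing is assumed to be worth 0."""
    return p_ask * (-loss) + (1 - p_ask) * gain


def break_even_p_ask(loss: float, gain: float) -> float:
    """Probability of being asked at which paying and refusing tie."""
    return gain / (gain + loss)


# Coin version from the text: lose $100 or gain $10000 at 0.5 each.
coin_eu = eu_of_paying(0.5, 100, 10_000)
print(coin_eu)  # 4950.0

# Marble version: symmetric stakes of 1000 utilons.
marble_threshold = break_even_p_ask(1000, 1000)
print(marble_threshold)  # 0.5
```

With lopsided stakes the policy is robust to large errors in the probability, but with equal stakes the sign of the answer flips exactly at 0.5, so whichever probability estimate you plug in decides the matter.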

Ingredients of Timeless Decision Theory

37 Eliezer_Yudkowsky 19 August 2009 01:10AM

Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory

Wei Dai asked:

"Why didn't you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about."

...

All right, for the benefit of the hypothetical individual who really can get things that quickly, here's a fast summary of the most important ingredients that go into my "timeless decision theory" - since Wei Dai already guessed what I think of as the key starting insight.  This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge.  My past experience suggests that writing this compactly has no impact - that it takes a mini-book - but perhaps Dai or others will prove me wrong.

The one-sentence version is:  Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is:  Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
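As a rough illustration of the one-sentence version, here is a toy Newcomb's Problem in which the predictor's simulation is treated as another instantiation of the agent's own computation. This is a sketch, not the formalism described above: the payoffs ($1,000 and $1,000,000) are the conventional ones and are assumed here, the predictor is assumed perfect, and all names are hypothetical.

```python
ACTIONS = ("one-box", "two-box")
BOX_A = 1_000        # always available (assumed conventional payoff)
BOX_B = 1_000_000    # filled only if one-boxing was predicted


def payoff(action: str, prediction: str) -> int:
    """Money received given the agent's action and the predictor's guess."""
    box_b = BOX_B if prediction == "one-box" else 0
    return box_b if action == "one-box" else box_b + BOX_A


def causal_choice(fixed_prediction: str) -> str:
    """Holds the prediction fixed and varies only the physical action."""
    return max(ACTIONS, key=lambda a: payoff(a, fixed_prediction))


def timeless_choice() -> str:
    """Varies the logical output of the computation. The predictor's
    simulation runs the same computation, so it receives the same output."""
    return max(ACTIONS, key=lambda out: payoff(out, out))


print([causal_choice(p) for p in ACTIONS])  # ['two-box', 'two-box']
print(timeless_choice())                    # one-box
```

The only difference between the two functions is which node gets set when evaluating a counterfactual: the action alone, or the output shared by every instance of the computation.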


Timeless Decision Theory and Meta-Circular Decision Theory

16 Eliezer_Yudkowsky 20 August 2009 10:07PM

(This started as a reply to Gary Drescher's comment here in which he proposes a Metacircular Decision Theory (MCDT); but it got way too long so I turned it into an article, which also contains some amplifications on TDT which may be of general interest.)


Outlawing Anthropics: An Updateless Dilemma

18 Eliezer_Yudkowsky 08 September 2009 06:31PM

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and "90%".

Suppose you reply 90%.

And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself.  (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room.  If they all reply "Yes", I will do so.
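The numbers in this setup can be checked directly. The sketch below assumes the "90%" answer comes from treating "I woke up in a green room" as evidence weighted by the fraction of copies in green rooms, and it sums dollars over all 20 copies as the altruistic agent would; variable names are hypothetical.

```python
from fractions import Fraction

PRIOR_HEADS = Fraction(1, 2)
GREEN = {"heads": 18, "tails": 2}   # copies in green rooms
RED = {"heads": 2, "tails": 18}     # copies in red rooms


def posterior_heads_given_green() -> Fraction:
    """Bayes' rule, with likelihood = fraction of copies in green rooms."""
    like_heads = Fraction(GREEN["heads"], GREEN["heads"] + RED["heads"])
    like_tails = Fraction(GREEN["tails"], GREEN["tails"] + RED["tails"])
    joint_heads = PRIOR_HEADS * like_heads
    joint_tails = (1 - PRIOR_HEADS) * like_tails
    return joint_heads / (joint_heads + joint_tails)


def total_payoff(outcome: str) -> int:
    """Pay $1 to each green-room copy, steal $3 from each red-room copy."""
    return GREEN[outcome] * 1 - RED[outcome] * 3


def expected_total(p_heads: Fraction) -> Fraction:
    return p_heads * total_payoff("heads") + (1 - p_heads) * total_payoff("tails")


p_green = posterior_heads_given_green()
ev_before = expected_total(PRIOR_HEADS)
ev_after = expected_total(p_green)
print(p_green, ev_before, ev_after)  # 9/10 -20 28/5
```

Evaluated before the experiment, saying "Yes" has a negative expected total (-$20); evaluated at 90% from inside a green room, the same deal looks positive (+$5.60). Those two verdicts disagree about the same policy.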


Timeless Identity Crisis

5 Psy-Kosh 11 September 2009 02:37AM

Followup/summary/extension to this conversation with SilasBarta

So, you're going along, cheerfully deciding things, doing counterfactual surgery on the output of decision algorithm A1 to calculate the results of your decisions, but it turns out that a dark secret is undermining your efforts...

You are not running/being decision algorithm A1, but instead decision algorithm A2, an algorithm that happens to have the property of believing (erroneously) that it actually is A1.

Ruh-roh.

Now, it is _NOT_ my intent here to try to solve the problem of "how can you know which one you really are?", but instead to deal with the problem of "how can TDT take into account this possibility?"


Circular Altruism vs. Personal Preference

3 Vladimir_Nesov 26 October 2009 01:43AM

Suppose there is a diagnostic procedure that can detect a relatively rare disease with absolute precision. If left untreated, the disease is fatal, but when diagnosed it's easily treatable (I suppose there are some real-world approximations). The diagnostic involves an uncomfortable procedure and an inevitable loss of time. At what a priori probability would you not care to take the test, leaving this outcome to chance? Say, you decide it's 0.0001%.

Enter timeless decision theory. Your decision to take or not take the test may as well be considered a decision for the whole population (let's also assume you are typical and everyone is similar in this decision). By deciding to personally not take the test, you've decided that most people won't take the test, and thus, for example, with 0.00005% of the population having the condition, about 3000 people will die. While the personal tradeoff is fixed, this number obviously depends on the size of the population.

It seems like a horrible thing to do, making a decision that results in 3000 deaths. Thus, taking the test seems like a small personal sacrifice for this gift to others. Yet this is circular: everyone would be thinking that, reversing their decision solely to help others, not benefiting personally. Nobody benefits.

Obviously, alongside the 3000 lives saved, there are 6 billion people accepting the test, and that harm is also part of the outcome chosen by the decision. If everyone personally prefers not to take the test, then inflicting the opposite on the whole population is only so much worse.
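The figures in this example are easy to verify. The sketch below uses only the numbers given in the text (a population of 6 billion, a prevalence of 0.00005%, and a personal threshold of 0.0001%) and assumes that every untreated case is fatal; the variable names are hypothetical.

```python
from fractions import Fraction

POPULATION = 6_000_000_000
PREVALENCE = Fraction(5, 10_000_000)   # 0.00005% as a fraction
THRESHOLD = Fraction(1, 1_000_000)     # 0.0001%: below this, you skip the test

# If nobody takes the test, every case goes untreated.
expected_deaths = POPULATION * PREVALENCE

# The individual's own rule: decline when the risk is under the threshold.
individual_declines = PREVALENCE < THRESHOLD

# Tests performed per life saved if everyone takes the test instead.
tests_per_life = POPULATION / expected_deaths

print(expected_deaths, individual_declines, tests_per_life)  # 3000 True 2000000
```

The ratio of tests to lives saved equals 1 divided by the prevalence and does not change with population size, which is one way to state the point that the personal tradeoff is fixed while the death toll scales.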

Or is it?


Why (and why not) Bayesian Updating?

8 Wei_Dai 16 November 2009 09:27PM

the use of Bayesian belief updating with expected utility maximization may be just an approximation that is only relevant in special situations which meet certain independence assumptions around the agent's actions.

Steve Rayhawk

For those who aren't sure of the need for an updateless decision theory, the paper Revisiting Savage in a conditional world by Paolo Ghirardato might help convince you. (Although that's probably not the intention of the author!) The paper gives a set of 7 axioms, based on Savage's axioms, which is necessary and sufficient for an agent's preferences in a dynamic decision problem to be represented as expected utility maximization with Bayesian belief updating. This helps us see in exactly which situations Bayesian updating works and why. (In many other axiomatizations of decision theory, the updating part is left out, and only expected utility maximization is derived in a static setting.)


A problem with Timeless Decision Theory (TDT)

28 Gary_Drescher 04 February 2010 06:47PM

According to Ingredients of Timeless Decision Theory, when you set up a factored causal graph for TDT, "You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation", where "the logical computation" refers to the TDT-prescribed argmax computation (call it C) that takes all your observations of the world (from which you can construct the factored causal graph) as input, and outputs an action in the present situation.

I asked Eliezer to clarify what it means for another logical computation D to be either the same as C, or "dependent on" C, for purposes of the TDT algorithm. Eliezer answered:

For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D's logical output in at least some cases, relative to our current state of non-omniscient logical knowledge.  A nice form of this is when supposing that C has a given exact logical output (not yet known to be impossible) enables us to infer D's exact logical output, and this is true for every possible logical output of C. Non-nice forms would be harder to handle in the decision theory but we might perhaps fall back on probability distributions over D.

I replied as follows (which Eliezer suggested I post here).

If that's what TDT means by the logical dependency between Platonic computations, then TDT may have a serious flaw.
