Self-modification is the correct justification for updateless decision theory
Reply to: Late great filter is not bad news
Suppose that you build an AI, and Omega appears to it and says:
Here's a button. A million years ago I calculated the umpteenth digit of pi. If it is even, I calculated whether you would press this button (in such a way that your human creator was never simulated as a conscious being). If I predicted that you wouldn't press the button, I destroyed Earth right then and there.* If it is odd, I created a doomsday device that will destroy the solar system if you press this button.
[* ETA: Assume that if the digit is even and the AI is predicted to press the button, Omega does not destroy Earth, but does turn Alpha Centauri purple (say). The point is for this to be a scenario that you, the AI creator, know not to have come to pass.]
Suppose you're the kind of AI creator whose AI is time consistent in a certain sense from the beginning of time and presses the button. Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.
Suppose, on the other hand, that you're a very similar kind of AI creator, only you program your AI not to take into account impossible possible worlds that had already turned out to be impossible (when you created the AI | when you first became convinced that timeless decision theory is right). Then you've got an AI that most of the time acts the same way, but does worse in worlds we know to be logically impossible, and destroys humanity less often in worlds we do not know to be logically impossible.
Wei Dai's great filter post seems to suggest that under UDT, you should be the first kind of AI creator. I don't think that's true, actually; I think that in UDT, you should probably not start with a "prior" probability distribution that gives significant weight to logical propositions you know to be false: do you think the AI should press the button if it was the first digit of pi that Omega calculated?
But obviously, you don't want tomorrow's you to pick the prior that way just after Omega has appeared to it in a counterfactual mugging (because according to your best reasoning today, there's a 50% chance this loses you a million dollars).
The most convincing argument I know for timeless flavors of decision theory is that if you could modify your own source code, the course of action that maximizes your expected utility is to modify into a timeless decider. So yes, you should do that. Any AI you build should be timeless from the start; and it's reasonable to make yourself into the kind of person that will decide timelessly with your probability distribution today (if you can do that).
But I don't think you should decide that updateless decision theory is therefore so pure and reflectively consistent that you should go and optimize your payoff even in worlds whose logical impossibility was clear before you first decided to be a timeless decider (say). Perhaps it's less elegant to justify UDT through self-modification at some arbitrary point in time than through reflective consistency all the way from the big bang on; but in the worlds we can't rule out yet, it's more likely to win.
Late Great Filter Is Not Bad News
But I hope that our Mars probes will discover nothing. It would be good news if we find Mars to be completely sterile. Dead rocks and lifeless sands would lift my spirit.
Conversely, if we discovered traces of some simple extinct life form—some bacteria, some algae—it would be bad news. If we found fossils of something more advanced, perhaps something looking like the remnants of a trilobite or even the skeleton of a small mammal, it would be very bad news. The more complex the life we found, the more depressing the news of its existence would be. Scientifically interesting, certainly, but a bad omen for the future of the human race.
— Nick Bostrom, in Where Are They? Why I hope that the search for extraterrestrial life finds nothing
This post is a reply to Robin Hanson's recent OB post Very Bad News, as well as Nick Bostrom's 2008 paper quoted above, and assumes familiarity with Robin's Great Filter idea. (Robin's server for the Great Filter paper seems to be experiencing some kind of error. See here for a mirror.)
Suppose Omega appears and says to you:
(Scenario 1) I'm going to apply a great filter to humanity. You get to choose whether the filter is applied one minute from now, or in five years. When the designated time arrives, I'll throw a fair coin, and wipe out humanity if it lands heads. And oh, it's not the current you that gets to decide, but the version of you 4 years and 364 days from now. I'll predict his or her decision and act accordingly.
I hope it's not controversial that the current you should prefer a late filter, since (with probability .5) that gives you and everyone else five more years of life. What about the future version of you? Well, if he or she decides on the early filter, that would constitute a time inconsistency. And for those who believe in multiverse/many-worlds theories, choosing the early filter shortens the lives of everyone in half of all universes/branches where a copy of you is making this decision, which doesn't seem like a good thing. It seems clear that, ignoring human deviations from ideal rationality, the right decision of the future you is to choose the late filter.
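The comparison in Scenario 1 can be written out as a small calculation. This is only a sketch: the function name and the choice to count years (ignoring leap years) are assumptions made for illustration. Worlds where the coin lands tails survive under either choice, so only the heads-worlds matter for the comparison.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: ignore leap years


def expected_years_in_heads_worlds(delay_years, p_heads=Fraction(1, 2)):
    """Expected years of life contributed by the worlds where the coin lands heads.

    In those worlds humanity lives only until the filter is applied.
    Tails-worlds are the same under both choices, so they are left out.
    """
    return p_heads * delay_years


# Early filter: applied one minute from now.
early = expected_years_in_heads_worlds(Fraction(1, MINUTES_PER_YEAR))
# Late filter: applied in five years.
late = expected_years_in_heads_worlds(Fraction(5))

advantage_of_late_filter = late - early
print(float(advantage_of_late_filter))
```

Under these assumptions the late filter comes out ahead by just under 2.5 expected years, whichever version of you is making the choice.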
Explicit Optimization of Global Strategy (Fixing a Bug in UDT1)
When describing UDT1 solutions to various sample problems, I've often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.
(I think this "bug" also exists in TDT, but I don't understand it well enough to make a definite claim. Perhaps Eliezer or someone else can tell me if TDT correctly solves the sample problem given here.)
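The explicit global optimization described above (find the strategy S* that does best over the world program P, then return what S* returns on the actual input) can be sketched by brute force. The names `best_global_strategy`, `udt_act`, and `toy_world` are hypothetical, and the toy world program is an assumed illustration rather than the sample problem the post refers to.

```python
from itertools import product


def best_global_strategy(inputs, outputs, world_program):
    """Enumerate every mapping from inputs to outputs and keep the best one."""
    best, best_value = None, None
    for choice in product(outputs, repeat=len(inputs)):
        strategy = dict(zip(inputs, choice))
        value = world_program(strategy)
        if best_value is None or value > best_value:
            best, best_value = strategy, value
    return best, best_value


def udt_act(x, inputs, outputs, world_program):
    """Return what the globally optimal strategy S* outputs on input x."""
    strategy, _ = best_global_strategy(inputs, outputs, world_program)
    return strategy[x]


def toy_world(strategy):
    # Assumed toy world: two instances of the agent receive inputs 1 and 2,
    # and the payoff is 10 only if their outputs differ.
    return 10 if strategy[1] != strategy[2] else 0


inputs, outputs = [1, 2], ["A", "B"]
s_star, value = best_global_strategy(inputs, outputs, toy_world)
act_1 = udt_act(1, inputs, outputs, toy_world)
act_2 = udt_act(2, inputs, outputs, toy_world)
```

Because every instance computes the same S* (ties are broken by enumeration order), the two instances end up with different outputs here without any explicit coordination step.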
A problem with Timeless Decision Theory (TDT)
According to Ingredients of Timeless Decision Theory, when you set up a factored causal graph for TDT, "You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation", where "the logical computation" refers to the TDT-prescribed argmax computation (call it C) that takes all your observations of the world (from which you can construct the factored causal graph) as input, and outputs an action in the present situation.
I asked Eliezer to clarify what it means for another logical computation D to be either the same as C, or "dependent on" C, for purposes of the TDT algorithm. Eliezer answered:
For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D's logical output in at least some cases, relative to our current state of non-omniscient logical knowledge. A nice form of this is when supposing that C has a given exact logical output (not yet known to be impossible) enables us to infer D's exact logical output, and this is true for every possible logical output of C. Non-nice forms would be harder to handle in the decision theory but we might perhaps fall back on probability distributions over D.
I replied as follows (which Eliezer suggested I post here).
If that's what TDT means by the logical dependency between Platonic computations, then TDT may have a serious flaw.
What Are Probabilities, Anyway?
In Probability Space & Aumann Agreement, I wrote that probabilities can be thought of as weights that we assign to possible world-histories. But what are these weights supposed to mean? Here I’ll give a few interpretations that I've considered and held at one point or another, and their problems. (Note that in the previous post, I implicitly used the first interpretation in the following list, since that seems to be the mainstream view.)
- Only one possible world is real, and probabilities represent beliefs about which one is real.
  - Which world gets to be real seems arbitrary.
  - Most possible worlds are lifeless, so we’d have to be really lucky to be alive.
  - We have no information about the process that determines which world gets to be real, so how can we decide what the probability mass function p should be?
- All possible worlds are real, and probabilities represent beliefs about which one I’m in.
  - Before I’ve observed anything, there seems to be no reason to believe that I’m more likely to be in one world than another, but we can’t let all their weights be equal.
- Not all possible worlds are equally real, and probabilities represent “how real” each world is. (This is also sometimes called the “measure” or “reality fluid” view.)
  - Which worlds get to be “more real” seems arbitrary.
  - Before we observe anything, we don't have any information about the process that determines the amount of “reality fluid” in each world, so how can we decide what the probability mass function p should be?
- All possible worlds are real, and probabilities represent how much I care about each world. (To make sense of this, recall that these probabilities are ultimately multiplied with utilities to form expected utilities in standard decision theories.)
  - Which worlds I care more or less about seems arbitrary. But perhaps this is less of a problem because I’m “allowed” to have arbitrary values.
  - Or, from another perspective, this drops another hard problem on top of the pile of problems called “values”, where it may never be solved.
Why (and why not) Bayesian Updating?
the use of Bayesian belief updating with expected utility maximization may be just an approximation that is only relevant in special situations which meet certain independence assumptions around the agent's actions.
For those who aren't sure of the need for an updateless decision theory, the paper Revisiting Savage in a conditional world by Paolo Ghirardato might help convince you. (Although that's probably not the intention of the author!) The paper gives a set of 7 axioms, based on Savage's axioms, which is necessary and sufficient for an agent's preferences in a dynamic decision problem to be represented as expected utility maximization with Bayesian belief updating. This helps us see in exactly which situations Bayesian updating works and why. (In many other axiomatizations of decision theory, the updating part is left out, and only expected utility maximization is derived in a static setting.)
The Absent-Minded Driver
This post examines an attempt by professional decision theorists to treat an example of time inconsistency, and asks why they failed to reach the solution (i.e., TDT/UDT) that this community has more or less converged upon. (Another aim is to introduce this example, which some of us may not be familiar with.) Before I begin, I should note that I don't think "people are crazy, the world is mad" (as Eliezer puts it) is a good explanation. Maybe people are crazy, but unless we can understand how and why people are crazy (or to put it more diplomatically, "make mistakes"), how can we know that we're not being crazy in the same way or making the same kind of mistakes?
The problem of the "absent-minded driver" was introduced by Michele Piccione and Ariel Rubinstein in their 1997 paper "On the Interpretation of Decision Problems with Imperfect Recall". But I'm going to use "The Absent-Minded Driver" by Robert J. Aumann, Sergiu Hart, and Motty Perry instead, since it's shorter and more straightforward. (Notice that the authors of this paper worked for a place called Center for the Study of Rationality, and one of them won a Nobel Prize in Economics for his work on game theory. I really don't think we want to call these people "crazy".)
Here's the problem description:
An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them.
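To make the payoffs concrete, here is a rough sketch of the planning-stage calculation. It assumes (this is not stated in the excerpt above) that the driver may randomize, choosing CONTINUE with the same probability p at both intersections, since he cannot tell them apart. The function name and the search grid are illustrative choices.

```python
from fractions import Fraction


def planning_payoff(p):
    """Expected payoff if the driver CONTINUEs with probability p at each intersection."""
    exit_at_x = (1 - p) * 0      # EXIT at X, reach A, payoff 0
    exit_at_y = p * (1 - p) * 4  # CONTINUE at X, EXIT at Y, reach B, payoff 4
    reach_c = p * p * 1          # CONTINUE twice, reach C, payoff 1
    return exit_at_x + exit_at_y + reach_c


# Search a grid of probabilities; exact fractions avoid rounding noise.
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=planning_payoff)

print(best_p, planning_payoff(best_p))
```

The expression simplifies to 4p - 3p^2, so the two deterministic plans give 0 (always EXIT) and 1 (always CONTINUE), while under the randomization assumption the best plan continues with probability 2/3 for an expected payoff of 4/3.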
Torture vs. Dust vs. the Presumptuous Philosopher: Anthropic Reasoning in UDT
In this post, I'd like to examine whether Updateless Decision Theory can provide any insights into anthropic reasoning. Puzzles/paradoxes in anthropic reasoning are what prompted me to consider UDT originally, and this post may be of interest to those who do not consider Counterfactual Mugging to provide sufficient motivation for UDT.
The Presumptuous Philosopher is a thought experiment that Nick Bostrom used to argue against the Self-Indication Assumption. (SIA: Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.)
Timeless Decision Theory and Meta-Circular Decision Theory
(This started as a reply to Gary Drescher's comment here in which he proposes a Metacircular Decision Theory (MCDT); but it got way too long so I turned it into an article, which also contains some amplifications on TDT which may be of general interest.)
Towards a New Decision Theory
It is commonly acknowledged here that current decision theories have deficiencies that show up in the form of various paradoxes. Since there seems to be little hope that Eliezer will publish his Timeless Decision Theory any time soon, I decided to try to synthesize some of the ideas discussed in this forum, along with a few of my own, into a coherent alternative that is hopefully not so paradox-prone.
I'll start with a way of framing the question. Put yourself in the place of an AI, or more specifically, the decision algorithm of an AI. You have access to your own source code S, plus a bit string X representing all of your memories and sensory data. You have to choose an output string Y. That’s the decision. The question is, how? (The answer isn't “Run S,” because what we want to know is what S should be in the first place.)
Let’s proceed by asking the question, “What are the consequences of S, on input X, returning Y as the output, instead of Z?” To begin with, we'll consider just the consequences of that choice in the realm of abstract computations (i.e. computations considered as mathematical objects rather than as implemented in physical systems). The most immediate consequence is that any program that calls S as a subroutine with X as input will receive Y as output, instead of Z. What happens next is a bit harder to tell, but supposing that you know something about a program P that calls S as a subroutine, you can further deduce the effects of choosing Y versus Z by tracing the difference between the two choices in P’s subsequent execution. We could call these the computational consequences of Y. If you have preferences about the execution of a set of programs, some of which call S as a subroutine, then you can satisfy your preferences directly by choosing the output of S so that those programs will run the way you most prefer.
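The idea of tracing the computational consequences of an output can be sketched in a few lines. Everything named here is hypothetical: `choose_output`, the Newcomb-style `toy_program`, and its payoffs are assumptions chosen for illustration, and a real agent could not simply run every program that calls it.

```python
def choose_output(x, candidate_outputs, world_programs, utility):
    """Pick the output Y whose computational consequences are most preferred."""
    def total_utility(y):
        # Stand-in for S that returns y on input x. Its behavior on any other
        # input is left as None in this toy, which is a simplification.
        def s(inp):
            return y if inp == x else None
        # Run each program with this stand-in and score how it turns out.
        return sum(utility(program(s)) for program in world_programs)
    return max(candidate_outputs, key=total_utility)


def toy_program(s):
    # Assumed Newcomb-style program: it calls S once to predict and once to act,
    # so both calls receive the same output.
    prediction = s("box problem")
    box_b = 1_000_000 if prediction == "one-box" else 0
    choice = s("box problem")
    return box_b if choice == "one-box" else box_b + 1_000


best = choose_output(
    x="box problem",
    candidate_outputs=["one-box", "two-box"],
    world_programs=[toy_program],
    utility=lambda payoff: payoff,
)
print(best)  # one-box
```

Since the toy program calls S in two places, choosing the output changes both the prediction and the act together, which is why "one-box" scores higher in this sketch.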