It is commonly acknowledged here that current decision theories have deficiencies that show up in the form of various paradoxes. Since there seems to be little hope that Eliezer will publish his Timeless Decision Theory any time soon, I decided to try to synthesize some of the ideas discussed in this forum, along with a few of my own, into a coherent alternative that is hopefully not so paradox-prone.
I'll start with a way of framing the question. Put yourself in the place of an AI, or more specifically, the decision algorithm of an AI. You have access to your own source code S, plus a bit string X representing all of your memories and sensory data. You have to choose an output string Y. That’s the decision. The question is, how? (The answer isn't “Run S,” because what we want to know is what S should be in the first place.)
Let’s proceed by asking the question, “What are the consequences of S, on input X, returning Y as the output, instead of Z?” To begin with, we'll consider just the consequences of that choice in the realm of abstract computations (i.e. computations considered as mathematical objects rather than as implemented in physical systems). The most immediate consequence is that any program that calls S as a subroutine with X as input will receive Y as output, instead of Z. What happens next is a bit harder to tell, but supposing that you know something about a program P that calls S as a subroutine, you can further deduce the effects of choosing Y versus Z by tracing the difference between the two choices in P’s subsequent execution. We could call these the computational consequences of Y. If you have preferences about the execution of a set of programs, some of which call S as a subroutine, then you can satisfy your preferences directly by choosing the output of S so that those programs will run the way you most prefer.
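As a toy sketch (with made-up programs and preferences, not anything from the theory itself), tracing the computational consequences of a choice amounts to simulating each program that calls S under the hypothesis that S(X) returns a particular output:

def P(S):
    # A program that calls S as a subroutine; its execution depends on S's output.
    if S("X") == "Y":
        return "execution in which the AI is rewarded"
    return "execution in which the AI is not rewarded"

def consequences(choice):
    # Hypothesize that S("X") == choice and trace P's subsequent execution.
    return P(lambda x: choice)

print(consequences("Y"))  # execution in which the AI is rewarded
print(consequences("Z"))  # execution in which the AI is not rewarded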
A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S' which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs.
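Here is a similarly toy sketch of a logical consequence (again, all names are made up for illustration): a program P' never calls S, but it calls an S' that provably computes the same function as S, so choosing S's output also pins down P''s execution:

def S(x):
    return "YZ" if x == "X" else "ZZ"

def S_prime(x):
    # Written differently from S, but provably produces the same output on every input.
    return ("Y" if x == "X" else "Z") + "Z"

def S_first_bit(x):
    # Another logical relationship: a subroutine that always returns the first
    # character of S's output on the same input.
    return S(x)[0]

def P_prime():
    # P' never mentions S, yet its execution is determined by what S("X") returns.
    return "reward" if S_prime("X") == "YZ" else "no reward"

print(P_prime())  # "reward", because S("X") == "YZ"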
In general, you can’t be certain about the consequences of a choice, because you’re not logically omniscient. How to handle logical/mathematical uncertainty is an open problem, so for now we'll just assume that you have access to a "mathematical intuition subroutine" that somehow allows you to form beliefs about the likely consequences of your choices.
At this point, you might ask, “That’s well and good, but what if my preferences extend beyond abstract computations? What about consequences on the physical universe?” The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it. (From now on I’ll just refer to programs for simplicity, with the understanding that the subsequent discussion can be generalized to non-computable universes.) Your preferences about the physical universe can be translated into preferences about such a program P and programmed into the AI. The AI, upon receiving an input X, will look into P, determine all the instances where it calls S with input X, and choose the output that optimizes its preferences about the execution of P. If the preferences were translated faithfully, then the AI's decision should also optimize your preferences regarding the physical universe. This faithful translation is a second major open problem.
What if you have some uncertainty about which program our universe corresponds to? In that case, we have to specify preferences for the entire set of programs that our universe may correspond to. If your preferences for what happens in one such program are independent of what happens in another, then we can represent them by a probability distribution on the set of programs plus a utility function on the execution of each individual program. More generally, we can always represent your preferences as a utility function on vectors of the form <E1, E2, E3, …> where E1 is an execution history of P1, E2 is an execution history of P2, and so on.
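As a purely illustrative sketch of the two representations just described (the programs, probabilities, and utilities below are all made up), suppose the agent cares about only two candidate world-programs P1 and P2:

# Independent case: a probability for each candidate program plus a utility
# function over that program's own execution history.
prob = {"P1": 0.7, "P2": 0.3}
utility = {
    "P1": lambda E: 1.0 if "good outcome" in E else 0.0,
    "P2": lambda E: 1.0 if "good outcome" in E else 0.0,
}

def expected_utility_independent(E1, E2):
    # Decomposes, because preferences over P1's execution don't depend on P2's.
    return prob["P1"] * utility["P1"](E1) + prob["P2"] * utility["P2"](E2)

# General case: a single utility function directly on vectors <E1, E2, ...>,
# which can also express preferences that don't decompose program by program.
def U(E_vector):
    E1, E2 = E_vector
    return 1.0 if ("good outcome" in E1 or "good outcome" in E2) else 0.0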
These considerations lead to the following design for the decision algorithm S. S is coded with a vector <P1, P2, P3, ...> of programs that it cares about, and a utility function on vectors of the form <E1, E2, E3, …> that defines its preferences on how those programs should run. When it receives an input X, it looks inside the programs P1, P2, P3, ..., and uses its "mathematical intuition" to form a probability distribution P_Y over the set of vectors <E1, E2, E3, …> for each choice of output string Y. Finally, it outputs a string Y* that maximizes the expected utility Sum P_Y(<E1, E2, E3, …>) U(<E1, E2, E3, …>). (This specifically assumes that expected utility maximization is the right way to deal with mathematical uncertainty. Consider it a temporary placeholder until that problem is solved. Also, I'm describing the algorithm as a brute force search for simplicity. In reality, you'd probably want it to do something cleverer to find the optimal Y* more quickly.)
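The following is a minimal brute-force sketch of that design, with the mathematical-intuition module treated as a black box that returns, for each candidate output Y, a distribution over execution-history vectors (the names and signatures here are illustrative, not a specification):

def S(X, programs, U, math_intuition, candidate_outputs):
    # programs:          the vector <P1, P2, P3, ...> the agent cares about
    # U:                 utility function on vectors <E1, E2, E3, ...>
    # math_intuition:    stub returning a list of (probability, execution_vector)
    #                    pairs, conditional on S(X) returning a given Y
    # candidate_outputs: the output strings under consideration
    best_Y, best_EU = None, float("-inf")
    for Y in candidate_outputs:
        P_Y = math_intuition(programs, X, Y)
        expected_utility = sum(p * U(Es) for p, Es in P_Y)
        if expected_utility > best_EU:
            best_Y, best_EU = Y, expected_utility
    return best_Y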
Example 1: Counterfactual Mugging
Note that Bayesian updating is not done explicitly in this decision theory. When the decision algorithm receives input X, it may determine that some of the programs it has preferences about never call it with X and are also logically independent of its output, and that it can therefore safely ignore them when computing the consequences of a choice. There is no need to set the probabilities of those programs to 0 and renormalize.
So, with that in mind, we can model Counterfactual Mugging by the following Python program:
def P(coin):
    AI_balance = 100
    if coin == "heads":
        if S("heads") == "give $100":
            AI_balance -= 100
    if coin == "tails":
        if Omega_Predict(S, "heads") == "give $100":
            AI_balance += 10000
The AI’s goal is to maximize expected utility = .5 * U(AI_balance after P("heads")) + .5 * U(AI_balance after P("tails")). Assuming U(AI_balance)=AI_balance, it’s easy to determine U(AI_balance after P("heads")) as a function of S’s output. It equals 0 if S(“heads”) == “give $100”, and 100 otherwise. To compute U(AI_balance after P("tails")), the AI needs to look inside the Omega_Predict function (not shown here), and try to figure out how accurate it is. Assuming the mathematical intuition module says that choosing “give $100” as the output for S(“heads”) makes it more likely (by a sufficiently large margin) for Omega_Predict(S, "heads") to output “give $100”, then that choice maximizes expected utility.
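For concreteness, here is the expected-utility arithmetic under the made-up assumption that the mathematical intuition module credits Omega_Predict with a 99% chance of correctly predicting S's disposition:

def expected_utility(give, acc=0.99):
    # Heads world: balance is 100, minus 100 if S("heads") returns "give $100".
    u_heads = 100 - (100 if give else 0)
    # Tails world: Omega_Predict(S, "heads") matches S's actual disposition with
    # probability acc, and the AI gains $10000 when the prediction is "give $100".
    p_predict_give = acc if give else 1 - acc
    u_tails = 100 + p_predict_give * 10000
    return 0.5 * u_heads + 0.5 * u_tails

print(expected_utility(give=True))   # 5000.0
print(expected_utility(give=False))  # 150.0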
Example 2: Return of Bayes
This example is based on case 1 in Eliezer's post Priors as Mathematical Objects. An urn contains 5 red balls and 5 white balls. The AI is asked to predict the probability of each ball being red as it is drawn from the urn, its goal being to maximize the expected logarithmic score of its predictions. The main point of this example is that this decision theory can reproduce the effect of Bayesian reasoning when the situation calls for it. We can model the scenario using preferences on the following Python program:
import math

def P(n):
    urn = ['red', 'red', 'red', 'red', 'red', 'white', 'white', 'white', 'white', 'white']
    history = []
    score = 0
    while urn:
        i = n % len(urn)
        n = n // len(urn)   # integer division: n encodes the order in which balls are drawn
        ball = urn[i]
        urn[i:i+1] = []
        prediction = S(history)
        if ball == 'red':
            score += math.log(prediction, 2)
        else:
            score += math.log(1 - prediction, 2)
        print(score, ball, prediction)
        history.append(ball)
Here is a printout from a sample run, using n=1222222:
-1.0 red 0.5
-2.16992500144 red 0.444444444444
-2.84799690655 white 0.375
-3.65535182861 white 0.428571428571
-4.65535182861 red 0.5
-5.9772799235 red 0.4
-7.9772799235 red 0.25
-7.9772799235 white 0.0
-7.9772799235 white 0.0
-7.9772799235 white 0.0
S should use deductive reasoning to conclude that returning (number of red balls remaining / total balls remaining) maximizes the average score across the range of possible inputs to P, from n=1 to 10! (representing the possible orders in which the balls are drawn), and do that. Alternatively, S can approximate the correct predictions using brute force: generate a random function from histories to predictions, and compute what the average score would be if it were to implement that function. Repeat this a large number of times and it is likely to find a function that returns values close to the optimum predictions.
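As a sketch of the deductive solution (a direct implementation of the counting rule just described, not anything more clever), S only needs to track how many red balls remain given the observed history:

def S(history):
    # Predict P(next ball is red) = red balls remaining / total balls remaining.
    red_remaining = 5 - history.count('red')
    total_remaining = 10 - len(history)
    return red_remaining / total_remaining

Plugging this S into P(1222222) should reproduce the predictions in the printout above.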
Example 3: Level IV Multiverse
In Tegmark's Level 4 Multiverse, all structures that exist mathematically also exist physically. In this case, we'd need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory. The AI will then proceed to "optimize" all of mathematics, or at least the parts of math that (A) are logically dependent on its decisions and (B) it can reason or form intuitions about.
I suggest that the Level 4 Multiverse should be considered the default setting for a general decision theory, since we cannot rule out the possibility that all mathematical structures do indeed exist physically, or that we have direct preferences on mathematical structures (in which case there is no need for them to exist "physically"). Clearly, application of decision theory to the Level 4 Multiverse requires that the previously mentioned open problems be solved in their most general forms: how to handle logical uncertainty in any mathematical domain, and how to map fuzzy human preferences to well-defined preferences over the structures of mathematical objects.
Added: For further information and additional posts on this decision theory idea, which came to be called "Updateless Decision Theory", please see its entry in the LessWrong Wiki.
Although I still have not tried to decipher what "Timeless Decision Theory" or "Updateless Decision Theory" is actually about, I would like to observe that it is very unlikely that the "timeless" aspect, in the sense of an ontology which denies the reality of time, change, or process, is in any way essential to how it works.
If you have a Julian-Barbour-style timeless wavefunction of the universe, which associates an amplitude with every point in a configuration space of spacelike states of the universe, you can always construct histories using Bohm's formula of following the configuration-space gradient of the complex phase of the wavefunction.
I don't actually advocate Bohmian mechanics, I'd prefer something more like "quantum causal histories" a la Fotini Markopoulou, but looking even more like a background-independent cellular automaton. However, the ever-present Bohmian option should demonstrate that there is no particular intrinsic necessity to the abandonment of time. And basic subjective experience, phenomenology if you will, shows that change is real and about as basic as existence itself. The denial of time is a classic case of people denying what's right in front of them because of absorption in a theory or belief that there is no alternative. I'd basically attribute it to love of the power of objectifying thought to explain things: I can map the events of reality onto a mathematical structure which is static at least in my mind, that structure has enormous clarifying and predictive power, so therefore reality must be static, i.e. there is no time.
So I think the fashion hereabouts for denying the reality of time is a basic error. But it's a hard one to argue against, because it requires some detachment from the intellectual drive to formalize and objectify everything, which is just about synonymous with rationality and truthseeking in the local worldview, and requires instead that one spend a bit of time being a phenomenologist, reflecting on the nature of experience without preconceptions, and noticing that, yes, it's there and it flows, and that maybe it's a mistake to call the flow an illusion just because your basic intellectual method is about mapping the world onto static ideal forms.
However, my real point here is not to argue against the ontology of timelessness. It is to suggest that the basic features of Timeless Decision Theory, whatever they may be, may actually be logically independent of the assumption of a timeless reality; and that it might be worth someone's time to re-express the theory in a language which does not presuppose timelessness. It would be a shame to see a basic innovation in decision theory unnecessarily bound to a particular wrong ontology.
AFAIK, Timeless Decision Theory doesn't have anything to say about the reality of time, only that decisions shouldn't vary depending on the time when they are considered.