I don't understand the motivation for developing view (3). It seems like any possible agent could be interpreted that way:
When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."
How does it help us to understand UDT specifically?
The reason the agent is happy is that it did the right thing. Merely doing the right thing has given the agent all the utility it could hope for.
This seems to be tacking a lot of anthropomorphic emotional reactions onto the agent's decision theory.
Imagine an agent that follows the decision theory "always take the first option presented" but has humanlike reactions to the outcome.
It will one-box or two-box depending on how the situation is described to it, but it will be happy if it gets the million dollars.
The process used to make choices need not be connected to the process used to evaluate preference.
I feel again as if I do not understand what Timeless Decision Theory or Updateless Decision Theory is (or what it's for; what it adds to ordinary decision theory). Can anyone help me? For example, by providing the simplest possible example of one of these "decision theories" in action?
Suppose we have an agent that cares about something extremely simple, like number of paperclips in the world. More paperclips is a better world. Can someone provide an example of how TDT or UDT would matter, or would make a difference, or would be applied, by an entity which made its decisions using that criterion?
The act-evaluating function is just a particular computation which, for the agent, constitutes the essence of rightness.
This sounds almost like saying that the agent is running its own algorithm because running this particular algorithm constitutes the essence of rightness. This perspective doesn't improve our understanding of the process of decision-making; it just wraps the whole agent in an opaque box and labels it an officially approved way to compute. The "rightness" and "actual world" properties you ascribe to this opaque box don't seem to be actually present.
Let me see if I understand your argument correctly: UDT works by converting all beliefs about facts into their equivalent value expressions (due to fact/value equivalence), and chooses the optimal program for maximizing expected utility according to those values.
So, if you were to program a robot such that it adheres to the decisions output by UDT, then this robot, when acting, can be viewed as simply adhering to a programmer-fed ruleset. That ruleset does not explicitly use the desirability of any consequence as a desideratum when deciding what action to output.
Voted up for, among other things, actually explaining UDT in a way I could understand. Thanks! :-)
In a True Counterfactual Mugging, Omega would ask the agent to give up utility.
Doesn't this, like, trivially define what should be the correct decision? What's the point?
It seems to me that you're looking for a way to model a deontologist.
And a necessary condition is that you follow a function that does not depend on states of the world. If you don't have any fixed principles, we can't call you a deontologist. You can call that UDT (I think I've seen the same thing called rule-utilitarianism).
Is there a more complicated insight than that here?
With respect to (1), UDT maximizes over worlds where the zillionth digit of pi is 1, 2, 3...8, 9, 0. It does this even after it knows the value of the digit in question. Most of those worlds aren't part of the Tegmark level IV multiverse. It seems this post could benefit from distinguishing between possible and impossible possible worlds.
What is the difference between (1) and (2)? Just an XML tag that the agent doesn't care about, but sticks onto one of the worlds it considers possible? (Why would it continue spending cycles to compute which world is actual, if it doesn't care?)
According to this view, the agent cares about only the actual world.
A decision-making algorithm can only care about things accessible in its mind. The "actual world" is not one of them.
Though how does that square with a phrase later in the same paragraph?
It doesn't care at all about the actual consequences that result from its action.
One way (the usual way?) to think of an agent running Updateless Decision Theory is to imagine that the agent always cares about all possible worlds according to how probable those worlds seemed when the agent's source code was originally written.
Seemed to who? And what about the part where the probabilities are controlled by the agent's decisions (as estimated by mathematical intuition)?
One way (the usual way?) to think of an agent running Updateless Decision Theory is to imagine that the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code [1]. In particular, the agent never develops any additional concern for whatever turns out to be the actual world [2]. This is what puts the "U" in "UDT".
I suggest an alternative conception of a UDT agent, without changing the UDT formalism. According to this view, the agent cares about only the actual world. In fact, at any time, the agent cares about only one small facet of the actual world — namely, whether the agent's act at that time maximizes a certain fixed act-evaluating function. In effect, a UDT agent is the ultimate deontologist: It doesn't care at all about the actual consequences that result from its action. One implication of this conception is that a UDT agent cannot be truly counterfactually mugged.
[ETA: For completeness, I give a description of UDT here (pdf).]
Vladimir Nesov's Counterfactual Mugging presents us with the following scenario: Omega, a reliable predictor, tosses a fair coin. If the coin had come up heads, Omega would have given the agent $10000, but only if Omega predicted that the agent would hand over $100 in the event of tails. The coin has in fact come up tails, and Omega now asks the agent for the $100, with nothing else riding on the choice.
An agent following UDT will give the $100. Imagine that we were building an agent, and that we will receive whatever utility follows from the agent's actions. Then it's easy to see why we should build our agent to give Omega the money in this scenario. After all, at the time we build our agent, we know that Omega might one day flip a fair coin with the intentions Nesov describes. Whatever probability this has of happening, our expected earnings are greater if we program our agent to give Omega the $100 on tails.
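To make that expected-earnings comparison concrete, here is a minimal sketch of the builder's arithmetic. The $10000 and $100 payoffs come from the scenario; the probability `p_omega` that Omega ever runs the game at all is a made-up placeholder, and the function name is mine.

```python
# Builder's-eye expected-value comparison, evaluated at build time, before
# any coin is flipped. p_omega (the chance that Omega ever runs the game)
# is an arbitrary placeholder; only the $10,000 and $100 figures come from
# the scenario itself.

def expected_earnings(gives_on_tails: bool, p_omega: float = 0.01) -> float:
    if gives_on_tails:
        # Heads: Omega pays $10,000, since it predicts the agent would pay.
        # Tails: the agent hands over $100.
        return p_omega * (0.5 * 10_000 + 0.5 * (-100))
    # If the agent would refuse, heads pays nothing and tails costs nothing.
    return 0.0

print(expected_earnings(True))   # 49.5
print(expected_earnings(False))  # 0.0
```

For any positive `p_omega`, the paying policy has the higher expectation, which is the builder's reason to program the agent to hand over the $100.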
More generally, if we suppose that we get whatever utility will follow from our agent's actions, then we can do no better than to program the agent to follow UDT. But since we have to program the UDT agent now, the act-evaluating function that determines how the agent will act needs to be fixed with the probabilities that we know now. This will suffice to maximize our expected utility given our best knowledge at the time when we build the agent.
So, it makes sense for a builder to program an agent to follow UDT on expected-utility grounds. We can understand the builder's motivations. We can get inside the builder's head, so to speak.
But what about the agent's head? The brilliance of Nesov's scenario is that it is so hard, on first hearing it, to imagine why a reasonable agent would give Omega the money knowing that the only result will be that it has given up $100. It's easy enough to follow the UDT formalism. But what on earth could the UDT agent itself be thinking? Yes, trying to figure this out is an exercise in anthropomorphization. Nonetheless, I think that it is worthwhile if we are going to use UDT to try to understand what we ought to do.
Here are three ways to conceive of the agent's thinking when it gives Omega the $100. They form a sort of spectrum: (1) the agent cares about expected utility computed over all the possible worlds, weighted by the probabilities that were fixed when it was built; (2) the agent cares about the merely-possible world in which the coin came up heads and Omega gave it $10000; (3) the agent cares only about whether its act maximizes the fixed act-evaluating function that was built into it.
View (3) is the one that I wanted to develop in this post. On this view, the "probability distribution" in the act-evaluating function no longer has any epistemic meaning for the agent. The act-evaluating function is just a particular computation which, for the agent, constitutes the essence of rightness. Yes, the computation involves considering some counterfactuals, but to consider those counterfactuals does not entail any ontological commitment.
Thus, when the agent has been counterfactually mugged, it's not (as in (1)) happy because it cares about expected utility over all possible worlds. It's not (as in (2)) happy because, in some merely-possible world, Omega gave it $10000. On this view, the agent considers all those "possible worlds" to have been rendered impossible by what it has learned since it was built. The reason the agent is happy is that it did the right thing. Merely doing the right thing has given the agent all the utility it could hope for. More to the point, the agent got that utility in the actual world. The agent knows that it did the right thing, so it genuinely does not care about what actual consequences will follow from its action.
In other words, although the agent lost $100, it really gained from the interaction with Omega. This suggests that we try to consider a "true" analog of the Counterfactual Mugging. In The True Prisoner's Dilemma, Eliezer Yudkowsky presents a version of the Prisoner's Dilemma in which it's viscerally clear that the payoffs at stake capture everything that we care about, not just our selfish values. The point is to make the problem about utilons, and not about some stand-in, such as years in prison or dollars.
In a True Counterfactual Mugging, Omega would ask the agent to give up utility. Here we see that the UDT agent cannot possibly do as Omega asks. Whatever it chooses to do will turn out to have in fact maximized its utility. Not just expected utility, but actual utility. In the original Counterfactual Mugging, the agent looks like something of a chump who gave up $100 for nothing. But in the True Counterfactual Mugging, our deontological agent enjoys the satisfaction that, no matter what it does, it lives in the best of all possible worlds.
[1] ETA: Under UDT, the agent assigns a utility to having all of the possible worlds P1, P2, ... undergo respective execution histories E1, E2, ... (The way that a world evolves may depend in part on the agent's action.) That is, for each vector <E1, E2, ...> of ways that these worlds could respectively evolve, the agent assigns a utility U(<E1, E2, ...>). Due to criticisms by Vladimir Nesov (beginning here), I have realized that this post only applies to instances of UDT in which the utility function U takes the form that it has in standard decision theories. In this case, each world P_i has its own probability pr(P_i) and its own utility function u_i that takes an execution history of P_i alone as input, and the function U takes the form
U(<E1, E2, ...>) = Σ_i pr(P_i) u_i(E_i).
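For concreteness, here is a toy sketch of this special case. Modelling each possible world as a (probability, history-given-policy, per-world-utility) triple is my own illustrative assumption, not Wei Dai's formalism; only the probability-weighted sum above comes from this footnote.

```python
# Toy sketch of the special-case utility function above:
#   U(<E1, E2, ...>) = sum_i pr(P_i) * u_i(E_i)
# where the execution history E_i of world P_i depends on the policy the
# agent commits to. The world models below are illustrative placeholders.

def U(policy, worlds):
    """Probability-weighted sum of the per-world utilities of the histories
    produced by committing to `policy` in every possible world."""
    return sum(pr * u(history(policy)) for pr, history, u in worlds)

def choose_policy(policies, worlds):
    """The 'updateless' step: maximize U using the probabilities baked in
    when the agent was created, never updating them afterwards."""
    return max(policies, key=lambda p: U(p, worlds))

# Tiny usage example: the two equiprobable coin-flip worlds of the
# Counterfactual Mugging, with dollars standing in for utility.
worlds = [
    (0.5,  # heads-world: Omega pays iff it predicts the agent would pay
     lambda pays: "omega pays" if pays else "omega refuses",
     lambda e: 10_000 if e == "omega pays" else 0),
    (0.5,  # tails-world: the agent is asked for $100
     lambda pays: "agent pays" if pays else "agent refuses",
     lambda e: -100 if e == "agent pays" else 0),
]
print(choose_policy([True, False], worlds))  # True: commit to paying on tails
```

Once this function U, with its fixed probabilities pr(P_i), is baked in, the agent's choice never depends on which world later turns out to be actual, which is the sense of "updateless" that this post leans on.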
The probabilities pr(P_i) are what I'm talking about when I mention probabilities in this post. Wei Dai is interested in instances of UDT with more general utility functions U. However, to my knowledge, this special kind of utility function is the only one in terms of which he has talked about the meanings of probabilities of possible worlds in UDT; see in particular the original UDT post. (A "program" is what Wei Dai calls a possible world in that post.) The utility function U is "baked in" to the UDT agent at the time it's created. Therefore, so too are the probabilities pr(P_i).
[2] By "the actual world", I do not mean one of the worlds in the many-worlds interpretation (MWI) of quantum mechanics. I mean something more like the entire path traversed by the quantum state vector of the universe through its corresponding Hilbert space. Distinct possible worlds are distinct paths that the state of the universe might (for all we know) be traversing in this Hilbert space. All the "many worlds" of the MWI together constitute a single world in the sense used here.
ETA: This post was originally titled "UDT agents are deontologists". I changed the title to "UDT agents as deontologists" to emphasize that I am describing a way to view UDT agents. That is, I am describing an interpretive framework for understanding the agent's thinking. My proposal is analogous to Dennett's "intentional stance". To take the intentional stance is not to make a claim about what a conscious organism is doing. Rather, it is to make use of a framework for organizing our understanding of the organism's behavior. Similarly, I am not suggesting that UDT somehow gets things wrong. I am saying that it might be more natural for us if we think of the UDT agent as a deontologist, instead of as an agent that never changes its belief about which possible worlds will actually happen. I say a little bit more about this in this comment.