
UDT agents as deontologists

Post author: Tyrrell_McAllister 10 June 2010 05:01AM 8 points

One way (the usual way?) to think of an agent running Updateless Decision Theory is to imagine that the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code [1].  In particular, the agent never develops any additional concern for whatever turns out to be the actual world [2].  This is what puts the "U" in "UDT".

I suggest an alternative conception of a UDT agent, without changing the UDT formalism. According to this view, the agent cares about only the actual world.  In fact, at any time, the agent cares about only one small facet of the actual world — namely, whether the agent's act at that time maximizes a certain fixed act-evaluating function.  In effect, a UDT agent is the ultimate deontologist:  It doesn't care at all about the actual consequences that result from its action.  One implication of this conception is that a UDT agent cannot be truly counterfactually mugged.

[ETA: For completeness, I give a description of UDT here (pdf).]

Vladimir Nesov's Counterfactual Mugging presents us with the following scenario:

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, the Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.

Omega can predict your decision in case it asked you to give it $100, even if that hasn't actually happened, it can compute the counterfactual truth. The Omega is also known to be absolutely honest and trustworthy, no word-twisting, so the facts are really as it says, it really tossed a coin and really would've given you $10000.

An agent following UDT will give the $100.  Imagine that we were building an agent, and that we will receive whatever utility follows from the agent's actions.  Then it's easy to see why we should build our agent to give Omega the money in this scenario.  After all, at the time we build our agent, we know that Omega might one day flip a fair coin with the intentions Nesov describes.  Whatever probability this has of happening, our expected earnings are greater if we program our agent to give Omega the $100 on tails.
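
To make the builder's arithmetic explicit (a minimal sketch added here, not part of the original scenario: dollars stand in for utility, and p is the builder's probability that Omega ever stages the coin flip at all):

```python
# Builder's expected earnings from each policy, judged at build time.
# Assumes a fair coin, dollars as utility, and some probability p > 0
# that the counterfactual-mugging scenario ever occurs.
p = 0.01                                       # any positive value works
pay_policy = p * (0.5 * 10000 + 0.5 * (-100))  # agent that pays on tails: p * 4950
refuse_policy = p * (0.5 * 0 + 0.5 * 0)        # agent that refuses: 0
print(pay_policy > refuse_policy)              # True whenever p > 0
```

So, from the builder's vantage point, the paying policy dominates the refusing policy as long as the scenario has any chance of arising.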

More generally, if we suppose that we get whatever utility will follow from our agent's actions, then we can do no better than to program the agent to follow UDT.  But since we have to program the UDT agent now, the act-evaluating function that determines how the agent will act needs to be fixed with the probabilities that we know now.  This will suffice to maximize our expected utility given our best knowledge at the time when we build the agent.

So, it makes sense for a builder to program an agent to follow UDT on expected-utility grounds.  We can understand the builder's motivations.  We can get inside the builder's head, so to speak.

But what about the agent's head?  The brilliance of Nesov's scenario is that it is so hard, on first hearing it, to imagine why a reasonable agent would give Omega the money knowing that the only result will be that they gave up $100.  It's easy enough to follow the UDT formalism.  But what on earth could the UDT agent itself be thinking?  Yes, trying to figure this out is an exercise in anthropomorphization.  Nonetheless, I think that it is worthwhile if we are going to use UDT to try to understand what we ought to do.

Here are three ways to conceive of the agent's thinking when it gives Omega the $100.  They form a sort of spectrum.

  1. One extreme view:  The agent considers all the possible worlds to be on equal ontological footing.  There is no sense in which any one of them is distinguished as "actual" by the agent.  It conceives of itself as acting simultaneously in all the possible worlds so as to maximize utility over all of them.  Sometimes this entails acting in one world so as to make things worse in that world.  But, no matter which world this is, there is nothing special about it.  The only property of the world that has any ontological significance is the probability weight given to that world at the time that the agent was built. (I believe that this is roughly the view that Wei Dai himself takes, but I may be wrong.)
  2. An intermediate view:  The agent thinks that there is only one actual world.  That is, there is an ontological fact of the matter about which world is actual.  However, the other possible worlds continue to exist in some sense, although they are merely possible, not actual.  Nonetheless, the agent continues to care about all of the possible worlds, and this amount of care never changes.  After being counterfactually mugged, the agent is happy to know that, in some merely-possible world, Omega gave the agent $10000.
  3. The other extreme:  As in (2), the agent thinks that there is only one actual world.  Contrary to (2), the agent cares about only this world.  However, the agent is a deontologist.  When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."

View (3) is the one that I wanted to develop in this post.  On this view, the "probability distribution" in the act-evaluating function no longer has any epistemic meaning for the agent.  The act-evaluating function is just a particular computation which, for the agent, constitutes the essence of rightness.  Yes, the computation involves considering some counterfactuals, but to consider those counterfactuals does not entail any ontological commitment.

Thus, when the agent has been counterfactually mugged, it's not (as in (1)) happy because it cares about expected utility over all possible worlds.  It's not (as in (2)) happy because, in some merely-possible world, Omega gave it $10000.  On this view, the agent considers all those "possible worlds" to have been rendered impossible by what it has learned since it was built.  The reason the agent is happy is that it did the right thing.  Merely doing the right thing has given the agent all the utility it could hope for.  More to the point, the agent got that utility in the actual world.  The agent knows that it did the right thing, so it genuinely does not care about what actual consequences will follow from its action.
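
To make this reading concrete, here is a toy sketch (mine, with dollars standing in for utility and all names hypothetical): once the agent learns that the coin came up tails, the realized payoff and the quantity that view (3) says it tracks come apart.

```python
# The fixed act-evaluating function, frozen at build time: expected dollars
# over the two once-possible coin outcomes (toy numbers from the Counterfactual
# Mugging; names are illustrative only).
ACT_VALUE = {"pay": 0.5 * 10000 + 0.5 * (-100),    # 4950
             "refuse": 0.0}

def realized_dollars(action, coin):
    """What actually happens to the agent's wallet in the world it is in."""
    if coin == "tails":
        return -100 if action == "pay" else 0
    return 10000 if action == "pay" else 0         # heads: Omega pays a payer

coin = "tails"                                     # what the agent has learned
action = max(ACT_VALUE, key=ACT_VALUE.get)         # 'pay'
print(realized_dollars(action, coin))              # -100: the actual consequence
print(ACT_VALUE[action] == max(ACT_VALUE.values()))
# True: on view (3), this last check is the only thing the agent cares about.
```

The agent's satisfaction, on this reading, tracks the second printed line, not the first.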

In other words, although the agent lost $100, it really gained from the interaction with Omega.  This suggests that we try to consider a "true" analog of the Counterfactual Mugging.  In The True Prisoner's Dilemma, Eliezer Yudkowsky presents a version of the Prisoner's Dilemma in which it's viscerally clear that the payoffs at stake capture everything that we care about, not just our selfish values.  The point is to make the problem about utilons, and not about some stand-in, such as years in prison or dollars.

In a True Counterfactual Mugging, Omega would ask the agent to give up utility.  Here we see that the UDT agent cannot possibly do as Omega asks.  Whatever it chooses to do will turn out to have in fact maximized its utility.  Not just expected utility, but actual utility. In the original Counterfactual Mugging, the agent looks like something of a chump who gave up $100 for nothing.  But in the True Counterfactual Mugging, our deontological agent lives with the satisfaction that, no matter what it does, it lives in the best of all possible worlds.

[1] ETA: Under UDT, the agent assigns a utility to having all of the possible worlds P1, P2, . . . undergo respective execution histories E1, E2, . . ..  (The way that a world evolves may depend in part on the agent's action).  That is, for each vector <E1, E2, . . .> of ways that these worlds could respectively evolve, the agent assigns a utility U(<E1, E2, . . .>).  Due to criticisms by Vladimir Nesov (beginning here), I have realized that this post only applies to instances of UDT in which the utility function U takes the form that it has in standard decision theories.  In this case, each world Pi has its own probability pr(Pi) and its own utility function ui that takes an execution history of Pi alone as input, and the function U takes the form

U(<E1, E2, . . .>) = Σi pr(Pi) ui(Ei).

The probabilities pr(Pi) are what I'm talking about when I mention probabilities in this post.  Wei Dai is interested in instances of UDT with more general utility functions U.  However, to my knowledge, this special kind of utility function is the only one in terms of which he's talked about the meanings of probabilities of possible worlds in UDT.  See in particular this quote from the original UDT post:

If your preferences for what happens in one such program is independent of what happens in another, then we can represent them by a probability distribution on the set of programs plus a utility function on the execution of each individual program.

(A "program" is what Wei Dai calls a possible world in that post.)  The utility function U is "baked in" to the UDT agent at the time it's created.  Therefore, so too are the probabilities pr(Pi).

[2] By "the actual world", I do not mean one of the worlds in the many-worlds interpretation (MWI) of quantum mechanics.  I mean something more like the entire path traversed by the quantum state vector of the universe through its corresponding Hilbert space.  Distinct possible worlds are distinct paths that the state of the universe might (for all we know) be traversing in this Hilbert space.  All the "many worlds" of the MWI together constitute a single world in the sense used here.

ETA: This post was originally titled "UDT agents are deontologists".  I changed the title to "UDT agents as deontologists" to emphasize that I am describing a way to view UDT agents.  That is, I am describing an interpretive framework for understanding the agent's thinking.  My proposal is analogous to Dennett's "intentional stance".  To take the intentional stance is not to make a claim about what a conscious organism is doing.  Rather, it is to make use of a framework for organizing our understanding of the organism's behavior.  Similarly, I am not suggesting that UDT somehow gets things wrong.  I am saying that it might be more natural for us if we think of the UDT agent as a deontologist, instead of as an agent that never changes its belief about which possible worlds will actually happen.  I say a little bit more about this in this comment.

Comments (109)

Comment author: Wei_Dai 10 June 2010 01:48:05AM 9 points [-]

I don't understand the motivation for developing view (3). It seems like any possible agent could be interpreted that way:

When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."

How does it help us to understand UDT specifically?

Comment author: Tyrrell_McAllister 10 June 2010 04:09:35AM *  1 point [-]

I don't claim that it helps us to understand UDT as a decision theory. It is a way to anthropomorphize UDT agents to get an intuitively-graspable sense of what it might "feel like to be" a UDT agent with a particular utility function U over worlds.

Comment author: Tyrrell_McAllister 10 June 2010 06:40:12AM *  0 points [-]

It seems like any possible agent could be interpreted that way:

When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."

How does it help us to understand UDT specifically?

I think that I probably missed your point in my first reply. I see now that you were probably asking why it's any more useful to view UDT agents this way than it would be to view any arbitrary agent as a deontologist.

The reason is that the UDT agent appears, from the outside, to be taking into account what happens in possible worlds that it should know will never happen, at least according to conventional epistemology. Unlike with a conventional consequentialist, you cannot interpret its behavior as a function of what it thinks might happen in the actual world (with what probabilities and with what payoffs). You can interpret its behavior as a function of what its builders thought might happen in the actual world, but you can't do this for the agent itself.

One response to this is to treat the UDT agent as a consequentialist who cares about the consequences of its actions even in possible worlds that it knows aren't actual. This is perfectly fine, except that it makes it hard to conceive of the agent as learning anything. The agent continues to take into account the evolution-histories of world-programs that would call it as a subroutine if they were run, even after it learns that they won't be run. (Obviously this is not a problem if you think that the notion of an un-run program is incoherent.)

The alternative approach that I offer allows us to think of the agent as learning which once-possible worlds are actual. This is a more natural way to conceive of epistemic agents in my opinion. The cost is that the UDT agent is now a deontologist, for whom the rightness of an action doesn't depend on just the effects that it will have in the actual world. "Rightness" doesn't depend on actual consequences, at least not exclusively. However, the additional factors that figure into the "rightness" of an act require no further justification as far as the agent is concerned.

This is not to turn those additional factors into a "black box". They were designed by the agent's builders on conventional consequentialist grounds.

Comment author: Mitchell_Porter 10 June 2010 07:27:02AM *  4 points [-]

I feel again as if I do not understand what Timeless Decision Theory or Updateless Decision Theory is (or what it's for; what it adds to ordinary decision theory). Can anyone help me? For example, by providing the simplest possible example of one of these "decision theories" in action?

Suppose we have an agent that cares about something extremely simple, like number of paperclips in the world. More paperclips is a better world. Can someone provide an example of how TDT or UDT would matter, or would make a difference, or would be applied, by an entity which made its decisions using that criterion?

Comment author: saturn 10 June 2010 05:27:17PM 3 points [-]

This is my vague understanding.

Naive decision theory: "Choose the action that will cause the highest expected utility, given what I know now."

Timeless decision theory: "Choose the action that I wish I had precommitted to, given what I know now."

Updateless decision theory: "Choose a set of rules that will cause the highest expected utility given my priors, then stick to it no matter what happens."

Comment author: NancyLebovitz 11 June 2010 09:20:36AM 0 points [-]

If this is accurate, then I don't see how UDT can generally be better than TDT.

UDT would be better in circumstances where you suspect that your ability to update accurately is compromised.

I'm assuming that the priors for UDT were set at some past time.

Comment author: saturn 11 June 2010 09:04:22PM *  1 point [-]

UDT gives the money in the counterfactual mugging thought experiment, TDT doesn't.

There's nothing that prevents a UDT agent from behaving as if it were updating; that's what I surmise would happen in more normal situations where Omega isn't involved. But if ignoring information is the winning move, TDT can't do that.

Comment author: Mass_Driver 10 June 2010 01:10:43PM *  0 points [-]

Here, so far as I can understand it, is UDT vs. ordinary DT for paper clips:

Ordinary DT ("ODT") says: at all times t, act so as to maximize the number of paper clips that will be observed at time (t + 1), where "1" is a long time and we don't have to worry about discount rates.

UDT says: in each situation s, take the action that returns the highest value on an internal lookup table that has been incorporated into me as part of my programming, which, incidentally, was programmed by people who loved paper clips.

Suppose ODT and UDT are fairly dumb, say, as smart as a cocker spaniel.

Suppose we put both agents on the set of the movie Office Space. ODT will scan the area, evaluate the situation, and simulate several different courses of action, one of which is bending staples into paper clips. Other models might include hiding, talking to accountants, and attempting to program a paper clip screensaver using Microsoft Office. The model that involves bending staples shows the highest number of paper clips in the future compared to other models, so the ODT will start bending staples. If the ODT is later surprised to discover that the boss has walked in and confiscated the staples, it will be "sad" because it did not get as much paper-clip utility as it expected to, and it will mentally adjust the utility of the "bend staples" model downward, especially when it detects boss-like objects. In the future, this may lead ODT to adopt different courses of behavior, such as "bend staples until you see boss, then hide." The reason for changing course and adopting these other behaviors is that they would have relatively higher utility in its modeling scheme.

UDT will scan the area, evaluate the situation, and categorize the situation as situation #7, which roughly corresponds to "metal available, no obvious threats, no other obvious resources," and look up the correct action for situation #7, which its programmers have specified is "bend staples into paper clips." Accordingly, UDT will bend staples. If UDT is later surprised to discover that the boss has wandered in and confiscated the staples, it will not care. The UDT will continue to be confident that it did the "right" thing by following its instructions for the given situation, and would behave exactly the same way if it encountered a similar situation.

UDT sounds stupider, and, at cocker-spaniel levels of intelligence, it undoubtedly is. That's why evolution designed cocker-spaniels to run on ODT, which is much more Pavlovian. However, UDT has the neat little virtue that it is immune to a Counterfactual Mugging. If we could somehow design a UDT that was arbitrarily intelligent, it would both achieve great results and win in a situation where ODT failed.

Comment author: Tyrrell_McAllister 10 June 2010 03:09:18PM 1 point [-]

Here, so far as I can understand it, is Tyrell's UDT vs. ordinary DT for paper clips:

For god's sake, don't call it my UDT :D. My post already seems to be giving some people the impression that I was suggesting some amendment or improvement to Wei Dai's UDT.

Comment author: Mass_Driver 10 June 2010 03:40:57PM 1 point [-]

Edited. [grin]

Comment author: Douglas_Knight 10 June 2010 09:11:20PM 1 point [-]

TDT and UDT are intended to solve Newcomb's problem and the prisoner's dilemma and those are surely the simplest examples of their strengths. It is fairly widely believed that, say, causal decision theory two-boxes and defects, but I would rather say that CDT simply doesn't understand the statements of the problems. Either way, one-boxing and arranging mutual cooperation are improvements.

Comment author: Vladimir_Nesov 10 June 2010 10:40:25AM 1 point [-]

If it's any consolation, the last bit of understanding of Wei Dai's original post (the role of execution histories, prerequisite to being able to make this correction) dawned on me only last week, as a result of a long effort to develop a decision theory of my own that only in retrospect turned out to be along roughly the same lines as UDT.

Comment author: khafra 10 June 2010 01:44:45PM 0 points [-]

A convergence like that makes both UDT and your decision theory more interesting to me. Is the process of your decision theory's genesis detailed on your personal blog? In retrospect, was your starting place and development process influenced heavily enough by LW/OB/Wei Dai to screen out the coincidence?

Comment author: Vladimir_Nesov 10 June 2010 02:29:36PM *  2 points [-]

I call it "ambient control". This can work as an abstract:

You, as an agent, determine what you do, and so have the power to choose which statements about you are true. By making some statements true and not others, you influence the truth of other statements that logically depend on the statements about you. Thus, if you have preference about what should be true about the world, you can make some of those things true by choosing what to do. Theories of consequences (partially) investigate what becomes true if you make a particular decision. (Of course, you can't change what's true, but you do determine what's true, because some truths are about you.)

Longer description here. I'll likely post on some aspects of it in the future, as the idea gets further developed. There is a lot of trouble with logical strength of theories of consequences, for example. There is also some hope to unify logical and observational uncertainty here, at the same time making the decision algorithm computationally feasible (it's not part of the description linked above).

Comment author: Vladimir_Nesov 09 June 2010 11:55:35PM 4 points [-]

The act-evaluating function is just a particular computation which, for the agent, constitutes the essence of rightness.

This sounds almost like saying that the agent is running its own algorithm because running this particular algorithm constitutes the essence of rightness. This perspective doesn't improve understanding of the process of decision-making, it just rounds up the whole agent in an opaque box and labels it an officially approved way to compute. The "rightness" and "actual world" properties you ascribe to this opaque box don't seem to be actually present.

Comment author: Tyrrell_McAllister 10 June 2010 12:33:40AM *  0 points [-]

The "rightness" and "actual world" properties you ascribe to this opaque box don't seem to be actually present.

They aren't present as part of what we must know to predict the agent's actions. They are part of a "stance" (like Dennett's intentional stance) that we can use to give a narrative framework within which to understand the agent's motivation. What you are calling a black box isn't supposed to be part of the "view" at all. Instead of a black box, there is a socket where a particular program vector <P1, P2, . . .> and "preference vector" <E1, E2, . . .>, together with the UDT formalism, can be plugged in.

ETA: The reference to a "'preference vector' <E1, E2, . . .>" was a misreading of Wei Dai's post on my part. What I (should have) meant was the utility function U over world-evolution vectors <E1, E2, . . .>.

Comment author: Vladimir_Nesov 10 June 2010 01:06:54AM 0 points [-]

I don't understand this.

Comment author: Mass_Driver 10 June 2010 12:30:35AM *  0 points [-]

Edited

Previously, I attempted to disagree with this comment. My disagreement was tersely dismissed, and, when I protested, my protests were strongly downvoted. This suggests two possibilities:

(1) I fail to understand this topic in ways that I fail to understand, or (2) I lack the status in this community for my disagreement with Vladimir_Nesov on this topic to be welcomed or taken seriously.

If I were certain that the problem were (2), then I would continue to press my point, and the karma loss be damned. However, I am still uncertain about what the problem is, and so I am deleting all my posts on the thread underneath this comment.

One commenter suggested that I was being combative myself; he may be right. If so, I apologize for my tone.

Comment author: Vladimir_Nesov 10 June 2010 12:47:23AM 0 points [-]

Saying that this decision is "right" has no explanatory power, gives no guidelines on the design of decision-making algorithms.

Comment author: Tyrrell_McAllister 10 June 2010 01:32:25AM *  0 points [-]

gives no guidelines on the design of decision-making algorithms.

I am nowhere purporting to be giving guidelines for the design of a decision-making algorithm. As I said, I am not suggesting any alteration of the UDT formalism. I was also explicit in the OP that there is no problem understanding at an intuitive level what the agent's builders were thinking when they decided to use UDT.

If all you care about is designing an agent that you can set loose to harvest utility for you, then my post is not meant to be interesting to you.

Comment author: Vladimir_Nesov 10 June 2010 01:40:23AM 2 points [-]

Beliefs should pay rent, not fly in the ether, unattached to what they are supposed to be about.

Comment author: Tyrrell_McAllister 10 June 2010 04:03:43AM *  0 points [-]

Beliefs should pay rent . . .

The whole Eliezer quote is that beliefs should "pay rent in future anticipations". Beliefs about which once-possible world is actual do this.

Comment author: Vladimir_Nesov 10 June 2010 10:33:42AM 0 points [-]

The beliefs in question are yours, and anticipation is about agent's design or behavior.

Comment author: JamesAndrix 10 June 2010 03:09:30PM 3 points [-]

The reason the agent is happy is that it did the right thing. Merely doing the right thing has given the agent all the utility it could hope for.

This seems to be tacking a lot of anthropomorphic emotional reactions onto the agent's decision theory.

Imagine an agent that follows the decision theory of "Always take the first option presented." but has humanlike reactions to the outcome.

It will one box or two box depending on how the situation is described to it, but it will be happy if it gets the million dollars.

The process used to make choices need not be connected to the process used to evaluate preference.

Comment author: Tyrrell_McAllister 10 June 2010 07:11:56PM *  1 point [-]

This seems to be tacking a lot of anthropomorphic emotional reactions onto the agent's decision theory.

It may in some cases be inappropriate to anthropomorphize an agent. But anthropomorphization can be useful in other cases. My suggestion in the OP is to be used in the case where anthropomorphization seems useful.

Imagine an agent that follows the decision theory of "Always take the first option presented." but has humanlike reactions to the outcome.

This is a great example. Maybe I should have started with something like that to motivate the post.

Suppose that someone you cared about were acting like this. Let's suppose that, according to your decision theory, you should try to change the person to follow a different decision algorithm. One option is to consider them to be a baffling alien, whose actions you can predict, but whose thinking you cannot at all sympathize with.

However, if you care about them, you might want to view them in a way that encourages sympathy. You also probably want to interpret their psychology in a way that seems as human as possible, so that you can bring to bear the tools of psychology. Psychology, at this time, depends heavily on using our own human brains as almost-opaque boxes to model other neurologically similar humans. So your only hope of helping this person is to conceive of them in a way that seems more like a normal human. You need to anthropomorphize them.

In this case, I would probably first try to think of the person as a normal person who is being parasitized by an alien agent with this weird decision theory. I would focus on trying to remove the parasitic agent. The hope would be that the human has normal human decision-making mechanisms that were being overridden by the parasite.

Comment author: Mass_Driver 10 June 2010 12:26:56AM 1 point [-]

Voted up for, among other things, actually explaining UDT in a way I could understand. Thanks! :-)

Comment author: SilasBarta 10 June 2010 05:04:20PM *  1 point [-]

Let me see if I understand your argument correctly: UDT works by converting all beliefs about facts into their equivalent value expressions (due to fact/value equivalence), and chooses the optimal program for maximizing expected utility according to those values.

So, if you were to program a robot such that it adheres to the decisions output by UDT, then this robot, when acting, can be viewed as simply adhering to a programmer-fed ruleset. That ruleset does not explicitly use desirability of any consequence as a desideratum when deciding what action to output, and the ruleset can be regarded as the robot's judgment of "what is right". Because it does "what is right" irrespective of the consequences (esp. in its particular location in time/space/world), its moral judgments match those of a deontologist.

Does that about get it right?

Comment author: Tyrrell_McAllister 10 June 2010 08:47:48PM 0 points [-]

Does that about get it right?

I think that's about right. Your next question might be, "How does this make a UDT agent different from any other?" I address that question in this reply to Wei Dai.

Comment author: SilasBarta 10 June 2010 09:17:01PM *  0 points [-]

Thanks! Turns out I correctly guessed your answer to that question too! (I noticed the distinction between the programmer's goals and [what the agent regards as] the agent's goals, but hadn't mentioned that explicitly in my summary.)

Doesn't sound too unreasonable to me... I'll think about it some more.

Edit: Do you think it would be a good idea to put (a modified version of) my summary at the top of your article?

Comment author: Vladimir_Nesov 10 June 2010 12:10:43AM 1 point [-]

In a True Counterfactual Mugging, Omega would ask the agent to give up utility.

Doesn't this, like, trivially define what should be the correct decision? What's the point?

Comment author: Tyrrell_McAllister 10 June 2010 12:26:29AM 0 points [-]

What's the point?

The point is, "the UDT agent cannot possibly satisfy this request." So I think we agree here (?).

Comment author: Vladimir_Nesov 10 June 2010 12:41:36AM 0 points [-]

You'd need to represent your problem statement in terms UDT understands, with the world program and strategy-controlled probabilities for its possible execution histories, and fixed utilities for each possible execution history. If you do that properly, you'll find that UDT acts correctly (otherwise, you haven't managed to correctly represent your problem statement...).

Comment author: Tyrrell_McAllister 10 June 2010 12:47:54AM *  2 points [-]

If you do that properly, you'll find that UDT acts correctly

Are you under the impression that I am saying that UDT acts incorrectly? I was explicit that I was suggesting no change to the UDT formalism. I was explicit that I was suggesting a way to anthropomorphize what the agent is thinking. Are you familiar with Dennett's notion of an intentional stance? This is like that. To suggest that we view the agent from a different stance is not to suggest that the agent acts differently.

ETA: I'm gathering that I should have been clearer that the so-called "true counterfactual mugging" is trivial or senseless when posed to a UDT agent. I'm a little surprised that I failed to make this clear, because it was the original thought that motivated the post. It's not immediately obvious to me how to make this clearer, so I will give it some thought.

Comment author: Vladimir_Nesov 10 June 2010 01:02:57AM 0 points [-]

You've got this in the post:

In a True Counterfactual Mugging, Omega would ask the agent to give up utility. Here we see that the UDT agent cannot possibly satisfy this request.

I'm not sure what you intended to say by that, but it sounds like "UDT agent will make the wrong decision", together with an opaque proposition that Omega offers "actual utility and not even expected utility", which it's not at all clear how to represent formally.

Comment author: Tyrrell_McAllister 10 June 2010 03:50:26AM *  1 point [-]

I'm not sure what you intended to say by that, but it sounds like "UDT agent will make the wrong decision",

No, that is not at all what I meant. That interpretation never occurred to me. I meant that the UDT agent cannot possibly give up the utility that Omega asks for in the previous sentence. Now that I understand how you misunderstood that part, I will edit it.

Comment author: Vladimir_Nesov 10 June 2010 10:08:26AM 0 points [-]

Well, isn't it a good thing that UDT won't give up utility to Omega? You can't take away utility on one side of the coin, and return it on the other, utility is global.

Comment author: Tyrrell_McAllister 10 June 2010 01:58:59PM *  1 point [-]

Well, isn't it a good thing that UDT won't give up utility to Omega?

Yes, of course it is. I'm afraid that I don't yet understand why you thought that I suggest otherwise.

You can't take away utility on one side of the coin, and return it on the other, utility is global.

Yes, that is why I said that the agent couldn't possibly satisfy Omega's request to give it utility.

You are attacking a position that I don't hold. But I'm not sure what position you're thinking of, so I don't know how to address the misunderstanding. You haven't made any claim that I disagree with in response to that paragraph.

Comment author: [deleted] 12 June 2010 10:52:50PM 0 points [-]

It seems to me that you're looking for a way to model a deontologist.

And a necessary condition is that you follow a function that does not depend on states of the world. If you don't have any fixed principles, we can't call you a deontologist. You can call that UDT (I think I've seen the same thing called rule-utilitarianism.)

Is there a more complicated insight than that here?

Comment author: Tyrrell_McAllister 12 June 2010 11:22:14PM 0 points [-]

It seems to me that you're looking for a way to model a deontologist.

I don't think so. I'm supposing that I'm reasonably comfortable with human deontologists, and I'm trying to use that familiarity to make intuitive sense of the behavior of a UDT agent.

Comment author: [deleted] 12 June 2010 11:32:47PM 1 point [-]

Well, that's the way the post was phrased ("a UDT agent is a deontologist.")

But you could construct a UDT agent that doesn't behave anything like a human deontologist, who acts based upon a function that has nothing to do with rights or virtues or moral laws. That's why I think it's better understood as "All deontologists are UDT" instead of vice versa.

Comment author: Tyrrell_McAllister 12 June 2010 11:42:49PM 1 point [-]

It's easier for me to understand an agent who acts on weird principles (such as those having nothing to do with rights or virtues or moral laws) than an agent who either

  • thinks that all possible worlds are equally actual, or

  • doesn't care more for what happens in the actual world than what happens in possible worlds.

So, if I were to think of deontologists as UDT agents, I would be moving them further away from comprehensibility.

Comment deleted 10 June 2010 02:08:07AM *  [-]
Comment author: Vladimir_Nesov 10 June 2010 02:13:12AM *  1 point [-]

With respect to (1), UDT maximizes over worlds where the zillionth digit of pi is 1, 2, 3...8, 9, 10.

These are not different worlds for UDT, but a single world that can have different possible execution histories that state the zillionth digit of pi to be 0, 1, ..., 9. Mathematical intuition establishes a probability distribution over these execution histories for the fixed world program that defines the subject matter.

Comment author: Tyrrell_McAllister 10 June 2010 04:14:46AM 0 points [-]

It seems this post could benefit from distinguishing between possible and impossible possible worlds.

That may be so, but I need to think about how to do it. I said that the possible worlds are whatever the agent's builders thought was possible. That is, "possibility" refers to the builders' ignorance, including their ignorance about the zillionth digit of pi.

Comment author: Nick_Tarleton 10 June 2010 12:15:12AM 0 points [-]

What is the difference between (1) and (2)? Just an XML tag <actual> that the agent doesn't care about, but sticks onto one of the worlds it considers possible? (Why would it continue spending cycles to compute which world is actual, if it doesn't care?)

Comment author: Tyrrell_McAllister 10 June 2010 12:25:03AM *  0 points [-]

What is the difference between (1) and (2)? Just an XML tag <actual> that the agent doesn't care about, but sticks onto one of the worlds it considers possible?

Basically, yes. (2) is not a view that I advocate.

Comment author: Vladimir_Nesov 09 June 2010 11:29:00PM *  0 points [-]

According to this view, the agent cares about only the actual world.

A decision-making algorithm can only care about things accessible in its mind. The "actual world" is not one of them.

Although how does it connect with a phrase later in the paragraph?

It doesn't care at all about the actual consequences that result from its action.

Comment author: Tyrrell_McAllister 09 June 2010 11:40:35PM *  0 points [-]

A decision-making algorithm can only care about things accessible in its mind. The "actual world" is not one of them.

The purpose of this post is not to defend realism, and I think that it would take me far afield to do so now. For example, on my view, the agent is not identical to its decision-making algorithm, if that is to be construed as saying that the agent is purely an abstract mathematical entity. Rather, the agent is the actual implementation of that algorithm. The universe is not purely an algorithm. It is an implementation of that algorithm. Not all algorithms are in fact run.

I haven't given any reasons for the position that I just stated. But I hope that you can recognize it as a familiar position, however incoherent it seems to you. Do you need any more explanation to understand the viewpoint that I'm coming from in the post?

Comment author: Vladimir_Nesov 10 June 2010 12:08:47AM *  0 points [-]

The actual world is not epistemically accessible to the agent. It's a useless concept for its decision-making algorithm. An ontology (logic of actions and observations) that describes possible worlds and in which you can interpret observations, is useful, but not the actual world.

Comment author: Tyrrell_McAllister 10 June 2010 12:21:17AM *  0 points [-]

An ontology is not a "logic of actions and observations" as I am using the term. I am using it in the sense described in the Stanford Encyclopedia of Philosophy.

At any rate, what I'm calling the ontology is not part of the decision theory. I consider different ontologies that the agent might think in terms of, but I am explicit that I am not trying to change how the UDT itself works when I write, "I suggest an alternative conception of a UDT agent, without changing the UDT formalism."

Comment author: Vladimir_Nesov 10 June 2010 01:06:16AM 0 points [-]
Comment author: Vladimir_Nesov 09 June 2010 11:26:03PM *  0 points [-]

One way (the usual way?) to think of an agent running Updateless Decision Theory is to imagine that the agent always cares about all possible worlds according to how probable those worlds seemed when the agent's source code was originally written.

Seemed to who? And what about the part where the probabilities are controlled by agent's decisions (as estimated by mathematical intuition)?

Comment author: Tyrrell_McAllister 09 June 2010 11:26:44PM *  0 points [-]

Seemed to who?

To the agent's builders.

ETA: I make that clear later in the post, but I'll add it to the intro paragraph.

And what about the part where the probabilities are controlled by agent's decisions?

I'm not sure what you mean. What I'm describing as coded into the agent "from birth" is Wei Dai's function P, which takes an output string Y as its argument (using subscript notation in his post).

ETA: Sorry, that is not right. To be more careful, I mean the "mathematical intuition" that takes in an input X and returns such a function P. But P isn't controlled by the agent's decisions.

ETA2: Gah. I misremembered how Wei Dai used his notation. And when I went back to the post to answer your question, I skimmed too quickly and misread.

So, final answer, when I say that "the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code", I'm talking about the "preference vector" that Wei Dai denotes by "<E1, E2, . . . >" and which he says "defines its preferences on how those programs should run."

I took him to be thinking of these entries Ei as corresponding to probabilities because of his post What Are Probabilities, Anyway?, where he suggests that "probabilities represent how much I care about each world".

ETA3: Nope, this was another misreading on my part. Wei Dai does not say that <E1, E2, . . . > is a vector of preferences, or anything like that. He says that it is an input to a utility function U, and that utility function is what "defines [the agent's] preferences on how those programs should run". So, what I gather very tentatively at this point is that the probability of each possible world is baked into the utility function U.

Comment author: Vladimir_Nesov 10 June 2010 12:56:14AM *  0 points [-]

I took him to be thinking of these entries Ei as corresponding to probabilities because of his post What Are Probabilities, Anyway?, where he suggests that "probabilities represent how much I care about each world".

Do you see that these E's are not intended to be interpreted as probabilities here, and so "probabilities of possible worlds are fixed at the start" remark at the beginning of your post is wrong?

Comment author: Tyrrell_McAllister 10 June 2010 03:57:25AM 0 points [-]

Do you see that these E's are not intended to be interpreted as probabilities here,

Yes.

and so "probabilities of possible worlds are fixed at the start" remark at the beginning of your post is wrong?

I realize that my post applies only to the kind of UDT agent that Wei Dai talks about when he discusses what probabilities of possible worlds are. See the added footnote.

Comment author: Vladimir_Nesov 10 June 2010 09:55:13AM *  0 points [-]

I realize that my post applies only to the kind of UDT agent that Wei Dai talks about when he discusses what probabilities of possible worlds are. See the added footnote.

It's still misinterpretation of Wei Dai's discussion of probability. What you described is not UDT, and not even a decision theory: say, what U(<E1,E2,...>) is for? It's not utility of agent's decision. When Wei Dai discusses probability in the post you linked, he still means it in the same sense as is used in decision theories, but makes informal remarks about what those values, say, P_Y(...), seem to denote. From the beginning of the post:

I wrote that probabilities can be thought of as weights that we assign to possible world-histories.

Weights assigned to world-histories, not worlds. Totally different. (Although Wei Dai doesn't seem to consistently follow the distinction in terminology himself, it begins to matter when you try to express things formally.)

Edit: this comment is wrong, see correction here.

Comment author: Tyrrell_McAllister 10 June 2010 07:32:48PM *  1 point [-]

It's still misinterpretation of Wei Dai's discussion of probability. What you described is not UDT, and not even a decision theory

I have added a link (pdf) to a complete description of what a UDT algorithm is. I am confident that there are no "misinterpretations" there, but I would be grateful if you pointed out any that you perceive.

Comment author: Vladimir_Nesov 10 June 2010 08:59:24PM 1 point [-]

I believe it is an accurate description of UDT as presented in the original post, although incomplete knowledge about P_i can be accommodated without changing the formalism, by including all alternatives (completely described this time) enabled by available knowledge about the corresponding world programs, in the list {P_i} (which is the usual reading of "possible world"). Also note that in this post Wei Dai corrected the format of the decisions from individual input/output instances to global strategy-selection.

Comment author: Tyrrell_McAllister 10 June 2010 10:25:09PM *  0 points [-]

incomplete knowledge about P_i can be accommodated without changing the formalism, by including all alternatives (completely described this time) enabled by available knowledge about the corresponding world programs, in the list {P_i}

How important is it that the list {P_i} be finite? If P_i is one of the programs in our initial list that we're uncertain about, couldn't there be infinitely many alternative programs P_i1, P_i2, . . . behind whatever we know about P_i?

I was thinking that incomplete knowledge about the P_i could be captured (within the formalism) with the mathematical intuition function. (Though it would then make less sense to call it a specifically mathematical intuition.)

Also note that in this post Wei Dai corrected the format of the decisions from individual input/output instances to global strategy-selection.

I've added a description of UDT1.1 to my pdf.

Comment author: Vladimir_Nesov 10 June 2010 10:48:16PM *  1 point [-]

In principle, it doesn't matter, because you can represent a countable list of programs as a single program that takes an extra parameter (but then you'll need to be more careful about the notion of "execution histories"), and more generally you can just include all possible programs in the list and express the level to which you care about the specific programs in the way mathematical intuition ranks their probability and the way utility function ranks their possible semantics.

On execution histories: note that a program is a nice finite inductive definition of how that program behaves, while it's unclear what an "execution history" is, since it's an infinite object and so it needs to be somehow finitely described. Also, if, as in the example above you have the world program taking parameters (e.g. a universal machine that takes a Goedel number of a world program as parameter), you'll have different executions depending on parameter. But if you see a program as a set of axioms for a logical theory defining the program's behavior, then execution histories can just be different sets of axioms defining program's behavior in a different way. These different sets of axioms could describe the same theories, or different theories, and can include specific facts about what happens during program execution on so and so parameters. Equivalence of such theories will depend on what you assume about the agent (i.e. if you add different assumptions about the agent to the theories, you get different theories, and so different equivalences), which is what mathematical intuition is trying to estimate.

Comment author: Vladimir_Nesov 10 June 2010 11:05:30PM *  0 points [-]

I've added a description of UDT1.1 to my pdf.

It's not accurate to describe strategies as mappings f: X->Y. A strategy can be interactive: it takes input, produces an output, and then environment can prepare another input depending on this output, and so on. Think normalization in lambda calculus. So, the agent's strategy is specified by a program, but generally speaking this program is untyped.

Let's assume that there is a single world program, as described here. Then, if A is the agent's program known to the agent, B is one possible strategy for that program, given in form of a program, X is the world program known to the agent, and Y is one of the possible world execution histories of X given that A behaves like B, again given in form of a program, then mathematical intuition M(B,Y) returns the probability that the statement (A~B => X~Y) is true, where A~B stands for "A behaves like B", and similarly for X and Y. (This taps into the ambient control analysis of decision theory.)

Comment author: Tyrrell_McAllister 10 June 2010 11:19:24PM *  1 point [-]

It's not accurate to describe strategies as mappings f: X->Y.

I'm following this paragraph from Wei Dai's post on UDT1.1:

[U]pon receiving input X, [the agent] would put that input aside and first iterate through all possible input/output mappings that it could implement and determine the logical consequence of choosing each one upon the executions of the world programs that it cares about. After determining the optimal S* that best satisfies its preferences, it then outputs S*(X).

So, "input/output mappings" is Wei Dai's language. Does he not mean mappings between the set of possible inputs and the set of possible outputs?

A strategy can be interactive: it takes input, produces an output, and then environment can prepare another input depending on this output, and so on.

It seems to me that this could be captured by the right function f: X -> Y. The set I of input-output mappings could be a big collection of GLUTs. Why wouldn't that suffice for Wei Dai's purposes?

ETA: And it feels weird typing out "Wei Dai" in full all the time. But the name looks like it might be Asian to me, so I don't know which part is the surname and which is the given name.

Comment author: Tyrrell_McAllister 10 June 2010 03:30:46PM 0 points [-]

What you described is not UDT, and not even a decision theory: say, what U(<E1,E2,...>) is for? It's not utility of agent's decision.

I gave an accurate definition of Wei Dai's utility function U. As you note, I did not say what U is for, because I was not giving a complete recapitulation of UDT. In particular, I did not imply that U(<E1,E2,...>) is the utility of the agent's decision.

(I understand that U(<E1,E2,...>) is the utility that the agent assigns to having program Pi undergo execution history Ei for all i. I understand that, here, Ei is a complete history of what the program Pi does. However, note that this does include the agent's chosen action if Pi calls the agent as a subroutine. But none of this was relevant to the point that I was making, which was to point out that my post only applies to UDT agents that use a particular kind of function U.)

(Although Wei Dai doesn't seem to consistently follow the distinction in terminology himself, it begins to matter when you try to express things formally.)

It's looking to me like I'm following one of Wei Dai's uses of the word "probability", and you're following another. You think that Wei Dai should abandon the use of his that I'm following. I am not seeing that this dispute is more than semantics at this point. That wasn't the case earlier, by the way, where I really did misunderstand where the probabilities of possible worlds show up in Wei Dai's formalism. I now maintain that these probabilities are the values I denoted by pr(Pi) when U has the form I describe in the footnote. Wei Dai is welcome to correct me if I'm wrong.

Comment author: Vladimir_Nesov 10 June 2010 09:09:43PM *  1 point [-]

I agree with this description now. I apologize for this instance and a couple others; stayed up too late last night, and negative impression about your post from the other mistakes primed me to see mistakes where everything is correct.

It was a little confusing, because the probabilities here have nothing to do with the probabilities supplied by mathematical intuition, while the probabilities of mathematical intuition are still in play. In UDT, different world-programs correspond to observational and indexical uncertainty, while different execution strategies correspond to logical uncertainty about a specific world program. Only where there is essentially no indexical uncertainty does it make sense to introduce probabilities of possible worlds, factorizing the probabilities otherwise supplied by mathematical intuition together with those describing logical uncertainty.

Comment author: Tyrrell_McAllister 10 June 2010 10:00:46PM *  0 points [-]

I agree with this description now. I apologize for this instance and a couple others; stayed up too late last night, and negative impression about your post from the other mistakes primed me to see mistakes where everything is correct.

Thanks for the apology. I accept responsibility for priming you with my other mistakes.

In UDT, different world-programs correspond to observational and indexical uncertainty, while different execution strategies to logical uncertainty about a specific world program. Only where there is essentially no indexical uncertainty, it makes sense to introduce probabilities of possible worlds, factorizing the probabilities otherwise supplied by mathematical intuition together with those describing logical uncertainty.

I hadn't thought about the connection to indexical uncertainty. That is food for thought.

Comment author: Vladimir_Nesov 10 June 2010 12:03:31AM *  0 points [-]

But P isn't controlled by the agent's decisions.

Very very wrong. The world program P (or what it does, anyway) is the only thing that's actually controlled in this control problem statement (more generally, a list <P1, P2, P3, ...> of programs, which could equivalently be represented by one program parametrized by an integer).

Edit: I misinterpreted the way Tyrrell used "P", correction here.

Comment author: Tyrrell_McAllister 10 June 2010 12:15:00AM *  0 points [-]

Very very wrong.

Here is the relevant portion of Wei Dai's post:

These considerations lead to the following design for the decision algorithm S. S is coded with a vector <P1, P2, P3, ...> of programs that it cares about, and a utility function on vectors of the form <E1, E2, E3, …> that defines its preferences on how those programs should run. When it receives an input X, it looks inside the programs P1, P2, P3, ..., and uses its "mathematical intuition" to form a probability distribution P_Y over the set of vectors <E1, E2, E3, …> for each choice of output string Y. Finally, it outputs a string Y* that maximizes the expected utility Sum P_Y(<E1, E2, E3, …>) U(<E1, E2, E3, …>). (This specifically assumes that expected utility maximization is the right way to deal with mathematical uncertainty. Consider it a temporary placeholder until that problem is solved. Also, I'm describing the algorithm as a brute force search for simplicity. In reality, you'd probably want it to do something cleverer to find the optimal Y* more quickly.)

If I am reading him correctly, he uses the letter "P" in two different ways. In one use, he writes Pi, where i is an integer, to denote a program. In the other use, he writes P_Y, where Y is an output vector, to denote a probability distribution.

I was referring to the second use.
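
For what it's worth, here is a rough sketch of the brute-force search in that quoted paragraph, keeping the two uses of the letter separate. The names math_intuition and U are stand-ins of mine, not Wei Dai's code; the programs P1, P2, . . . enter only through whatever the mathematical intuition does internally.

```python
# UDT1 decision step for a received input X, as described in the quote: the
# mathematical intuition yields, for each candidate output string Y, a
# distribution P_Y over vectors <E1, E2, ...> of execution histories, and the
# agent outputs the Y* maximizing  sum over E of  P_Y(E) * U(E).

def udt1_choose(X, candidate_outputs, math_intuition, U):
    def expected_utility(Y):
        P_Y = math_intuition(X, Y)        # dict: history vector E -> probability
        return sum(prob * U(E) for E, prob in P_Y.items())
    return max(candidate_outputs, key=expected_utility)
```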

Comment author: Vladimir_Nesov 10 June 2010 12:35:04AM 0 points [-]

Okay, the characterization of P_Y seems right. For my reaction I blame the prior.

Returning to the original argument,

the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code.

P_Y is not a description of probabilities of possible worlds conceived by agent's builder, it's something produced by "mathematical intuition module" for a given output Y (or, strategy Y if you incorporate the later patch to UDT).

Comment author: Tyrrell_McAllister 10 June 2010 12:44:58AM 0 points [-]

P_Y is not a description of probabilities of possible worlds conceived by agent's builder, it's something produced by "mathematical intuition module" for a given output Y (or, strategy Y if you incorporate the later patch to UDT).

You are right here. Like you, I misremembered Wei Dai's notation. See my last (I hope) edit to that comment.

I would appreciate it if you edited your comment where you say that I was "very very wrong" to say that P isn't controlled by the agent's decisions.

Comment author: Vladimir_Nesov 10 June 2010 12:50:17AM *  3 points [-]

It's easier to have a linear discussion, rather than trying to patch everything by reediting it from the start (just saying, you are doing this for the third time to that poor top-level comment). You've got something wrong, then I've got something wrong, the errors were corrected as the discussion developed, moving on. The history doesn't need to be corrected. (I insert corrections to comments this way, without breaking the sequence.)

Comment author: Tyrrell_McAllister 10 June 2010 12:52:58AM 0 points [-]

Thank you for the edit.

Comment author: Vladimir_Nesov 09 June 2010 11:40:58PM 0 points [-]

The second question (edited in later) is more pressing: you can't postulate fixed probabilities of possible worlds; how the agent controls these probabilities is essential.

Comment author: Tyrrell_McAllister 09 June 2010 11:49:53PM *  0 points [-]

The second question (edited in later) is more pressing

See my edit to my reply.