
Model of unlosing agents

Post author: Stuart_Armstrong 02 August 2014 07:59AM

Some have expressed skepticism that "unlosing agents" can actually exist. So to provide an existence proof, here is a model of an unlosing agent. It's not a model you'd want to use constructively to build one, but it's sufficient for the existence result.

Let D be the set of all decisions the agent has made in the past, let U be the set of all utility functions that are compatible with those decisions, and let P be a "better than" relationship on the set of outcomes (possibly intransitive, dependent, incomplete, etc...).

By "utility functions that are compatible with those decisions" I mean that an expected utility maximising agent with any u in U would reach the same decisions D as the agent actually did. Notice that U starts off infinitely large when D is empty; when the agent faces a new decision d, here is a decision criterion that leaves U non-empty:

  1. Restrict to the set of possible decision choices that would leave U non-empty. This is always possible, as any u in U would advocate a particular decision choice du at d, and therefore choosing du would leave u in the updated U. Call this set compatible.
  2. Among those compatible choices, choose one that is the least incompatible with P, using some criterion (such as needing to do the least work to remove intransitivities and dependencies, and so on).
  3. Make that choice, update P as in step 2, and update D and U (leaving U non-empty, as seen in step 1).
  4. Proceed.
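The four steps above can be sketched in code. This is a hedged illustration, not a constructive design: utility functions are modelled as callables `u(decision, choice)` returning expected utility, P's incompatibility is collapsed into a hypothetical `p_cost` score, and compatibility is checked per-decision rather than over the full history D; all names are assumptions of the sketch, not from the post.

```python
def compatible_choices(U, decision, choices):
    """Step 1: keep only the choices that some u in U would itself make,
    so that the updated U stays non-empty."""
    best = {u: max(choices, key=lambda c: u(decision, c)) for u in U}
    return {c for c in choices if c in best.values()}

def decide(U, p_cost, decision, choices):
    """Steps 2-4: pick the compatible choice least incompatible with P
    (lowest p_cost), then drop the utilities in U that disagree with it."""
    ok = compatible_choices(U, decision, choices)
    choice = min(ok, key=lambda c: p_cost(decision, c))  # step 2
    U = {u for u in U
         if max(choices, key=lambda c: u(decision, c)) == choice}  # step 3
    return choice, U  # step 4: proceed with the updated U
```

By construction the returned U contains at least one u whose best choice was the one taken, which is the non-emptiness argument of step 1.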

That's the theory. In practice, we would want to restrict the utilities initially allowed into U to avoid really stupid utilities ("I like losing money to people called Rob at 15:46.34 every alternate Wednesday if the stock market is up; otherwise I don't."). When constructing the initial P and U, a good start could be to look only at categories between which humans naturally express preferences. But those are implementation details. And again, using this kind of explicit design violates the spirit of unlosing agents (unless the set U is defined in ways that are different from simply listing all u in U).

The proof that this agent is unlosing is that a) U will never be empty, and b) for any u in U, the agent will have behaved indistinguishably from a u-maximiser.

Comments (21)

Comment author: Wei_Dai 02 August 2014 05:11:04PM *  5 points [-]

Rationality should be about winning, not avoiding the appearance of losing. In order to avoid the appearance of losing, we just have to look consistent in our choices. But in order to win, we have to find the right utility function and maximize that one. How likely is it that we'll do that by prioritizing consistency with past choices above all else? (Which is clearly what this model is doing. It also seems to be the basic idea behind "unlosing agents", but I'm less sure about that, so correct me if I'm wrong.) It seems to me that consistency ought to naturally fall out of doing the right things to win. Inconsistency is certainly a sign that something is wrong, but there is no point in aiming for it directly like it's a good thing in itself.

Comment author: Stuart_Armstrong 02 August 2014 08:48:38PM 1 point [-]

(Sorry for giving you another answer, but it seems useful to separate the points):

But in order to win, we have to find the right utility function and maximize that one.

The "right" utility function. Which we don't currently know. And yet we can make decisions until the day we get it, and still make ourselves unexploitable in the meantime. The first AIs may well be in a similar situation.

Comment author: Stuart_Armstrong 02 August 2014 08:33:47PM 0 points [-]

A second, maybe more pertinent point: unlosing agents are also unexploitable (maybe I should have called them that to begin with). This is a very useful thing for any agent to be, especially one whose values have not yet gelled (e.g. an FAI still in the process of estimating the CEV).

Comment author: Wei_Dai 08 August 2014 09:35:37PM 0 points [-]

I don't see how unlosing agents are compatible with CEV. Running the unlosing algorithm gives you one utility function at the end, and running CEV gives you another. They would be the same only by coincidence. If you start by giving control to the unlosing algorithm, why would it then hand over control to CEV or change its utility function to CEV's output (or not remove whatever hardwired switchover mechanism you might put in)? Your third comment seems to make essentially the same point as your second (this) comment, and the same response seems to apply.

Comment author: Stuart_Armstrong 11 August 2014 10:10:38AM *  0 points [-]

Before running CEV, we are going to have to make a lot of decisions about what constitutes a human utility function, how to extract it, how to assess strength of opinions, how to resolve conflicts between different preferences in the same person, how to aggregate it, and so on. So the ultimate CEV is path dependent, dependent on the outcome of those choices.

Using an unlosing algorithm could be seen as starting on the path to CEV earlier, before humans have made all those decisions, and letting the algorithm make some of those decisions rather than ourselves. This could be useful if some of the components on the path to CEV are things where our meta-decision skills (for instructing the unlosing agent how to resolve these issues) are better than our object-level decision skills (for resolving the issues directly).

Comment author: Stuart_Armstrong 02 August 2014 06:40:22PM 0 points [-]

If you're going to design an agent with an adaptable utility (eg a value loader) then something like an unlosing design seems indicated.

Comment author: Wei_Dai 08 August 2014 08:34:06PM 0 points [-]

You say "unlosing design seems indicated" for value loading, but I don't see how these two ideas would work together at all. Can you give some sort of proof of concept design? Also, as I mentioned before, value loading seems to be compatible with UDT. What advantage does a value loading, unlosing agent have, over a UDT-based value loader?

Comment author: Stuart_Armstrong 11 August 2014 10:27:56AM 0 points [-]

On a more philosophical note, we seem to have a different approach. It's my impression that you want to construct an idealised perfect system, and then find a way of applying it down to the real world. I seem to be coming up with tools that would allow people to take approximate "practical" ideas that have no idealised versions, and apply them in ways that are less likely to cause problems.

Would you say that is a fair assessment?

Comment author: Stuart_Armstrong 11 August 2014 10:22:48AM 0 points [-]

The reason an unlosing agent might be interesting is that it doesn't have to have its values specified as a collection of explicit utility functions. It could instead have some differently specified system that converges to explicit utility functions as it gets more morally relevant data. Then an unlosing procedure would keep it unexploitable during this process.

In practice, I think requiring a value-loading agent to be unlosing might be too much of a requirement, as it might lock in some early decisions. A "mainly unlosing" agent - say, an imperfect value-loading agent with some unlosing characteristics - seems more interesting, and potentially safer.

Comment author: Unnamed 02 August 2014 09:16:16PM 1 point [-]

The extrapolated volition of an unlosing agent is not well-defined, because the results of extrapolation are path-dependent.

Comment author: Stuart_Armstrong 04 August 2014 10:50:40AM 0 points [-]

Yes. But you can add criteria to get uniqueness, if that's particularly important.

Comment author: AlexMennen 02 August 2014 08:13:39PM 0 points [-]

I wasn't expressing skepticism that unlosing agents exist, only that they would be VNM-rational. Aside from the example I described in the linked comment about how such an agent could violate the independence axiom, it sounds like the agent could also violate transitivity. For example, suppose there are 3 outcomes A, B, and C, and that P says A>B, B>C, and C>A. If given a choice between A and B, the agent chooses A. If it is given an opportunity to switch to C after that, and then an opportunity to switch to B again after that, it will avoid getting stuck in a loop. But that doesn't remove the problem that, before any of that, it would pick A if offered a choice between A and B, B if offered a choice between B and C, and C if offered a choice between A and C. This still seems pretty bad, even though it doesn't get caught in dutch-book loops.
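The behaviour described here can be sketched as code. A hedged illustration only: the `rejected` set below is a crude stand-in for the U-filtering model in the post, and all names are assumptions of the sketch.

```python
P = {('A', 'B'), ('B', 'C'), ('C', 'A')}  # (x, y) means x is preferred to y

def fresh_choice(x, y):
    """A history-free agent just follows P: it picks A over B, B over C,
    and C over A -- intransitive, as described above."""
    return x if (x, y) in P else y

def unlosing_choice(current, offer, rejected):
    """Take the P-preferred option, but never switch back to an outcome
    already given up; record whatever gets rejected along the way."""
    if offer in rejected:
        return current  # refusing here is what breaks the loop
    if (offer, current) in P:
        rejected.add(current)
        return offer
    rejected.add(offer)
    return current
```

Starting from B, this agent takes A over B, then switches to C, then refuses the offered switch back to B, so it never cycles, even though each fresh pairwise choice on its own follows the intransitive P.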

Comment author: Stuart_Armstrong 02 August 2014 08:44:46PM 0 points [-]

only that they would be VNM-rational

But if the agent can't be subject to Dutch books, what's the point of being VNM-rational? (in fact, in my construction, the agent need not be initially complete).

But the main point is that VNM-rationality isn't clearly defined. Is it over all possible decisions, or just over decisions the agent actually faces? Given that rationality is often defined on Less Wrong in a very practical way (generalised "winning"), I see no reason to assume the first. It weakens the arguments for VNM-rationality, making it into a philosophical ideal rather than a practical tool.

And so while it's clear that an AI would want to make itself into an unlosing agent, it's less clear that it would want to make itself into an expected utility maximiser. In fact, it's very clear that in some cases it wouldn't: if it knew that outcomes A and B were impossible, and it currently didn't have preferences between them, then there is no reason it would ever bother to develop preferences there (barring social signalling and similar).

Comment author: Unnamed 02 August 2014 09:15:28PM 2 points [-]

Suppose you have A>B>C>A, with at least a $1 gap at each step of the preference ordering. Consider these 3 options:

Option 1: I randomly assign you to get A, B, or C
Option 2: I randomly assign you to get A, B, or C, then I give you the option of paying $1 to switch from A to C (or C to B, or B to A), and then I give you the option of paying $1 to switch again
Option 3: I take $2 from you and randomly assign you to get A, B, or C

Under standard utility theory Option 2 dominates Option 1, which in turn strictly dominates Option 3. But for an unlosing agent which initially has cyclic preferences, Option 2 winds up being equivalent to Option 3.
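This can be simulated directly. An illustrative sketch of Option 2 only, under the comment's assumptions (cyclic preferences A>B>C>A, each gap worth more than $1); the function and variable names are assumed, not from the comment.

```python
import random

PREFERS = {('A', 'B'), ('B', 'C'), ('C', 'A')}  # (x, y): x preferred to y
SWITCH_TARGET = {'A': 'C', 'B': 'A', 'C': 'B'}  # the swap offered at each step

def option2(start):
    """Two chances to pay $1 to switch to the P-preferred outcome."""
    holding, paid = start, 0
    for _ in range(2):
        offer = SWITCH_TARGET[holding]
        if (offer, holding) in PREFERS:  # the agent strictly prefers the swap
            holding, paid = offer, paid + 1
    return holding, paid

outcome, paid = option2(random.choice(['A', 'B', 'C']))
# paid is always 2, and outcome is uniform over {A, B, C}:
# a random outcome minus $2, which is exactly Option 3.
```

Whatever the starting assignment, the agent accepts both switches, so Option 2 collapses to Option 3 for an agent that honours every pairwise preference in the cycle.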

Comment author: Stuart_Armstrong 05 August 2014 09:30:00AM *  0 points [-]

Incidentally, if given the choice, the agent would choose option 1 over option 3. When making choices, unlosing agents are indistinguishable from vNM expected utility maximisers.

Or another way of seeing it, the unlosing agent could have three utility functions remaining: A>B>C, B>C>A, and C>A>B, and all of these would prefer option 1 to option 3.

What's more interesting about your example is that it shows that certain ways of breaking transitivities are better than others.

Comment author: Stuart_Armstrong 04 August 2014 12:07:12PM 0 points [-]

Which is a good argument to break circles early, rather than late.

Comment author: sebmathguy 03 August 2014 06:12:17AM *  1 point [-]

There's actually no need to settle for finite truncations of a decision agent. The unlosing decision function (on lotteries) can be defined in first-order logic, and your proof that there are finite approximations of a decision function is sufficient to use the compactness theorem to produce a full model.

Comment author: AlexMennen 03 August 2014 07:20:07PM 0 points [-]

But if the agent can't be subject to Dutch books

You avoid falling into Dutch book loops where you repeatedly pay to go around in a circle, but you still fall into single-step Dutch books. Unnamed gave a good example.

Comment author: Stuart_Armstrong 04 August 2014 10:48:24AM 0 points [-]

Those aren't technically Dutch Books. And there's no reason a forward-looking unlosing agent couldn't break circles at the beginning rather than at the end.

Comment author: AlexMennen 04 August 2014 11:38:36PM *  0 points [-]

Those aren't technically Dutch Books.

Ok, but it is still an example of the agent choosing a lottery over a strictly better one.

And there's no reason a forward-looking unlosing agent couldn't break circles at the beginning rather than at the end.

Then it would be VNM-rational. Completeness is necessary to make sense as an agent, transitivity and independence are necessary to avoid making choices strictly dominated by other options, and the Archimedean axiom really isn't all that important.

Comment author: Stuart_Armstrong 05 August 2014 09:34:08AM 0 points [-]

Unnamed's example is interesting. But if given the choice, unlosing agents would choose option 1 over option 3 (when choosing, unlosing agents act as vNM maximisers).

Unnamed's example points at something different, namely that certain ways of resolving intransitivities are strictly better than others.