(Cross-posted from Hands and Cities)

Instrumental rationality is about achieving your goals. But morality, famously, sometimes demands that you don’t.

Suppose, for example, that you only want apples. Sometimes you might be in a position to steal apples, and get away with it, and know this. But morality still says: no.

So are morality and instrumental rationality in conflict? A classic answer is: yes, sometimes. But a classic hope is: maybe, on a sufficiently subtle picture of instrumental rationality, no?

This pair of posts examines such a hope. I focus on its expression in the work of the philosopher David Gauthier (and in particular, in his book Morals by Agreement). But I expect many of my comments to generalize.

Gauthier argues that rational agents should constrain their pursuit of their goals in order to avoid bad game-theoretic dynamics. Morality, he claims, is what falls out of such constraints. This post lays out the basics of Gauthier’s view. The next post evaluates it.

I. The problem

Gauthier begins where lots of people I know begin: with a picture of rational agency that takes an agent’s utility function as given and unquestionable, and directs that agent to maximize expected utility. That is, on Gauthier’s view, you aren’t allowed to tell an agent that their utility function isn’t nice and moral enough. You can’t say: “even if X in fact only cares about apples, she should care, intrinsically, about not stealing, about the well-being of the apple-owners, and so on — so stealing is irrational in this sense, even if it promotes her misguided goals.” Utility functions, for Gauthier, can be whatever. See my posts on “subjectivism”/”anti-realism” (1, 2, 3) for more on views in this vein.

In particular, utility functions can be totally indifferent to the utility functions (and general welfare) of others. Indeed, Gauthier proceeds with such asocial motivation as a backdrop assumption — not because he thinks humans are in fact asocial to this degree (though my sense is that he may be on the pessimistic end in this regard), but because he thinks morality should work even if they were.

In these respects, Gauthier’s view appeals to a certain kind of hard-nosed, pessimistic aesthetic. He begins in a universe free of “objective values,” populated, for all we know, by egoists, paperclip maximizers, dogs already eyeing the flesh of other dogs. Whatever the conventional coldness of these agents, he does not stand in judgment. Indeed, Gauthier need not, at any point, actually use normative language at all: he can simply predict what agents will in fact do, if they are good enough at getting what they want. And what they will do, he thinks, is adhere to a certain kind of morality.

How does this morality get going? Well, it’s that classic thing about prisoner’s dilemmas and the like: namely, sometimes (often?) interactions between classical utility maximizers suck for everyone. Or more precisely: if everyone makes decisions by holding the actions of other agents fixed, and then choosing the action with the highest expected value, often one ends up driven towards equilibrium outcomes (holding everyone else’s actions fixed, no one wants to change their own action) that are pareto inefficient (there is some alternative outcome that would leave some people with more utility, and no one with less).

Thus, to take a slightly more complicated version of the standard prisoner’s dilemma, suppose that you and I each face two buttons — “give” and “take” — and we each care solely about our own happiness, with strength linear in the amount of happiness in question. Both “take” buttons pay the presser one day of happiness (this happiness comes out of a separate pot — it’s not subtracted from the other player). My “give” button gives you two days of happiness, and your “give” button gives me three days. And suppose that we each can also play “mixed strategies” — that is, for any probability p, we can determine our action via a lottery with a p chance of giving, and 1-p chance of taking.
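For concreteness, here is a minimal Python sketch (my own illustration, not anything from Gauthier) of the expected payoffs under these rules:

```python
# A minimal sketch of the give/take game described above (my framing, not Gauthier's).
# "Take" pays the presser 1 day; my "give" pays you 2 days; your "give" pays me 3 days.
# Each of us independently gives with some probability.

def expected_payoffs(p_me_give: float, p_you_give: float) -> tuple[float, float]:
    """Expected days of happiness for (me, you) under independent mixed strategies."""
    my_days = (1 - p_me_give) * 1 + p_you_give * 3    # my own "take" plus your "give"
    your_days = (1 - p_you_give) * 1 + p_me_give * 2  # your own "take" plus my "give"
    return my_days, your_days

print(expected_payoffs(0.0, 0.0))  # both take: (1, 1)
print(expected_payoffs(1.0, 1.0))  # both give: (3, 2)
print(expected_payoffs(0.5, 0.5))  # both 50% on give: (2.0, 1.5)
```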

The space of possible outcomes here is as follows.

[Figure: A bad situation for classical utility maximizers.]

The only equilibrium outcome, here, is for both of us to take with 100% probability. That is, no matter what lottery you choose, and no matter what lottery I start out with, I always do better in expectation by increasing my own probability on taking — and the same holds for you. And once we’re both at 100% on taking, no one wants to change. So if we’re both classical utility maximizers, we each get one day.
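Here is a quick check of that dominance claim (again just an illustrative sketch): whatever lottery you pick, my expected payoff only falls as I put more probability on giving.

```python
# Sketch: my expected payoff is strictly decreasing in my own give-probability,
# whatever you do, so taking with 100% is my unique best response (and symmetrically
# for you), making take/take the only equilibrium.

def my_days(p_me_give: float, p_you_give: float) -> float:
    return (1 - p_me_give) * 1 + p_you_give * 3

for p_you in (0.0, 0.5, 1.0):
    payoffs = [my_days(p_me, p_you) for p_me in (0.0, 0.5, 1.0)]
    print(f"if you give with probability {p_you}, my payoffs as I give more: {payoffs}")
    assert payoffs == sorted(payoffs, reverse=True)  # giving more never helps me
```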

But yet, as ever: this sucks. In particular: there are available outcomes where we both get more. If we both give, for example, I get three, and you get two. Heck, even if we both put 50% on give, I get two in expectation, and you get one and a half. That sweet path — up and to the right — towards the pareto frontier (i.e., the set of pareto-efficient outcomes) is open, and beckoning. We are both free agents, with full knowledge of the situation. Can’t we avoid burning value for both of us?

II. Bargaining

Suppose, for example, that we could sit down and talk beforehand, and if we can agree on a joint strategy — i.e., a set of strategies we will pursue in tandem — then we will both faithfully execute our part of the deal (more on faithfulness to deals later). If we don’t agree, though, we go and play the game solo, follow our local incentives relentlessly, and let the ravenous logic of classical, CDT-ish maximization destroy what we love. That is, absent agreement, we will both take. In this sense, for each of us, taking with 100% is the “Best Alternative to Negotiated Agreement” (BATNA). Because we can always just go take on our own, there’s no reason for either of us to agree on an outcome that gets us less, in expectation, than just taking would.

Granted, then, that we each need to get more than one day out of this, what outcome should we agree to jointly create? Trained on standard prisoner’s dilemmas, one’s first thought might be: “both should give with 100%.” But is that so clear? Perhaps, for example, you’d argue to me that give-give is unfair: you only get one extra day from it (on top of your BATNA), whereas I get two. Or maybe I smugly inform you that I already pre-committed to refusing all offers except 3.5-minus-epsilon for me, 1-plus-epsilon for you, which you, as a classical utility maximizer, are now rationally required to agree to. Oops, though: you tell me that you, too, had heard of the Ultimatum game, and you pre-committed to refusing all offers except 8/3-minus-epsilon for you, 1-plus-epsilon for me — at which point we both have a brief moment of “dear God, what have I done,” then the negotiation ends in a clap of thunder, and the skies turn red with blood.

Can we avoid such calamity? In game theory, this is a “bargaining problem”: loosely, a set of possible outcomes (including the outcomes resulting from all possible mixed strategies; this implies the space of outcomes is convex), together with a “disagreement point” (e.g., the outcome that will be chosen if negotiations break down) relative to which some available outcomes are better in expectation for all players.[1] A “bargaining solution” is a function from a bargaining problem to an “agreement point” — that is, an outcome that rational, cooperating agents will jointly agree to pursue.

Now, as the pre-commitment shenanigans above suggest, it’s not clear that there always is a single outcome that rational agents will agree on in a bargaining problem. Bargaining, rather, might be a kind of wild west of rational choice: once certain constraints (like “do better than the disagreement point” and “reach the Pareto frontier”) have been met, rationality falls silent, and something more contingent and/or outside the scope of the problem set-up has to step in (thanks to Katja Grace for discussion).

But we might hope for, and seek out, more structure. For example: we can talk about the game theory involved in sequences of offers and counter-offers; and we can talk, as well, about axioms we might intuitively want a good bargaining solution to satisfy (see Osborne (2004), Chapter 16).

One famous axiomatic solution (I gather it also has good properties with respect to sequences of offers and counter-offers) is the Nash bargaining solution, which directs rational agents to maximize the product of their respective gains relative to the disagreement point. This is the only solution that satisfies a set of four attractive axioms (see footnote).[2]
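As a rough illustration (mine, not Gauthier's), here is a brute-force approximation of the Nash solution for the give/take game above, maximizing the product of gains over the take/take disagreement point:

```python
# Rough sketch: approximate the Nash bargaining solution for the give/take game by
# grid search, maximizing the product of each player's gain over the disagreement
# point (1, 1), i.e. the outcome where we both just take.

def payoffs(p_me_give, p_you_give):
    return (1 - p_me_give) + 3 * p_you_give, (1 - p_you_give) + 2 * p_me_give

best_product, best = -1.0, None
steps = 201
for i in range(steps):
    for j in range(steps):
        p_me, p_you = i / (steps - 1), j / (steps - 1)
        u_me, u_you = payoffs(p_me, p_you)
        if u_me < 1 or u_you < 1:
            continue  # worse than the disagreement point for someone
        product = (u_me - 1) * (u_you - 1)
        if product > best_product:
            best_product, best = product, (p_me, p_you, u_me, u_you)

print(best)  # lands at p_me = 1, p_you = 1: utilities (3, 2), i.e. both give outright
```

In this particular game, by my calculation, the product of gains is maximized when we both give outright, i.e. at the outcome (3, 2).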

Gauthier, though, rejects the Nash solution in favor of what he calls “minimax relative concession.” A player’s “concession,” for Gauthier, is the percentage reduction in gains she incurs relative to her maximum gain, where her maximum gain is defined by the utility she gets on top of the disagreement point in the best-for-her candidate outcome (where a candidate outcome needs to pareto-improve on the disagreement point — otherwise, someone would rule it out right off the bat). Gauthier directs bargainers to minimize the maximum concession amongst all the participants.

Given the standard definition of a bargaining problem, this ends up equivalent to the Kalai-Smorodinsky (KS) bargaining solution, which directs players to choose the pareto-efficient outcome that maintains the ratio between their maximum gains (maintaining this ratio implies equal Gauthier-style concession, and if a Pareto-efficient outcome with equal concessions is available, as it will be in a standard bargaining problem, this will also be the outcome that minimizes the maximum concession — see Gauthier (1987), p. 140, for more). Interpreted geometrically: in both cases, we draw a box defined by the maximum gains available to the players, and a line from the disagreement point to the top-right corner of the box, and then choose the pareto-efficient outcome that falls on that line:

[Figure: The KS solution in action.]

Thus, in the prisoner’s dilemma above, Gauthier and KS choose the outcome (8/3, 19/9) — that is, the outcome that gives me 5/3 in expectation on top of my BATNA, and you 10/9, vs. my max gain of 5/2, and your max gain of 5/3 (a 33% “relative concession” for each of us). This corresponds to me playing 100% give, but you giving with only 8/9 probability (roughly 89%).[3]
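Here is a small numerical check of those numbers (my own sketch, using the maximum gains of 5/2 and 5/3 stated above): we search joint mixed strategies for the one that minimizes the larger of our two relative concessions.

```python
# Sketch: approximate Gauthier's minimax relative concession (equivalently, the KS
# point) for the give/take game by grid search over joint mixed strategies.
# Disagreement point: both take, i.e. (1, 1). Maximum gains over that point, as
# computed in the text: 5/2 for me, 5/3 for you.

def payoffs(p_me_give, p_you_give):
    return (1 - p_me_give) + 3 * p_you_give, (1 - p_you_give) + 2 * p_me_give

MAX_GAIN_ME, MAX_GAIN_YOU = 5 / 2, 5 / 3

best_worst, best = float("inf"), None
steps = 901  # step size 1/900, so p_you = 8/9 lies exactly on the grid
for i in range(steps):
    for j in range(steps):
        p_me, p_you = i / (steps - 1), j / (steps - 1)
        u_me, u_you = payoffs(p_me, p_you)
        if u_me < 1 or u_you < 1:
            continue  # someone does worse than the disagreement point
        concession_me = 1 - (u_me - 1) / MAX_GAIN_ME
        concession_you = 1 - (u_you - 1) / MAX_GAIN_YOU
        worst = max(concession_me, concession_you)
        if worst < best_worst:
            best_worst, best = worst, (p_me, p_you, u_me, u_you)

print(best_worst, best)
# max concession = 1/3, at p_me = 1, p_you = 8/9: utilities (8/3, 19/9)
```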

There are further bargaining solutions available as well: see e.g. here for a nice chart. Because Gauthier focuses on the KS solution, I will in what follows as well. However, I expect that most of his claims, and most of my responses, would apply if we instead chose a different solution. The important thing is for agreement to make it possible for rational agents to reach the pareto frontier, even in cases where classical utility maximizing alone would not get them there.

III. Constrained maximizers

Suppose, then, that by negotiating in a back room before we enter our prisoner’s dilemma, we manage to agree on a joint strategy that gets us to the pareto frontier. Now, though, we have a new problem: we’re both still classical utility maximizers, and classical utility maximizers don’t keep agreements like this — or at least, not in thought experiments (the real world implies additional incentives, repeated games, distinctions between decision procedures and criteria of rightness, and so on — I discuss this in my next post). After all, once you leave the back room and sit down in front of your buttons, you face the same incentives, and the same crappy equilibrium, that you did before negotiation. Regardless of whether I keep my end of the bargain, you’ll get more utility (won’t you?) if you go 100% on take. And you’re all about getting more utility. What else is there to get? Why would you choose to get less? You reach for the “take” button; the world howls in pain and grief; you shrug, and maximize.

And this is where Gauthier says: screw that. We need a new way. We need a new type of agent. An agent able, when necessary (and suitably reciprocated), to actually not maximize utility. An agent who plays nice with others whose values she is genuinely indifferent to; who doesn’t burn the commons even for a righteous cause; an agent who spits in the face of Moloch, and cooperates.

Gauthier calls such agents “constrained maximizers.” And we should be clear about what the constraints in question imply. We are not talking, here, about agents who are suitably uncertain about who will find out, or about whether they will get punished; who understand that many opportunities for cooperation are many-shot rather than one-shot; who act with their reputations in mind; who think about the social knock-on effects of their norm violation, and so on. That’s easy stuff — at least in theory. As Gauthier puts it, “such a person exhibits no real constraint” (p. 120).

Nor, even, are we talking about EDT agents, who cooperate in prisoner’s dilemmas when doing so increases their credence that the other person cooperates. This gets you some of the cooperation Gauthier wants, but not all. Suppose, for example, that in the case above, I will have to choose first, and you — an EDT agent — will get to see what I have chosen before you yourself choose. Again we get to negotiate in the back room; and again we manage to reach an agreement that gets us to the pareto frontier: I will give with 100% probability, and you, with 8/9 (about 89%). We shake on it, looking each other in the eyes as we do.

But if I look deep enough into your eyes, what will I see? I will see you returning to your buttons, receiving the news that I have kept my part of the bargain, and then updating your world-model. Now, when you condition on your defection vs. cooperation, your credence in my cooperating remains unchanged: your pockets stay just as full with my giving, and they get fuller the more likely you are to take. So, you defect, and take with 100%. So, I see this in your eyes (I discuss deception in the next post). So, unless we can find some external incentive to bind you, the deal is off. If I have to move first, it’s no use negotiating with maximizers like you.

No, Gauthier’s constrained maximizers are stranger, even, than the acausal strangeness of EDT. Somehow (is it so strange? is it, maybe, extremely normal?), they cooperate even when the other person’s cooperation is already in the bag. They exhibit what Gauthier would call “real constraint.” They dance to some genuinely different drum. In these respects, they are more like the “updateless” agents I discuss at the end of my post on decision theory. Indeed, Gauthier and the updateless-ish decision theorists share many key concerns — and perhaps, some of the same problems (see next post).

Note, though, that constrained maximizers aren’t suckers, either. Indeed, it’s important to Gauthier that they remain ready to maximize the heck out of their individual utility, in perfectly classical and Molochian ways, if the other players aren’t up for playing nice. More precisely, Gauthier (p. 167) defines a constrained maximizer as someone who bases her actions on a “joint strategy” if:

  1. everyone following this strategy would be better for her than everyone following individual strategies (i.e., the joint strategy is “beneficial”), and about as good for her as the strategy that would be chosen by minimax concession (i.e., the joint strategy is “fair”), and
  2. she actually expects her following this strategy to result in her getting more utility than she would if everyone followed individual strategies.

2, here, is supposed to rule out getting exploited in interaction with unconstrained maximizers. Thus, in a prisoner’s dilemma with an unconstrained maximizer, I, as a constrained maximizer, may recognize that my giving with 100%, and yours with 8/9, would be a fair and beneficial outcome, yielding me 8/3 days of joy in expectation. But I also know that you’re an unconstrained maximizer, who will always take with 100%: so I don’t, actually, expect to get 8/3 out of giving. Rather, I expect to get 0, which is worse than what I would get if we both just take. So, shaking my head at the value your unconstrained maximization is burning for both of us, I pocket my one day, leave you to the harsh winds of the state of nature, and go to seek better company.
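Stated as a decision rule and applied to the give/take game, the idea looks roughly like this (a stylized sketch of conditions 1 and 2 above, my gloss rather than Gauthier's own formalism):

```python
# Stylized sketch of Gauthier's constrained maximizer in the one-shot give/take game
# (my gloss on conditions 1 and 2, not his formalism). The "fair" joint strategy is
# the minimax-concession point: I give with 100%, you give with 8/9.

def my_days(p_me_give: float, p_you_give: float) -> float:
    return (1 - p_me_give) * 1 + p_you_give * 3

BASELINE = my_days(0.0, 0.0)  # everyone acts as an unconstrained maximizer: 1 day

def i_cooperate(expected_p_you_give: float) -> bool:
    fair_if_followed = my_days(1.0, 8 / 9)                 # condition 1: beneficial and fair
    actually_expected = my_days(1.0, expected_p_you_give)  # condition 2: my real forecast
    return fair_if_followed > BASELINE and actually_expected > BASELINE

print(i_cooperate(8 / 9))  # True: cooperate with a fellow constrained maximizer
print(i_cooperate(0.0))    # False: defect against an unconstrained maximizer
```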

We can quibble in various ways with the specific definition of constrained maximization that Gauthier offers (see footnote for a few of my objections).[4] Even if we don’t like Gauthier’s formulation, though, the outlines of the project he’s groping at are fairly clear. He wants to identify a way of being, such that when suitably discerning agents who adopt this way of being meet in a one-shot prisoner’s dilemma (or some other similar case), they walk out with a roughly pareto-optimal and intuitively “fair” outcome. He wants such agents to notice and defect on defectors. He wants to claim that their way of being is compatible with instrumental rationality (despite the fact that their whole shtick is passing up opportunities to maximize utility). And he wants to call the difference between their way of being and the way of an unconstrained maximizer “morality.”

Does this project succeed? I turn to that question in my next post.


  1. See Osborne (2004), p. 482, for a more fleshed-out formal definition. ↩︎

  2. The axioms in question are: 1. Pareto-efficiency (you can’t do better for someone without doing worse for someone else). 2. Symmetry (if the space of outcomes is symmetric — e.g., for every outcome (u1, u2), (u2, u1) is also available — and the disagreement point gives equal utility to both players, then the chosen point should give equal utility to both players; I think of this as “if the space of outcomes doesn’t let you tell which player is on which axis, it shouldn’t matter which axis a player ends up on”). 3. Representation invariance (multiplying a player’s utility function by a positive constant, and/or adding a constant to it, shouldn’t matter; this follows from the standard understanding of utility functions as insensitive to positive affine transformations). 4. Independence of irrelevant alternatives (if you shrink the set of available outcomes, but leave the old agreement point available, the new agreement point should be the same as the old). ↩︎

  3. The KS solution rejects the “independence of irrelevant alternatives” premise that the Nash solution accepts. Unlike the Nash solution, though, the KS solution satisfies a further axiom, called the “monotonicity axiom,” which requires that if, for every possible utility level the other player can get, the maximum feasible level I can get increases, then the agreement point should improve by my lights. Indeed, KS is the only solution compatible with monotonicity and the other axioms the Nash solution is based on (pareto-efficiency, symmetry, and representation invariance). ↩︎

  4. Quibbles salient to me include: 1. Gauthier’s formulation above doesn’t require that constrained maximizers actually expect to get a “fair” level of utility out of cooperation — only that the joint strategy that they follow would be fair if everyone followed it. Thus, for example, if you pre-commit to “giving” with only 40% probability, thereby yielding me a bit above one day in expectation, then according to conditions (1) and (2), I should still act on the basis of a joint strategy where you give with 8/9, and I give 100% — even though you, by hypothesis, won’t. 2. Gauthier’s formulation doesn’t allow constrained maximizers to take advantage of suckers — even though the vibe of his overall project suggests that they should. Thus, if you’re an “I give with 100% no matter how mean I expect you to be” bot, or an “I assume everyone will always be super nice” bot, Gauthier’s constrained maximizer still cooperates with you with 100%. This fits poorly with the rationale he offers for becoming a constrained maximizer in the first place (see next post for more). 3. Later in the book, Gauthier defines “fair and beneficial” relative to a disagreement point other than what happens if everyone just acts as an unconstrained maximizer. I’m skeptical of his rationale for this move (again, more in the next post), but regardless, it isn’t reflected in the definition above. (Gauthier’s formulation also doesn’t tell you which fair and beneficial joint strategy to pursue — and we can imagine cases where agents pursuing incompatible fair and beneficial joint strategies leads to disaster. But this feels like a different type of problem.) ↩︎

Comments (5)

Your morality is a part of your goals, what your instrumental rationality is pursuing. Therefore there is no conflict between them.

The rest of the post seems to be about game-theoretic issues, and I don't see a connection to morality there.

TAG:

Your morality is which part of your goals? If there is no criterion distinguishing moral goals from non-moral ones, then a society of selfish jerks who always defect would be 100% moral. But if morality is related to unselfish, cooperative behaviour, as most people believe, game theory is potentially relevant.

(There's a posting where Yudkowsky kind-of-sort-of argues for the morality-is-goals theory, and a subsequent one where he notices the problem and starts talking about non-jerkish values).

The selfish jerks would not be rational, as they wouldn't be winning. That's what the game theory is about.

The game theory is independent of morality. In some such games, winning happens to involve being good to your neighbours. But in others, winning may involve doing evil to your neighbours. It would be nice if the morally best action ("What is hateful to you, do not do to your neighbour") were always to be the selfishly winningest one, but while the examples have this property, I do not think it has been established in general.

TAG:

The selfish jerks would not be rational, as they wouldn’t be winning

They would be doing as well as you can if you refuse to cooperate. More importantly, cooperation isn't a Pareto improvement on defection, because some cooperators take a hit; not everything is a PD.

But in others, winning may involve doing evil to your neighbours. It would be nice if the morally best action (“What is hateful to you, do not do to your neighbour”) were always to be the selfishly winningest one,

It isn't. It's not a Pareto improvement.

Playing co-operatively can "grow the pie" or produce more overall value even while generating individual losers.

The way you operate, among other things, affects how things would turn out if you were doing something like playing against copies of yourself. 'Morally' (depending on what you think is moral) can perhaps be described as a subset of ways of operating. Less 'a connection to morality' and more 'does morality have certain optimality* properties?', focusing on figuring out said properties and then drawing some comparisons to morality.

*Hence the game-theoretic issues.