Looking forward to reading this properly. For now I'll just note that Roger Crisp attributes LELO to C.I. Lewis.
Three articles, but the last is most relevant to you:
Another related, much older reference is from Ramsey's Truth and Probability (1926) in which he relates risk attitudes to preferences over repeated experiences (it's in the single person case however):
"We can put this in a different way. Suppose his degree of belief in p is m/n; then his action is such as he would choose it to be if he had to repeat it exactly n times, in m of which p was true, and in the others false. [Here it may be necessary to suppose that in each of the n times he had no memory of the previous ones.]"
That's a daunting amount of formalization. I hope all this effort helps with aggregation paradoxes involving the creation of agents, i.e. with variants of the repugnant conclusion, to which you allude in the beginning. I guess we will see in your next post.
May I also suggest sharing or crossposting this on the EA Forum, where problems in population ethics are discussed more frequently?
I'm not sure we should worry about generalizing with high-powered machinery. Pick a collection, and you can represent it with a monad. But pick a monad, and it's probably not a collection (I think?).
E.g. consider a "whoops I lost some of them monad" - for each set you choose some subset, plus an extra element (could call it {*}, as in the maybe monad). So if my original set is (1,2,3,4,5,6,7), there will be some whoops monad that maps this to (1,2,*). Functions work as normal except when they would involve * or lost elements, in which case they get mapped to *. Seems like a perfectly good monad, but it's the diametric opposite of a collection.
sorry i’m not getting this whoops monad. can you spell out the details, or pick a more standard example to illustrate your point?
i think “every monad formalises a different notion of collection” is a bit strong. for example, the free vector space monad (see section 3.2) — is a collection of the elements, for some notion of collection?
is every element of a free algebraic structure a “collection” of the generators? would you hear someone say that a quantum state is a collection of eigenstates? at a stretch maybe.
The identity monad probably works about as well as an illustration, but has less of the flavor of "not only did you not make this more like a collection, you made it worse" :P But advantage is you didn't need the axiom of choice to specify it.
note that there are only two exceptions to the claim “the unit of a monad is componentwise injective”. this means (except these two weird exceptions), that the singleton collections η(x) and η(y) are always distinct for x≠y. hence, M(X), the set of collections over X, always “contains” the underlying set X. by “contains” i mean there is a canonical injection η_X:X→M(X), i.e. in the same way the real numbers contain the rationals.
in particular, i think this should settle the worry that “there should be more collections than singleton elements”. is that your worry?
I wouldn't say it's my worry exactly, but it does deal with the most forceful reasons for worrying, yeah.
1. Introduction
1.1. Three aggregative principles
This article examines aggregative principles of social justice. These principles state that a social planner should make decisions as if they will face the aggregated personal outcomes of every individual in the population. Different conceptions of aggregation generate different aggregative principles.
Aggregative principles avoid many theoretical pitfalls of utilitarian principles. Unlike utilitarianism, aggregative principles do not require specifying a social welfare function, which is notoriously intractable. Moreover, they seem less prone to counterintuitive conclusions such as the repugnant conclusion or the violation of moral side constraints.[1]
There are three well-known aggregative principles: Live Every Life Once, Harsanyi's Lottery, and Rawls' Original Position.
By the end of this article, we will see that these three aggregative principles are instances of a vast family of similar principles.
1.2. Living Every Life Once
The idea, as articulated by William MacAskill, is that a social planner should make decisions as if they will live out every individual's life (past, present, and future) in sequence. We will call this principle of social justice "Live Every Life Once" (LELO).[2]
MacAskill's hope is that the social planner, following LELO, would choose policies benefiting each individual because they anticipate living each individual's life, and they would avoid policies harming any individual for the same reason. For example, the social planner wouldn't choose to emit dangerous pollution that will harm the health of future generations, because the social planner anticipates suffering the consequences themselves, albeit delayed by many millennia.
MacAskill's thought experiment bears a striking similarity to two other thought experiments in social ethics: Harsanyi's Lottery and Rawls' Original Position.
1.3. Harsanyi's Lottery
The economist John C. Harsanyi offers a different principle of social justice: a social planner should make decisions as if they faced a hypothetical lottery over the personal outcomes of each individual in society. This lottery would assign a likelihood to each individual, so the social planner wouldn't be sure which individual's life they will face. For example, they may face a 20% chance of being individual A, a 35% chance of being B, and so on. The ignorance is meant to force an impartial perspective for making decisions, a feature of social justice. We will call this principle of social justice "Harsanyi's Lottery" (HL).[3]
Harsanyi's hope is that the social planner, following HL, would choose policies benefiting each individual because there is some nonzero probability that they face that individual's life. They would also avoid policies harming any individual for the same reason. For instance, they would not choose to impoverish the majority of society for a small gain to a minority, because doing so would lower the expected value of the corresponding lottery of outcomes.
Typically, the hypothetical lottery is taken to be uniform over all individuals in society. This uniformity assumption is crucial for ensuring impartiality: the social planner would not rationally prioritize any one individual over another if they have an equal probability of being each person.
1.4. Rawls' Original Position
The philosopher John Rawls offers a third principle of social justice, similar to Harsanyi's Lottery.[4] His principle states that a social planner should make decisions as if they were ignorant about which individual in society they will be. We will call this principle of social justice "Rawls' Original Position" (ROI).[5]
Rawls' hope is that the social planner, following ROI, would choose policies benefiting each individual because they must consider the possibility that they could be any individual. They would also avoid policies harming any individual for the same reason. For example, the social planner wouldn't choose to torture someone, even to greatly benefit the rest of society, because the social planner must consider the possibility that they will end up being that person.
HL and ROI share obvious similarities: both principles ask the social planner to imagine themselves in a state of ignorance about which individual's personal outcome they will face. However, they understand this ignorance in different ways. Under HL, the ignorance is probabilistic, with likelihoods attached to the alternatives. By contrast, under ROI, the ignorance is possibilistic, meaning the planner considers it possible that they could be any individual, without assigning probabilities to those possibilities. This situation (i.e. having no basis for assigning probabilities to the possible alternatives) is sometimes called Knightian uncertainty. Moreover, note that HL proposes a physical mechanism by which the individual is selected, namely, a random lottery. By contrast, ROI merely states that each individual might be selected, without specifying any physical mechanism.
1.5. Structural similarities
The similarity between HL and ROI was apparent to Harsanyi and Rawls.[6] On the other hand, HL and ROI seem, at first glance, quite distinct from LELO. Firstly, Harsanyi's and Rawls' principles both begin with a planner in a state of ignorance about which individual's personal outcome they will face, whereas LELO posits no such uncertainty: so long as the planner knows the personal outcomes of each individual and the ordering of the individuals' births, their hypothetical fate is certain. Moreover, LELO asks the social planner to contemplate an abnormal prospect, i.e. a lifetime spanning millennia, whereas HL and ROI involve prospects that actual individuals in society will face.
However, these three principles are structurally similar: LELO, HL and ROI each involve aggregating the prospects faced by the individuals into a single hypothetical prospect faced by the social planner. They differ in the mode of aggregation they employ: LELO aggregates via a concatenation, HL via a lottery, and ROI via a disjunction. This common aggregative structure appears to be underexplored in the existing literature.
I will call principles of this general form — defining social justice in terms of an aggregation of individual prospects — aggregative principles of social justice. LELO, HL, and ROI are three examples, but they do not exhaust the space of aggregative principles. In fact, for any well-defined mode of aggregation, we can generate a corresponding aggregative principle. The space of aggregative principles is large and underexplored.
The rest of the article is organized as follows. Section 2 formalises LELO, HL, and ROI in parallel, highlighting the structural similarity. Section 3 formalises the informal notion of a "mode of aggregation" with the mathematical concept of monads, and presents a full characterization of the space of aggregative principles. This is the key contribution of the article. Section 4 explores examples of the algebraic structures on personal outcomes that are necessary for the aggregative principles to be well-defined.
2. Formalising LELO, HL, and ROI
2.1. Personal and social outcomes
Each of LELO, HL, and ROI attempts to extend the planner's self-interested attitudes towards personal outcomes to moral attitudes towards social outcomes. They achieve this by assigning to each social outcome s a hypothetical personal outcome p, and then stating that the social planner should treat s as they would treat p. In other words, a social outcome s is deemed socially desirable if the corresponding personal outcome p is personally desirable.
Let P be the space of personal outcomes and S be the space of social outcomes. By "personal outcome", I mean a full description of the state-of-affairs for a single individual, and by "social outcome", I mean a full description of the state-of-affairs for society as a whole. Each aforementioned principle of social justice proposes a function ζ:S→P assigning to each social outcome s∈S a hypothetical personal outcome ζ(s)∈P. However, they differ on the function they propose:
LELO uses the function ζLELO:S→P where ζLELO(s)∈P is the personal outcome of facing a concatenation of the lives of the individuals facing social outcome s∈S. For instance, if s consists of three individuals facing personal outcomes p1, p2, and p3 then ζLELO(s) is the personal outcome of first facing p1, then p2, then p3 in sequence. We'll denote this outcome by ζLELO(s)=p1▹p2▹p3.
HL uses the function ζHL:S→P where ζHL(s)∈P is the personal outcome of facing a lottery among the personal outcomes of the individuals in social outcome s∈S. For instance, if s consists of three individuals facing personal outcomes p1, p2, and p3 then ζHL(s) is the personal outcome of facing each outcome pi with equal likelihood 1/3. We'll denote this outcome by ζHL(s)=⟨p1:1/3∣p2:1/3∣p3:1/3⟩.
ROI uses the function ζROI:S→P where ζROI(s)∈P is the personal outcome of possibly facing the personal outcome of any individual in social outcome s. For instance, if s consists of three individuals facing personal outcomes p1, p2, and p3 respectively, then ζROI(s) is the personal outcome of facing either p1, p2, or p3, but without any probabilities attached to these possibilities. We'll denote this outcome by ζROI(s)=p1⊕p2⊕p3.
It remains to define these three functions, ζLELO, ζHL, and ζROI. For simplicity, let's assume that all social outcomes share a fixed, finite population of individuals, represented by the set I={i1,…,in}. This assumption could be relaxed in future work, to handle populations that vary across social outcomes. Moreover, let's assume that the personal outcome for each individual i∈I is fully determined by the social outcome. Formally, there exists a function γ:I×S→P such that, if the social outcome s∈S obtains, then each individual i∈I faces the personal outcome γ(i,s)∈P. We will treat this as a global assumption, not localised to any particular principle of social justice.
For example, S might be the set of all possible physical configurations of the universe across time, while P might be the set of an individual's possible health and economic outcomes. However, the precise definitions of S and P are not crucial for the present analysis. Conceptually, P represents the domain of personal, self-interested preferences, while S represents the domain over which we seek to define social or ethical preferences. In Section 4, we will give concrete examples of these two spaces.
If P is already well-understood, there is a simple way to define S and γ. We could define S=P^I to be the space of functions from individuals to personal outcomes, and let γ:I×P^I→P be the standard evaluation function, mapping the pair (i,f) to f(i). Intuitively, a social outcome is just a vector specifying each individual's personal outcome, and γ simply looks up individual i's outcome in this vector. However, I have opted for a more general presentation in which social outcomes are not entirely characterized by the personal outcomes of individuals. This allows for the possibility that S contains information beyond just the vector of personal outcomes. Consequently, there may exist distinct social outcomes s,s′∈S such that γ(i,s)=γ(i,s′) for all i∈I.
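The following minimal Python sketch makes the two presentations concrete. It assumes, purely for illustration, that individuals and personal outcomes are strings. `gamma_simple`, `SocialOutcome`, and its `extra` field are hypothetical names and are not part of the article's formalism. The second half shows how a richer S can contain distinct social outcomes that γ cannot tell apart.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Simple case: S = P^I, so a social outcome is a function (here a dict)
# from individuals to personal outcomes, and gamma is evaluation.
def gamma_simple(i: str, s: Dict[str, str]) -> str:
    """Evaluation map I x P^I -> P: look up individual i's personal outcome."""
    return s[i]

# General case (hypothetical encoding): a social outcome carries the vector of
# personal outcomes plus extra information that no individual experiences.
@dataclass(frozen=True)
class SocialOutcome:
    personal: Tuple[Tuple[str, str], ...]  # (individual, personal outcome) pairs
    extra: str                             # e.g. a description of the wider world

def gamma(i: str, s: SocialOutcome) -> str:
    return dict(s.personal)[i]

s1 = SocialOutcome(personal=(("alice", "healthy"), ("bob", "rich")), extra="world A")
s2 = SocialOutcome(personal=(("alice", "healthy"), ("bob", "rich")), extra="world B")

# Two distinct social outcomes that agree on every individual's personal outcome.
assert s1 != s2
assert all(gamma(i, s1) == gamma(i, s2) for i in ("alice", "bob"))
print(gamma_simple("bob", {"alice": "healthy", "bob": "rich"}))  # rich
```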
If we are provided with the function γ:I×S→P, how might we construct the target function ζ:S→P? As we will see, the key to doing this is to assume some additional algebraic structure on the space P, beyond it just being an abstract set. I will explain how this construction occurs in each aggregative principle, starting with Live Every Life Once.
2.1. Formalising LELO
Informally, this whole procedure can be summarized as follows: (1) The population is represented by a list of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from lists of individuals to the corresponding lists of personal outcomes, and then applied to the list representing the population. Hence each social outcome provides a list of personal outcomes. (3) Any list of personal outcomes can be concatenated into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
I'll now spell out the details.
(1) For any set X, a list over X is a finite sequence [x1,…,xn] whose entries x1,…,xn are elements of X. The set of all lists over X is denoted by List(X). This includes the empty list, denoted by [], and lists with repeated entries. Note that a list is more than just a set: it also imposes an ordering on its entries.[7]
LELO must assume that the population is represented by a distinguished list of individuals l∈List(I). Typically, this list consists of all humans ordered by their birth, although alternative orderings could be considered.
(2) As discussed before, there exists a function γ:I×S→P such that, if the social outcome s∈S obtains, then each individual i∈I faces the personal outcome γ(i,s)∈P. It follows that each social outcome s∈S provides a function γ(−,s):I→P from individuals to their personal outcomes, where γ(−,s) denotes the function i↦γ(i,s).
Now, any function f:I→P from individuals to their personal outcomes can be lifted to a function fList:List(I)→List(P) from lists of individuals to the corresponding lists of personal outcomes. Concretely, fList sends a list [i1,…,in] to the list [f(i1),…,f(in)], by applying f componentwise. This lifting operation is a general feature of lists.
Hence, each social outcome s provides a list of personal outcomes γ(−,s)List(l), obtained by lifting γ(−,s):I→P to a function γ(−,s)List:List(I)→List(P) and then applying it to the distinguished list of individuals l∈List(I).
(3) LELO assumes that any list of personal outcomes can be concatenated into a single personal outcome. Formally, there exists a function conc:List(P)→P which reduces any list of personal outcomes [p1,…,pn]∈List(P) into a single personal outcome conc([p1,…,pn])∈P.
To align with MacAskill's intended interpretation, we should view conc([p1,…,pn]) as the personal outcome of facing each pi in order, starting with p1 and ending with pn. Perhaps after each life pi ends, one is instantaneously transported to the beginning of the life pi+1 with one's memories of the preceding life wiped. The process of living a life, dying, having one's memory wiped, and moving to the next life is repeated until the full list of outcomes is exhausted.
It is worth noting that the concatenation operator conc:List(P)→P can equivalently be presented by a binary operator ▹:P×P→P and a constant element ϵ∈P, provided ▹ and ϵ satisfy the monoid axioms of associativity and identity.[8] Specifically, given conc, define ▹ as p▹p′:=conc([p,p′]) and define ϵ as ϵ:=conc([]). Conversely, given ▹ and ϵ, define conc as conc([p1,…,pn]):=ϵ▹p1▹p2▹⋯▹pn, evaluating the products left-to-right.
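The equivalence between conc and the pair (▹, ϵ) can be sketched in Python. The sketch assumes a hypothetical model in which a personal outcome is a tuple of life-episodes and ▹ is tuple concatenation, which is the free monoid on episodes. The names `then`, `EPSILON`, and `conc_from_monoid` are illustrative only.

```python
from functools import reduce
from typing import Callable, List, Tuple

# Hypothetical model: a personal outcome is a tuple of life-episodes, so that
# "first p, then p'" is tuple concatenation (the free monoid on episodes).
Outcome = Tuple[str, ...]

def conc(outcomes: List[Outcome]) -> Outcome:
    """conc: List(P) -> P, living each outcome in order."""
    return tuple(episode for p in outcomes for episode in p)

# From conc, recover the binary operator ▹ and the unit ϵ.
def then(p: Outcome, q: Outcome) -> Outcome:
    return conc([p, q])

EPSILON: Outcome = conc([])

# From (▹, ϵ), recover conc by folding left-to-right starting at ϵ.
def conc_from_monoid(op: Callable[[Outcome, Outcome], Outcome],
                     unit: Outcome, outcomes: List[Outcome]) -> Outcome:
    return reduce(op, outcomes, unit)

a, b, c = ("life 1",), ("life 2",), ("life 3",)
assert then(then(a, b), c) == then(a, then(b, c))   # associativity
assert then(EPSILON, a) == a == then(a, EPSILON)    # identity
assert then(a, b) != then(b, a)                     # ▹ need not be commutative
print(conc_from_monoid(then, EPSILON, [a, b, c]))   # ('life 1', 'life 2', 'life 3')
```

Folding with `functools.reduce` from the unit mirrors the left-to-right evaluation described above. Associativity is what makes the choice of bracketing irrelevant.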
The monoid axioms are:
Associativity: (p▹p′)▹p″=p▹(p′▹p″) for all p,p′,p″∈P.
Identity: ϵ▹p=p=p▹ϵ for all p∈P.
(4) Putting it all together, each social outcome s provides a single personal outcome, namely ζLELO(s):=conc(γ(−,s)List(l))∈P, obtained by applying the concatenation operator conc:List(P)→P to the list of personal outcomes γ(−,s)List(l)∈List(P) generated by s. This defines the LELO aggregation function ζLELO:S→P, which assigns to each social outcome the concatenated personal outcome.
To illustrate, suppose the population I={i1,…,in} is represented by the list l=[i1,…,in], and suppose the social outcome s assigns personal outcome pk to individual ik for each k, i.e. γ(ik,s)=pk. Then the concatenated personal outcome is ζLELO(s)=p1▹⋯▹pn. As a sanity check, consider the trivial case where I={i} consists of a single individual. Here the population list is just [i]∈List(I), and the concatenated outcome is simply ζLELO(s)=γ(i,s), i.e. the personal outcome assigned to the sole individual i.
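Steps (1) to (4) can be strung together in a short Python sketch. It keeps a hypothetical tuple-of-episodes model of personal outcomes and takes S=P^I, with γ implemented as dictionary lookup. `lift_list` and `zeta_lelo` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

Outcome = Tuple[str, ...]  # hypothetical: a personal outcome is a tuple of life-episodes

def lift_list(f: Callable, xs: List) -> List:
    """f^List: apply f componentwise, preserving order."""
    return [f(x) for x in xs]

def conc(outcomes: List[Outcome]) -> Outcome:
    """Concatenation List(P) -> P for the tuple model of outcomes."""
    return tuple(ep for p in outcomes for ep in p)

def zeta_lelo(population: List[str], gamma: Callable, s: Dict[str, Outcome]) -> Outcome:
    """zeta_LELO(s) = conc(gamma(-, s)^List(l))."""
    return conc(lift_list(lambda i: gamma(i, s), population))

def gamma(i: str, s: Dict[str, Outcome]) -> Outcome:
    return s[i]  # S = P^I with the evaluation map

l = ["i1", "i2", "i3"]  # distinguished list, e.g. ordered by birth
s = {"i1": ("p1",), "i2": ("p2",), "i3": ("p3",)}

print(zeta_lelo(l, gamma, s))                          # ('p1', 'p2', 'p3')
assert zeta_lelo(["i1"], gamma, s) == gamma("i1", s)   # single-individual sanity check
```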
The ordering of the distinguished list affects the structure of the concatenated outcome, because the binary concatenation operator ▹ is not commutative: in general, p▹p′ and p′▹p yield different personal outcomes. The choice of ordering has substantive implications for the resulting principle of social justice. For example, suppose the social planner has a positive rate of time preference, i.e. they discount the value of future experiences. This is a realistic assumption about human preferences. A LELO principle using a chronological ordering of individuals (from earliest-born to latest-born) will then prioritize the interests of earlier generations relative to a principle using the reverse-chronological ordering, all else being equal. More formally, suppose the social planner has a utility function u:P→R over personal outcomes and a discount factor β∈(0,1). Then the utility of a concatenated outcome p1▹p2 is given by u(p1▹p2)=u(p1)+β^{duration(p1)}⋅u(p2), where duration:P→R≥0 maps each personal outcome to its duration. This discounting formula places more weight on the first outcome p1 than on the second outcome p2, and the discount applied to p2 compounds exponentially with the duration of p1. Thus, the social planner's time preferences, combined with the ordering of the list, can lead to a "tyranny of the earlier" in the resulting principle of social justice.
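The effect of ordering under discounting can be checked numerically. This sketch makes three hypothetical assumptions: each life is summarised by a (duration, utility) pair, durations add under ▹, and β=0.99 per year. The two policies and their utilities are made-up numbers chosen only to show the reversal.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each life is summarised by (duration in years, utility u(p)),
# and a concatenated outcome is the list of lives in the order they are lived.
Life = Tuple[float, float]

def discounted_utility(lives: List[Life], beta: float) -> float:
    """Apply u(p1 ▹ p2) = u(p1) + beta**duration(p1) * u(p2) repeatedly."""
    total, elapsed = 0.0, 0.0
    for duration, utility in lives:
        total += beta ** elapsed * utility
        elapsed += duration
    return total

beta = 0.99  # assumed per-year discount factor
# Policy A favours the earlier generation, policy B the later one (made-up numbers).
policy_a: Dict[str, Life] = {"early": (80.0, 10.0), "late": (80.0, 5.0)}
policy_b: Dict[str, Life] = {"early": (80.0, 5.0), "late": (80.0, 10.0)}

def value(policy: Dict[str, Life], order: List[str]) -> float:
    return discounted_utility([policy[g] for g in order], beta)

chronological = ["early", "late"]
reverse = ["late", "early"]
print(value(policy_a, chronological) > value(policy_b, chronological))  # True
print(value(policy_a, reverse) > value(policy_b, reverse))              # False
```

Applying the two-outcome formula repeatedly gives u(p1)+β^{d1}u(p2)+β^{d1+d2}u(p3)+⋯, which is what the loop accumulates. The result therefore does not depend on how the concatenation is bracketed.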
Next, I will turn to Harsanyi's Lottery, the earliest of the three aggregative principles of social justice.
2.2. Formalising HL
The procedure is similar to LELO: (1) The population is represented by a distribution of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from distributions of individuals to the corresponding distributions of personal outcomes, and then applied to the distribution representing the population. Hence each social outcome provides a distribution of personal outcomes. (3) Any distribution of personal outcomes can be interpolated into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
I'll now spell out the details.
(1) For any set X, a distribution over X is a function π:X→[0,1] such that the support set supp(π):={x∈X:π(x)>0} is finite and ∑x∈Xπ(x)=1. We will sometimes use the notation ⟨x1:λ1∣⋯∣xn:λn⟩ to denote a distribution π:X→[0,1] satisfying π(x)=∑k:xk=xλk for each x∈X. For example, ⟨x:0.1∣x:0.3∣y:0.6⟩ and ⟨x:0.4∣y:0.6⟩ denote the same distribution π:X→[0,1] satisfying π(x)=0.4 and π(y)=0.6.
The set of all distributions over X is denoted by Δ(X). This includes the point-mass distributions, denoted by ⟨x:1⟩ for each x∈X, and uniform distributions ⟨x1:1/n∣…∣xn:1/n⟩. Note that a distribution is more than just a set: it also imposes a weighting on its elements.
HL must assume that the population is represented by a distinguished distribution over individuals π∈Δ(I). Typically, the distinguished distribution is taken to be uniform over the entire population. That is, if there are n individuals in total, then π=⟨i1:1/n∣⋯∣in:1/n⟩. However, alternative weightings could be considered.
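A distribution with finite support can be represented as a dictionary from elements to weights. In this minimal sketch, the helper `dist` is a hypothetical name. It merges repeated entries in the ⟨x1:λ1∣⋯∣xn:λn⟩ notation by summing their weights, and it uses Python's `fractions.Fraction` so that checks like 0.1+0.3=0.4 are exact.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable, Tuple

def dist(*pairs: Tuple[Hashable, Fraction]) -> Dict[Hashable, Fraction]:
    """Build the distribution written <x1:λ1 | ... | xn:λn>.

    Repeated entries are merged, so π(x) is the sum of all weights attached to x.
    Exact fractions avoid floating-point drift when checking that weights sum to 1.
    """
    pi: Dict[Hashable, Fraction] = defaultdict(Fraction)
    for x, weight in pairs:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        pi[x] += weight
    if sum(pi.values()) != 1:
        raise ValueError("weights must sum to 1")
    return {x: w for x, w in pi.items() if w > 0}  # keep only the finite support

F = Fraction
# <x:0.1 | x:0.3 | y:0.6> and <x:0.4 | y:0.6> denote the same distribution.
print(dist(("x", F(1, 10)), ("x", F(3, 10)), ("y", F(6, 10))))  # {'x': Fraction(2, 5), 'y': Fraction(3, 5)}
point_mass = dist(("x", F(1)))                                   # <x:1>
uniform = dist(*[(f"i{k}", F(1, 4)) for k in range(1, 5)])       # <i1:1/4 | ... | i4:1/4>
```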
(2) As with LELO, there exists a function γ:I×S→P such that, if the social outcome s∈S obtains, then each individual i∈I faces the personal outcome γ(i,s)∈P. It follows that each social outcome s∈S provides a function γ(−,s):I→P from individuals to their personal outcomes, where γ(−,s) denotes the function i↦γ(i,s).
Now, any function f:I→P from individuals to their personal outcomes can be lifted to a function fΔ:Δ(I)→Δ(P) from distributions of individuals to the corresponding distributions of personal outcomes. Concretely, fΔ sends a distribution π=⟨i1:λ1∣⋯∣in:λn⟩ to the distribution ρ=⟨f(i1):λ1∣⋯∣f(in):λn⟩. This lifting operation is a general feature of distributions.
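The lifting f^Δ is the pushforward of a distribution along f. This minimal sketch uses dictionaries with `Fraction` weights, and the population and outcome labels are hypothetical. When f sends several individuals to the same personal outcome, their weights are added, which is exactly what the ⟨f(i1):λ1∣⋯∣f(in):λn⟩ notation means.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, Hashable

def lift_dist(f: Callable[[Hashable], Hashable],
              pi: Dict[Hashable, Fraction]) -> Dict[Hashable, Fraction]:
    """f^Δ: push a distribution forward along f, adding the weights of
    individuals that f sends to the same personal outcome."""
    rho: Dict[Hashable, Fraction] = defaultdict(Fraction)
    for i, weight in pi.items():
        rho[f(i)] += weight
    return dict(rho)

F = Fraction
pi = {"i1": F(1, 3), "i2": F(1, 3), "i3": F(1, 3)}    # uniform population
outcome = {"i1": "rich", "i2": "rich", "i3": "poor"}.get
print(lift_dist(outcome, pi))  # {'rich': Fraction(2, 3), 'poor': Fraction(1, 3)}
```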
Hence, each social outcome s provides a distribution of personal outcomes γ(−,s)Δ(π), obtained by lifting γ(−,s):I→P to a function γ(−,s)Δ:Δ(I)→Δ(P) and then applying it to the distinguished distribution of individuals π∈Δ(I).
(3) HL assumes that any distribution of personal outcomes can be interpolated into a single personal outcome. Formally, there exists a function E:Δ(P)→P which reduces any distribution of personal outcomes ρ=⟨p1:λ1∣⋯∣pn:λn⟩∈Δ(P) into a single personal outcome E[ρ]∈P.
To align with Harsanyi's intended interpretation, we should view E[ρ] as the personal outcome of facing each pi∈P with probability λi. Perhaps a random outcome pi is sampled according to the distribution ρ and the individual then faces that outcome. In contrast with LELO, the individual ultimately faces only a single human lifetime.
It is worth noting that the interpolation operator E:Δ(P)→P can equivalently be presented by a family of binary operators +λ:P×P→P, one for each λ∈(0,1), provided +λ satisfies the convex space axioms of idempotence, skew-commutativity, and skew-associativity.[9] Specifically, given E, define +λ as p+λp′:=E[⟨p:λ∣p′:1−λ⟩]. Conversely, given the family of operators {+λ}λ∈(0,1), we can rather clumsily define E by induction on n:
E[⟨p1:λ1∣⋯∣pn:λn⟩]:=E[⟨p1:λ11−λn∣⋯∣pn−1:λn−11−λn⟩]+1−λnpn
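The convex space axioms are, for all p,p′,p″∈P and λ,μ∈(0,1):
Idempotence: p +_λ p = p.
Skew-commutativity: p +_λ p′ = p′ +_{1−λ} p.
Skew-associativity: (p +_μ p′) +_λ p″ = p +_{λμ} (p′ +_ν p″), where ν=λ(1−μ)/(1−λμ).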
(4) Putting it all together, each social outcome s provides a single personal outcome, namely ζHL(s):=E(γ(−,s)Δ(π))∈P, obtained by applying the interpolation operator E:Δ(P)→P to the distribution of personal outcomes γ(−,s)Δ(π)∈Δ(P) generated by s. This defines the HL aggregation function ζHL:S→P, which assigns to each social outcome the interpolated personal outcome.
To illustrate, suppose the population I={i1,…,in} is represented by the uniform distribution π=⟨i1:1/n∣⋯∣in:1/n⟩. Suppose further that the social outcome s assigns personal outcome pk to individual ik for each k, i.e. γ(ik,s)=pk. Then the interpolated personal outcome is ζHL(s)=E[⟨p1:1/n∣⋯∣pn:1/n⟩]. As a sanity check, consider the trivial case where I={i} consists of a single individual. Here the initial distribution is the point mass ⟨i:1⟩∈Δ(I) and the interpolated outcome is simply ζHL(s)=γ(i,s), i.e. the personal outcome assigned to the sole individual i.
The weighting of the distinguished distribution affects the structure of the interpolated outcome, because p+λp′ and p+λ′p′ typically yield different personal outcomes when λ≠λ′. The choice of weighting has substantive implications for the resulting principle of social justice. If the HL principle uses a non-uniform distribution, then the social planner will prioritize the individuals who are assigned a greater weight, and this favoritism towards the higher-weighted group grows as the weighting becomes more uneven.
For example, suppose there is a fixed amount of resources R>0 to be distributed among the population. Furthermore, suppose the resource yields diminishing marginal returns, i.e. the social planner's utility function over resources u:R≥0→R is strictly concave. This is a realistic assumption about human preferences. Following HL, the social planner will allocate resources to maximise the expected value of the corresponding lottery. Formally, the social planner chooses (r1,…,rn) to maximize λ1⋅u(r1)+⋯+λn⋅u(rn) subject to the constraint r1+⋯+rn=R. At an interior optimum (r∗1,…,r∗n), the first-order conditions require λi⋅u′(r∗i) to take the same value for every i, so the marginal utility of resources is inversely proportional to an individual's weight: u′(r∗i)∝1/λi. Therefore individuals with a larger weight will receive more resources: if λi<λj then 1/λi>1/λj, so the optimality condition implies u′(r∗i)>u′(r∗j), and strict concavity (u′ is strictly decreasing) implies r∗i<r∗j. In the special case where u is logarithmic, u′(r)=1/r, so the resources allocated to an individual are directly proportional to their weight. Thus, the social planner's preferences, combined with the weights of the distribution, can lead to a "tyranny of the majority" in the resulting principle of social justice.
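Putting the HL pieces together, here is a sketch that uses a hypothetical modelling choice: each personal outcome is itself a lottery over basic lives, stored as a frozenset of (life, probability) pairs, and E flattens a lottery over lotteries into a single lottery. This keeps E[ρ] faithful to the reading "face each pi with probability λi" rather than collapsing it to a number. `sure`, `lift_dist`, and `zeta_hl` are illustrative names.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

# Hypothetical model: a personal outcome is itself a lottery over "basic lives",
# stored as a frozenset of (basic life, probability) pairs so it is hashable.
Lottery = FrozenSet[Tuple[str, Fraction]]

def sure(life: str) -> Lottery:
    """The degenerate lottery <life:1>."""
    return frozenset({(life, Fraction(1))})

def lift_dist(f: Callable, pi: Dict[Hashable, Fraction]) -> Dict[Hashable, Fraction]:
    """f^Δ: push a distribution forward along f, merging equal images."""
    rho: Dict[Hashable, Fraction] = defaultdict(Fraction)
    for i, w in pi.items():
        rho[f(i)] += w
    return dict(rho)

def E(rho: Dict[Lottery, Fraction]) -> Lottery:
    """E: Δ(P) -> P, flattening a lottery over lotteries into a single lottery."""
    out: Dict[str, Fraction] = defaultdict(Fraction)
    for lottery, w in rho.items():
        for life, q in lottery:
            out[life] += w * q
    return frozenset(out.items())

def zeta_hl(pi: Dict[str, Fraction], gamma: Callable, s: Dict[str, Lottery]) -> Lottery:
    """zeta_HL(s) = E[gamma(-, s)^Δ(pi)]."""
    return E(lift_dist(lambda i: gamma(i, s), pi))

def gamma(i: str, s: Dict[str, Lottery]) -> Lottery:
    return s[i]

third = Fraction(1, 3)
pi = {"i1": third, "i2": third, "i3": third}   # uniform distribution
s = {"i1": sure("rich"), "i2": sure("rich"), "i3": sure("poor")}
print(sorted(zeta_hl(pi, gamma, s)))   # [('poor', Fraction(1, 3)), ('rich', Fraction(2, 3))]
assert zeta_hl({"i3": Fraction(1)}, gamma, s) == gamma("i3", s)   # single-individual check
```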
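The logarithmic case can be checked directly. This sketch assumes u=log and a hypothetical three-person weighting. `log_allocation` returns the closed-form optimum r∗i=λi⋅R, and the asserts confirm the first-order condition and that shifting resources between any two individuals lowers the objective.

```python
import math
from typing import List

def log_allocation(weights: List[float], R: float) -> List[float]:
    """Maximiser of sum_i λ_i * log(r_i) subject to sum_i r_i = R, namely r_i = λ_i * R
    (assumes the weights are positive and sum to 1)."""
    return [lam * R for lam in weights]

def objective(weights: List[float], alloc: List[float]) -> float:
    return sum(lam * math.log(r) for lam, r in zip(weights, alloc))

weights = [0.5, 0.375, 0.125]    # hypothetical non-uniform HL weighting
R = 100.0
alloc = log_allocation(weights, R)
print(alloc)   # [50.0, 37.5, 12.5]

# First-order condition: λ_i * u'(r_i) = λ_i / r_i is the same for every i,
# i.e. u'(r_i) is proportional to 1/λ_i.
assert all(math.isclose(lam / r, 1 / R) for lam, r in zip(weights, alloc))

# Moving one unit of resource between any two individuals lowers the objective.
for a in range(len(alloc)):
    for b in range(len(alloc)):
        if a != b:
            shifted = alloc.copy()
            shifted[a] += 1.0
            shifted[b] -= 1.0
            assert objective(weights, shifted) < objective(weights, alloc)
```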
Finally, I will turn to Rawls' Original Position, the most famous aggregative principle of social justice.
2.3. Formalising ROI
The procedure is similar to LELO and HL: (1) The population is represented by a nonempty finite subset of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from nonempty finite subsets of individuals to the corresponding nonempty finite subsets of personal outcomes, and then applied to the subset representing the population. Hence each social outcome provides a nonempty finite subset of personal outcomes. (3) Any nonempty finite subset of personal outcomes can be fused into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
I'll now spell out the details.
(1) ROI must assume that the population is represented by a nonempty finite subset of individuals A∈P+f(I). For any set X, let P+f(X) denote the nonempty finite subsets of X. This is a standard notation, where P stands for powerset, the superscript + stands for nonempty and the subscript f stands for finite.
Note that A carries no additional structure beyond being a set — unlike the list l∈List(I) used in LELO, it carries no ordering, and unlike the distribution π∈Δ(I) used in HL, it carries no weightings. Typically, A is assumed to be the universal set I itself, representing all individuals. However, Rawls suggests that alternative subsets could be considered, such as the set of "Heads of Families" or "presently existing people".
(2) As with LELO and HL, there exists a function γ:I×S→P such that, if the social outcome s∈S obtains, then each individual i∈I faces the personal outcome γ(i,s)∈P. It follows that each social outcome s∈S provides a function γ(−,s):I→P from individuals to their personal outcomes, where γ(−,s) denotes the function i↦γ(i,s).
Now, any function f:I→P from individuals to their personal outcomes can be lifted to a function fP+f:P+f(I)→P+f(P) from nonempty finite subsets of individuals to the corresponding nonempty finite subsets of personal outcomes. Concretely, fP+f sends a subset {i1,…,in} to the subset {f(i1),…,f(in)}, by applying f elementwise; if several individuals share the same personal outcome, that outcome appears only once in the image. This lifting operation is a general feature of nonempty finite subsets.
Hence, each social outcome s provides a nonempty finite subset of personal outcomes γ(−,s)P+f(A), obtained by lifting γ(−,s):I→P to a function γ(−,s)P+f:P+f(I)→P+f(P) and then applying it to the distinguished subset of individuals A∈P+f(I).
(3) ROI assumes that any nonempty finite subset of personal outcomes can be fused into a single personal outcome. Formally, there exists a function ⨁:P+f(P)→P which reduces any nonempty finite subset of personal outcomes {p1,…,pn}∈P+f(P) into a single personal outcome ⨁({p1,…,pn})∈P.
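The lifting for nonempty finite subsets is just the image of a set under f. This minimal Python sketch uses hypothetical labels and a `frozenset`, so the result is itself hashable. Note how two individuals with the same outcome collapse into a single element.

```python
from typing import Callable, FrozenSet, Hashable

def lift_set(f: Callable[[Hashable], Hashable], A: FrozenSet) -> FrozenSet:
    """f^{P+f}: the image {f(x) : x in A}; nonempty input gives nonempty output."""
    if not A:
        raise ValueError("expected a nonempty finite set")
    return frozenset(f(x) for x in A)

# Hypothetical population and outcomes; i1 and i2 share an outcome.
outcome = {"i1": "rich", "i2": "rich", "i3": "poor"}.get
print(sorted(lift_set(outcome, frozenset({"i1", "i2", "i3"}))))   # ['poor', 'rich']
```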
To obtain Rawls' principle of social justice, we should interpret ⨁({p1,…,pn}) as the personal outcome where one might face any of the outcomes p1,…,pn, but without any information about which outcome is more likely. That is, the fusion operator acts like a disjunction between the personal outcomes. For example, if p1 is the outcome of eating vanilla ice cream and p2 is the outcome of eating chocolate ice cream, then p1⊕p2 is the outcome of eating either vanilla or chocolate ice cream, with no probabilities attached. One could imagine that the exact outcome is selected by a third party, perhaps an adversary who selects the worst option or a benefactor who selects the best option.
It is worth noting that the fusion operator ⨁:P+f(P)→P can equivalently be presented by a binary operator ⊕:P×P→P, provided ⊕ satisfies the axioms of a semilattice.[10] Specifically, given ⨁, define ⊕ as p⊕p′:=⨁({p,p′}). Conversely, given ⊕, define ⨁ as ⨁({p1,…,pn})=p1⊕⋯⊕pn.
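The equivalence between ⨁ and ⊕ can be sketched under a hypothetical model in which a personal outcome is the nonempty set of basic outcomes one might face, with ⊕ as set union. A `frozenset` has no fixed iteration order, so folding ⊕ over it is well-defined only because ⊕ is associative and commutative. Idempotence makes repeated entries harmless. The ice-cream labels follow the example above, and `oplus` and `fuse` are illustrative names.

```python
from functools import reduce
from typing import FrozenSet

# Hypothetical model: a personal outcome is the nonempty set of basic outcomes one
# might end up facing, and p ⊕ p' is their union ("p or p'", no probabilities).
Outcome = FrozenSet[str]

def oplus(p: Outcome, q: Outcome) -> Outcome:
    return p | q

def fuse(outcomes: FrozenSet[Outcome]) -> Outcome:
    """⨁: fold ⊕ over a nonempty finite set of personal outcomes."""
    if not outcomes:
        raise ValueError("fusion is only defined on nonempty sets")
    return reduce(oplus, outcomes)

vanilla, chocolate, mint = frozenset({"vanilla"}), frozenset({"chocolate"}), frozenset({"mint"})
assert oplus(oplus(vanilla, chocolate), mint) == oplus(vanilla, oplus(chocolate, mint))  # associativity
assert oplus(vanilla, chocolate) == oplus(chocolate, vanilla)                             # commutativity
assert oplus(vanilla, vanilla) == vanilla                                                 # idempotence
print(sorted(fuse(frozenset({vanilla, chocolate}))))   # ['chocolate', 'vanilla']
```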
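The semilattice axioms are, for all p,p′,p″∈P:
Associativity: (p⊕p′)⊕p″=p⊕(p′⊕p″).
Commutativity: p⊕p′=p′⊕p.
Idempotence: p⊕p=p.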
(4) Putting it all together, each social outcome s provides a single fused personal outcome, namely ζROI(s):=⨁(γ(−,s)P+f(A))∈P, obtained by applying the fusion operator ⨁:P+f(P)→P to the nonempty finite subset of personal outcomes γ(−,s)P+f(A)∈P+f(P) generated by s. This defines the ROI aggregation function ζROI:S→P, which assigns to each social outcome the fused personal outcome.
To illustrate, suppose the population I={i1,…,in} is represented by the universal subset A={i1,…,in}. Suppose further that the social outcome s assigns personal outcome pk to individual ik for each k, i.e. γ(ik,s)=pk. Then the fused personal outcome is ζROI(s)=p1⊕⋯⊕pn. As a sanity check, consider the trivial case where I={i} consists of a single individual. Here the population subset is the singleton A={i}∈P+f(I), and the fused outcome is simply ζROI(s)=γ(i,s), i.e. the personal outcome assigned to the sole individual i.
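The ROI pipeline can be sketched in the same style. It again assumes, hypothetically, that personal outcomes are sets of possibilities fused by union. `lift_set`, `fuse`, and `zeta_roi` are illustrative names.

```python
from functools import reduce
from typing import Callable, Dict, FrozenSet

# Hypothetical model: a personal outcome is the nonempty set of basic outcomes one
# might face, with ⊕ given by union.
Outcome = FrozenSet[str]

def lift_set(f: Callable, A: FrozenSet[str]) -> FrozenSet[Outcome]:
    """f^{P+f}: the image of A under f."""
    return frozenset(f(i) for i in A)

def fuse(outcomes: FrozenSet[Outcome]) -> Outcome:
    """⨁: fold ⊕ (union) over a nonempty finite set of outcomes."""
    return reduce(lambda p, q: p | q, outcomes)

def zeta_roi(A: FrozenSet[str], gamma: Callable, s: Dict[str, Outcome]) -> Outcome:
    """zeta_ROI(s) = ⨁(gamma(-, s)^{P+f}(A))."""
    return fuse(lift_set(lambda i: gamma(i, s), A))

def gamma(i: str, s: Dict[str, Outcome]) -> Outcome:
    return s[i]

A = frozenset({"i1", "i2", "i3"})   # the universal subset
s = {"i1": frozenset({"rich"}), "i2": frozenset({"rich"}), "i3": frozenset({"poor"})}
print(sorted(zeta_roi(A, gamma, s)))   # ['poor', 'rich']
assert zeta_roi(frozenset({"i3"}), gamma, s) == gamma("i3", s)   # singleton sanity check
```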
The choice of the distinguished subset A affects the structure of the fused outcome. For example, suppose the social planner is pessimistic, evaluating the fused outcome p1⊕p2 as exactly as good as the worse of the individual outcomes p1 and p2. Formally, if the planner's preferences are represented by a utility function u:P→R, this means assuming u(p1⊕p2)=min{u(p1),u(p2)}. This is a common assumption when modelling decision-making under Knightian uncertainty, where the planner considers the worst-case scenario (the maximin rule).[11] Under this assumption, the resulting ROI principle is sensitive only to the worst-off individuals in the population subset A, leading to a 'tyranny of the unfortunate'.
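Here is a minimal sketch of the pessimistic evaluation u(p1⊕⋯⊕pn)=min_k u(pk), with made-up utilities for two hypothetical policies. It also shows how the verdict can flip when the distinguished subset A omits the worst-off individual.

```python
from typing import Dict, Iterable

def roi_value(A: Iterable[str], utility: Dict[str, float]) -> float:
    """Pessimistic evaluation of the fused outcome over the individuals in A:
    u(p1 ⊕ ... ⊕ pn) = min_k u(pk)."""
    return min(utility[i] for i in A)

# Hypothetical utilities of each individual's personal outcome under two policies.
policy_x = {"i1": 6.0, "i2": 6.0, "i3": 6.0}
policy_y = {"i1": 10.0, "i2": 10.0, "i3": 2.0}

everyone = {"i1", "i2", "i3"}
subset = {"i1", "i2"}   # an alternative distinguished subset that omits i3
print(roi_value(everyone, policy_x) > roi_value(everyone, policy_y))   # True
print(roi_value(subset, policy_x) > roi_value(subset, policy_y))       # False
```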
Unlike HL, ROI is scope-insensitive due to the idempotence of the fusion operation ⊕, meaning p⊕p=p. This implies that the fused outcome is insensitive to the number of individuals facing each personal outcome, and depends only on which personal outcomes are faced at all. For a stark illustration, suppose that a social outcome contains 100 individuals facing great wealth (p) and 1 facing abject poverty (p′). ROI yields the same fused outcome p⊕p′ as a social outcome with 1 individual facing wealth and 100 facing poverty. The drastically different proportions of individuals are irrelevant; only the presence or absence of each outcome matters.
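The ROI construction above, including the single-individual sanity check, the pessimistic planner and the scope-insensitivity example, can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the text: personal outcomes are frozensets of life histories (as in Example 8 below), the population subset A is the set of all individuals, and the utilities in `BASE_UTILITY` are made-up numbers; all names are hypothetical. Lifting γ(−,s) along the nonempty finite powerset monad just takes the image set, which is exactly where duplicate outcomes collapse.

```python
from functools import reduce

# Hypothetical utilities of two life histories (illustrative numbers only).
BASE_UTILITY = {"wealth": 10.0, "poverty": -5.0}


def gamma(i, s):
    """γ(i, s): the personal outcome of individual i, a singleton set of life histories."""
    return frozenset({s[i]})


def zeta_roi(s, A):
    """ζ_ROI(s) = ⨁(image of A under γ(−, s)), with ⊕ taken to be set union."""
    image = {gamma(i, s) for i in A}   # lifting along P+f: duplicates collapse here
    return reduce(lambda p, q: p | q, image)


def pessimistic_utility(p):
    """u(p) = worst utility among the life histories p leaves open,
    so that u(p ⊕ p') = min(u(p), u(p'))."""
    return min(BASE_UTILITY[h] for h in p)


# 100 wealthy individuals and 1 poor one, versus 1 wealthy individual and 100 poor ones.
society_1 = {**{f"w{k}": "wealth" for k in range(100)}, "p0": "poverty"}
society_2 = {"w0": "wealth", **{f"p{k}": "poverty" for k in range(100)}}

fused_1 = zeta_roi(society_1, society_1.keys())
fused_2 = zeta_roi(society_2, society_2.keys())
print(fused_1 == fused_2)             # True: only which outcomes occur matters
print(pessimistic_utility(fused_1))   # -5.0: the worst-off individual decides

# Sanity check: a single individual gets back their own outcome.
print(zeta_roi({"i": "wealth"}, {"i"}) == gamma("i", {"i": "wealth"}))   # True
```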
2.4. Analysis
LELO, HL, and ROI share a common structure, differing only in the specific mathematical objects used. They represent populations using some type of collection: lists for LELO, distributions for HL, and subsets for ROI. And they aggregate personal outcomes using some mode of aggregation: concatenation for LELO, interpolation for HL, and fusion for ROI. This suggests that LELO, HL, and ROI are instances of a general family of aggregative principles, obtained by varying the type of collection and the mode of aggregation. In the next section, I will show that this is true.
3. Monads and aggregative principles
The key difference between LELO, HL, and ROI lies in their mode of aggregation. In this section, we will formalise this informal notion of a "mode of aggregation", and thereby find the general family of aggregative principles.
3.1. Monads formalise collections
The concept of a monad originates in category theory, and has found extensive applications in functional programming languages like Haskell. While category theory lies beyond the scope of this article, monads can be understood concretely as formalising the notion of a "collection". The core idea is that monads allow operations on the elements to be lifted to operations on the collections, in a way that preserves certain intuitive properties.
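Formally, a monad M consists of four components: an assignment sending each set X to a set M(X) of collections over X; for each function f:X→Y, a lifted function fM:M(X)→M(Y); for each set X, a unit function ηX:X→M(X), which sends each element x to the singleton collection containing x; and for each set X, a multiplication function μX:M(M(X))→M(X), which flattens a collection of collections into a single collection.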
These components must satisfy certain coherence conditions, known as the monad laws:
For the full technical details, see Mac Lane (1971) "Categories for the Working Mathematician".
The three types of collections we've encountered so far — lists, distributions, and nonempty subsets — are formalised by monads.
For example, the list monad List has these four components:[12]
And the distribution monad Δ has these four components:[13]
And finally, the nonempty powerset monad P+f has these four components:
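To make these components concrete, here is a minimal Python sketch of the lifting, unit and multiplication for each of the three monads; it is an illustration, not a formal definition. Two representation choices are assumptions: Python lists and frozensets stand in for List(X) and P+f(X) (nonemptiness is not enforced), and finitely supported distributions are stored as frozensets of (outcome, probability) pairs with exact `Fraction` weights, so that a distribution can itself be an outcome of another distribution. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# --- List monad: Python lists stand in for List(X) ---
def list_map(f, xs):
    """f^List: apply f to every element, keeping order and repeats."""
    return [f(x) for x in xs]

def list_unit(x):
    """η_X(x) = [x]."""
    return [x]

def list_join(xss):
    """μ_X: concatenate a list of lists."""
    return [x for xs in xss for x in xs]

# --- Distribution monad Δ (finitely supported, exact probabilities) ---
def dist(weights):
    """Build a distribution from a dict of weights, dropping zero entries.
    Stored as a frozenset of (outcome, probability) pairs so it is hashable."""
    return frozenset((x, Fraction(p)) for x, p in weights.items() if p != 0)

def dist_map(f, d):
    """f^Δ: push-forward, (f^Δ d)(y) = sum of d(x) over all x with f(x) = y."""
    out = defaultdict(Fraction)
    for x, p in d:
        out[f(x)] += p
    return dist(out)

def dist_unit(x):
    """η_X(x): the point mass at x."""
    return dist({x: 1})

def dist_join(dd):
    """μ_X: average the inner distributions, weighted by the outer probabilities."""
    out = defaultdict(Fraction)
    for inner, q in dd:
        for x, p in inner:
            out[x] += q * p
    return dist(out)

# --- Nonempty finite powerset monad P+f: frozensets (nonemptiness not enforced) ---
def pset_map(f, a):
    """f^P: the image f[a]."""
    return frozenset(f(x) for x in a)

def pset_unit(x):
    """η_X(x) = {x}."""
    return frozenset({x})

def pset_join(aa):
    """μ_X: the union of a set of sets."""
    return frozenset(x for a in aa for x in a)

point_a = dist_unit("a")
fair_ab = dist({"a": Fraction(1, 2), "b": Fraction(1, 2)})
print(list_join([[1, 2], [], [3]]))                                                     # [1, 2, 3]
print(dict(dist_join(dist({point_a: Fraction(1, 2), fair_ab: Fraction(1, 2)})))["a"])   # 3/4
```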
Whenever you encounter an informal concept of a collection, it will typically be formalizable as a monad. Let's take the finite multiset: intuitively, a multiset is a collection that allows multiple instances of each element, but where the order doesn't matter. Formally, for any set X, a finite multiset on X is a function π:X→N, where π(x) represents the number of occurrences of element x. The multiset is finite if there are finitely many x∈X with π(x)>0. The set of all finite multisets on X is denoted N[X]. Now, elements of N[X] are intuitively collections over X. Sure enough, the assignment X↦N[X] is a monad, which we call the finite multiset monad N[−]. The definitions of fN[−], η, and μ are similar to those for Δ.[14]
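The finite multiset monad can be sketched in the same style. As an assumption of this illustration, a multiset π is stored as a frozenset of (element, multiplicity) pairs, so that multisets can themselves be elements of multisets; the helper names are hypothetical. Compared with lists, the only difference is that order is forgotten, and compared with Δ, multiplicities are natural numbers rather than probabilities.

```python
from collections import Counter


def ms(counts):
    """Build a finite multiset π: X -> N, stored as a hashable frozenset of (x, π(x))
    pairs with π(x) > 0. `counts` may be an iterable of elements or a mapping."""
    return frozenset((x, n) for x, n in Counter(counts).items() if n > 0)


def ms_map(f, m):
    """f^N: push multiplicities forward along f, adding those that land together."""
    out = Counter()
    for x, n in m:
        out[f(x)] += n
    return ms(out)


def ms_unit(x):
    """η(x): the multiset containing x exactly once."""
    return ms({x: 1})


def ms_join(mm):
    """μ: flatten a multiset of multisets, multiplying outer and inner multiplicities."""
    out = Counter()
    for inner, k in mm:
        for x, n in inner:
            out[x] += k * n
    return ms(out)


bag = ms(["milk", "milk", "eggs"])
print(bag == ms(["eggs", "milk", "milk"]))                           # True: order is forgotten
print(ms_join(ms({ms(["a", "a"]): 3, ms(["b"]): 1})) == ms({"a": 6, "b": 1}))   # True
```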
3.2. Algebras formalise aggregations
Algebraic structures are ubiquitous in mathematics: monoids, groups, rings, vector spaces, lattices, and so on. Informally, an algebraic structure is a set equipped with some operations (like addition, multiplication, etc.) that satisfy certain axioms (like associativity, commutativity, etc.). A core insight from category theory is that each type of algebraic structure corresponds to a monad.
As discussed earlier, each monad M captures a general notion of a "collection" of elements. An algebra of M is a way to aggregate any collection of those elements into a single element. Formally, given a monad (M,η,μ), an M-algebra is a set X equipped with a function α:M(X)→X satisfying two laws: the unit law α(ηX(x))=x for every x∈X, and the associativity law α(μX(m))=α(αM(m)) for every m∈M(M(X)).
Intuitively, the unit law says that aggregating a singleton collection ηX(x)∈M(X) should just return the element x∈X itself. The associativity law says that aggregating a collection of collections m∈M(M(X)) can be done in two equivalent ways: first flatten the nested collections using μX and then aggregate the resulting collection using α; or first aggregate each inner collection using α (this is what αM does), and then aggregate the resulting outer collection using α again.
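The two laws can be checked mechanically for the list monad. The sketch below (hypothetical helper names, Python lists standing in for List(X)) confirms on sample inputs that summation of integers satisfies both laws, as a List-algebra should, whereas the arithmetic mean satisfies the unit law but fails associativity: averaging a flattened list is not the same as averaging the averages of unequal-length sublists. This is one way to see why averaging pairs with distributions (HL) rather than lists (LELO).

```python
def list_unit(x):
    """η(x) = [x]."""
    return [x]


def list_join(xss):
    """μ: concatenate a list of lists."""
    return [x for xs in xss for x in xs]


def list_map(f, xs):
    """f^List: apply f elementwise."""
    return [f(x) for x in xs]


def unit_law_holds(alpha, x):
    """α(η(x)) == x."""
    return alpha(list_unit(x)) == x


def assoc_law_holds(alpha, xss):
    """α(μ(m)) == α(α^List(m)): flatten-then-aggregate equals aggregate-inner-then-outer."""
    return alpha(list_join(xss)) == alpha(list_map(alpha, xss))


def total(xs):
    """Summation: a List-algebra structure on the integers (the monoid (Z, +, 0))."""
    return sum(xs)


def mean(xs):
    """Arithmetic mean: satisfies the unit law but is not a List-algebra."""
    return sum(xs) / len(xs)


print(unit_law_holds(total, 5), assoc_law_holds(total, [[1, 2], [], [3, 4, 5]]))   # True True
print(unit_law_holds(mean, 5), assoc_law_holds(mean, [[1], [2, 3]]))               # True False
```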
Each algebraic structure corresponds to a monad. For example, consider one of the most important algebraic structures: the vector space. The relevant monad V assigns to each set X the set V(X) of functions v:X→R with v(x)≠0 for only finitely many x∈X. For example, if X is the set {milk,eggs,sugar} then a typical element of V(X) might look like 2⋅milk+1⋅eggs−3⋅sugar. An algebra for the monad V is precisely a real vector space: a set X equipped with a function α:V(X)→X satisfying the appropriate unit and associativity laws. This definition captures the essence of a vector space (the ability to evaluate arbitrary finite linear combinations) with a single operation α:V(X)→X.
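To make the vector-space example concrete, here is a small sketch in which an element of V(X) is a Python dict of nonzero coefficients. The function `v_map` plays the role of the lifting f^V and `alpha_r2` is the evaluation map that makes R² a V-algebra; the ingredient "feature vectors" are invented for illustration, and all names are hypothetical.

```python
from collections import defaultdict

# An element of V(X): a finitely supported function X -> R, stored as a dict of nonzero coefficients.
recipe = {"milk": 2.0, "eggs": 1.0, "sugar": -3.0}   # 2·milk + 1·eggs − 3·sugar


def v_map(f, v):
    """f^V: push coefficients forward along f, adding those that land on the same point."""
    out = defaultdict(float)
    for x, c in v.items():
        out[f(x)] += c
    return {y: c for y, c in out.items() if c != 0}


def alpha_r2(v):
    """V-algebra structure on R^2: evaluate a formal linear combination of points of R^2."""
    x = sum(c * p[0] for p, c in v.items())
    y = sum(c * p[1] for p, c in v.items())
    return (x, y)


# Hypothetical 2-dimensional feature vectors for each ingredient (illustrative numbers only).
features = {"milk": (3.0, 5.0), "eggs": (6.0, 1.0), "sugar": (0.0, 10.0)}

# Lift the labelling to V, then aggregate with α.
print(alpha_r2(v_map(features.get, recipe)))   # (12.0, -19.0)
```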
3.3. A general aggregative principle
As promised, we can now formulate a general family of aggregative principles using the language of monads and algebras. Each principle has the following form: a social planner should make decisions as if they will face the aggregate of the personal outcomes across all individuals in the population.
Informally, this whole procedure can be summarized as follows: (1) The population is represented by a distinguished collection of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from collections of individuals to the corresponding collections of personal outcomes, and then applied to the collection representing the population. Hence each social outcome provides a collection of personal outcomes. (3) Any collection of personal outcomes can be aggregated into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
(1) Let M be any monad, assigning to every set X another set M(X) of collections over X. We must assume that the population is represented by a distinguished collection i∈M(I). Typically, i is chosen to represent the entire population impartially, although non-impartial collections could also be considered.
(2) As discussed before, there exists a function γ:I×S→P such that, if the social outcome s∈S obtains, then each individual i∈I faces the personal outcome γ(i,s)∈P. It follows that each social outcome s∈S provides a function γ(−,s):I→P from individuals to their personal outcomes, where γ(−,s) denotes the function i↦γ(i,s).
Now, any function f:I→P from individuals to their personal outcomes can be lifted to a function fM:M(I)→M(P) from collections of individuals to the corresponding collections of personal outcomes. This lifting operation is a general feature of monads.
Hence, each social outcome s provides a collection of personal outcomes γ(−,s)M(i), obtained by lifting γ(−,s):I→P to a function γ(−,s)M:M(I)→M(P) and then applying to the distinguished collection of individuals i∈M(I).
(3) We assume that any collection of personal outcomes can be aggregated into a single personal outcome. Formally, there exists a function α:M(P)→P which reduces any collection of personal outcomes p∈M(P) into a single personal outcome α(p)∈P.
A key requirement for obtaining a normatively compelling principle of social justice is that the aggregation function α:M(P)→P is "monotonic". That is, aggregating more desirable personal outcomes should yield a more desirable result than aggregating less desirable personal outcomes, as judged by the self-interested social planner. This feature incentivizes the social planner to choose policies that benefit individuals in society, and to avoid policies that harm individuals, all else being equal.
(4) Putting it all together, each social outcome s provides a single aggregated personal outcome, namely ζM,α,i(s):=α(γ(−,s)M(i))∈P, obtained by applying the aggregation operator α:M(P)→P to the collection of personal outcomes γ(−,s)M(i)∈M(P) generated by s. This defines the general aggregation function ζM,α,i:S→P, which assigns to each social outcome the aggregated personal outcome.
As a sanity check, consider the trivial case where I={i} consists of a single individual. Here the population collection is the singleton i=ηI(i)∈M(I), and the aggregated outcome is simply ζM,α,i(s)=γ(i,s), i.e. the personal outcome assigned to the sole individual i.
By varying the monad M, the distinguished collection i, and the aggregation function α, one can capture a wide range of principles, including LELO, HL, and ROI as special cases.
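The general recipe ζM,α,i(s)=α(γ(−,s)M(i)) is short enough to write once and reuse. In this sketch (hypothetical names and toy data throughout), the same `zeta` function is instantiated with lists and concatenation (LELO), with finitely supported distributions and expectation (HL), and with finite sets and union (ROI). For HL the personal outcomes are taken to be plain numbers such as incomes, which is an illustrative simplification rather than the article's model.

```python
from fractions import Fraction
from functools import reduce


def zeta(lift, alpha, population, gamma, s):
    """ζ_{M,α,i}(s) = α(γ(−,s)^M(i)).

    lift(f, c):  the functorial action f^M of the chosen monad M
    alpha(c):    an M-algebra structure α: M(P) -> P on personal outcomes
    population:  the distinguished collection i ∈ M(I)
    gamma(i, s): the personal outcome of individual i under social outcome s
    """
    return alpha(lift(lambda i: gamma(i, s), population))


# LELO: lists; personal outcomes are lists of moments, aggregated by concatenation.
def list_lift(f, xs):
    return [f(x) for x in xs]

def concatenate(lives):
    return [moment for life in lives for moment in life]

# HL: distributions as dicts of exact probabilities; outcomes are numbers, aggregated by expectation.
def dist_lift(f, d):
    out = {}
    for x, p in d.items():
        y = f(x)
        out[y] = out.get(y, 0) + p
    return out

def expectation(d):
    return sum(p * y for y, p in d.items())

# ROI: finite sets; outcomes are sets of possibilities, fused by union.
def pset_lift(f, a):
    return frozenset(f(x) for x in a)

def fuse_all(outcomes):
    return reduce(lambda p, q: p | q, outcomes)


# A hypothetical social outcome s: each individual's occupation.
s = {"alice": "doctor", "bob": "farmer"}
INCOME = {"doctor": 100, "farmer": 40}   # hypothetical numbers

print(zeta(list_lift, concatenate, ["alice", "bob"], lambda i, o: [o[i]], s))
# ['doctor', 'farmer']
print(zeta(dist_lift, expectation, {"alice": Fraction(1, 2), "bob": Fraction(1, 2)},
           lambda i, o: INCOME[o[i]], s))
# 70
print(sorted(zeta(pset_lift, fuse_all, frozenset({"alice", "bob"}),
                  lambda i, o: frozenset({o[i]}), s)))
# ['doctor', 'farmer']
```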
4. Algebraic structures on personal outcomes
As we can see, the algebraic structures that exist on the personal outcomes constrain which aggregative principles are well-defined. In particular, the monad M and aggregation function α must be compatible, in the sense that α defines an M-algebra on the set P of personal outcomes. In this section, we will explore some concrete examples of algebraic structures on personal outcomes — including monoids, convex spaces, and semilattices, which are required for LELO, HL, and ROI respectively. Some of these examples will be exotic, thereby generating novel aggregative principles of social justice.
This section is not intended to be exhaustive. Indeed, there are countless possible algebraic structures one could consider, and the choice of algebraic structure will depend on the phenomena under investigation.
4.1. Personal outcomes as monoid
How might we model personal outcomes such that they form a monoid, as required by LELO? Recall that LELO requires a concatenation operator conc:List(P)→P. Equivalently, we seek a binary operator ▹:P×P→P and a constant element ϵ∈P satisfying the axioms of a monoid, as discussed in section 2.1.
Example 1
The simplest way to model personal outcomes as a monoid is for each personal outcome p to be a list over a fixed alphabet A, i.e. P:=List(A). We can think of elements of A as the discrete moments which constitute a human life. For example, A might be the set of minute-long experiences; then a human life of 80 years would be modelled as a list of about 42 million elements from A.
Indeed P:=List(A) has a monoid structure. In fact, this is the free monoid over A, meaning it is the 'most general' or 'least constrained' monoid containing A. The monoid operation ▹ is given by concatenation of lists: if p=[a1,…,an] and p′=[a′1,…,a′n′] are two lists, then p▹p′=[a1,…,an,a′1,…,a′n′]. The identity element ϵ is the empty list []. This is the simplest type of monoid, and thus the natural starting point for modeling personal outcomes in the context of LELO.
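A minimal sketch of this free monoid, with Python lists standing in for List(A) and made-up string "moments". Here `seq` is ▹, `EPSILON` is ϵ, and `conc` is the LELO aggregation List(P)→P; the last constant just reproduces the 42-million figure above, assuming 365.25 days per year. All names are hypothetical.

```python
from itertools import chain


def seq(p, q):
    """p ▹ q: live through p, then through q (list concatenation)."""
    return p + q


EPSILON = []  # ϵ: the empty life, the identity for ▹


def conc(lives):
    """conc: List(P) -> P, the LELO aggregation: one long life made of all the lives in order."""
    return list(chain.from_iterable(lives))


# Order of magnitude from the text: an 80-year life at one element per minute.
MINUTES_IN_80_YEARS = int(80 * 365.25 * 24 * 60)

alice = ["wake", "work", "sleep"]   # hypothetical (very short) lives
bob = ["wake", "play"]
print(conc([alice, bob]))     # ['wake', 'work', 'sleep', 'wake', 'play']
print(MINUTES_IN_80_YEARS)    # 42076800
```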
Example 2
Alternatively, we can model personal outcomes in a more continuous way. Suppose each personal outcome p is a pair (d,f) where:
We can define a monoid operation ▹ on P by adding durations and 'switching' from the first trajectory to the second. Formally, for p=(d,f) and p′=(d′,f′), we define p▹p′ to be the pair (d+d′,f̃) where f̃:(0,d+d′]→A is given by

$$\tilde f(t) = \begin{cases} f(t) & \text{if } t \le d, \\ f'(t-d) & \text{otherwise.} \end{cases}$$

The identity element ϵ∈P is the pair (0,!A) where !A:(0,0]→A is the empty function to A.[15]
We could also restrict the trajectories f:(0,d]→A to be piecewise smooth, piecewise continuous, piecewise constant, or to satisfy any other reasonable piecewise condition. P remains a monoid under these restrictions, because the concatenation of piecewise smooth (resp. continuous, constant) functions is again piecewise smooth (resp. continuous, constant).
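The continuous model can also be sketched directly. In this illustration (names hypothetical), a personal outcome is a `Trajectory` holding a duration d and a Python function standing in for f:(0,d]→A; `seq` implements ▹ exactly as in the piecewise definition, and `EPSILON` is the empty trajectory (0,!A), which rejects every time point because (0,0] is empty. No smoothness or continuity condition is enforced.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Trajectory:
    """A personal outcome (d, f): a duration d >= 0 and a trajectory f: (0, d] -> A."""
    duration: float
    f: Callable[[float], Any]

    def __call__(self, t: float) -> Any:
        # Enforce the half-open domain (0, d].
        if not 0 < t <= self.duration:
            raise ValueError(f"t={t} lies outside (0, {self.duration}]")
        return self.f(t)


def seq(p: Trajectory, q: Trajectory) -> Trajectory:
    """p ▹ q = (d + d', f~) with f~(t) = f(t) if t <= d, else f'(t - d)."""
    d = p.duration
    return Trajectory(d + q.duration, lambda t: p(t) if t <= d else q(t - d))


def _never_called(t: float) -> Any:
    raise AssertionError("the empty trajectory has no time points")


# ϵ = (0, !A): the empty function on (0, 0] = ∅.
EPSILON = Trajectory(0.0, _never_called)

# Hypothetical example: 8 hours of work followed by 4 hours of leisure.
work = Trajectory(8.0, lambda t: "work")
leisure = Trajectory(4.0, lambda t: f"leisure@{t}")
day = seq(work, leisure)
print(day.duration, day(3.0), day(10.0))   # 12.0 work leisure@2.0
```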
Example 3
In the previous two examples, we've modelled personal outcomes as predetermined trajectories through some space of experiences, either discrete or continuous. However, these models assume that an individual's life trajectory is fixed in advance, which is often unrealistic. In reality, individuals make choices that shape the course of their lives over time. To capture this agency, we can model personal outcomes as environments that are actively guided by the individual's actions.
Suppose we model a personal outcome as an interactive environment consisting of:
We can define a monoid operation on the set P by running the two environments p and p′ in parallel, where the individual simultaneously chooses actions and receives observations in both environments. Concretely, given p=(A,O,τ) and p′=(A′,O′,τ′), we define their product p▹p′ to be the environment (A×A′,O×O′,τ⊗τ′) where:
The identity element ϵ∈P is the trivial environment with a single action and a single observation, i.e. A=O={⋆} and τ(⋆) is the point distribution on ⋆.
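One way to make the parallel product concrete is sketched below. It assumes, beyond what the text states, that the two environments' randomness is independent, so that (τ⊗τ′)(a,a′) is the product distribution of τ(a) and τ′(a′). Environments are (actions, observations, tau) triples with tau returning a dict of exact probabilities, and the coin and door environments are hypothetical. Note that the monoid laws hold only up to the obvious isomorphism A×{⋆}≅A, since pairs with ⋆ are not literally equal to the original actions.

```python
from fractions import Fraction
from itertools import product


def env_product(env1, env2):
    """p ▹ p': run two environments in parallel.

    An environment is (actions, observations, tau) with tau(action) a dict
    observation -> probability. Joint observations are sampled independently,
    i.e. (τ ⊗ τ')(a, a')(o, o') = τ(a)(o) · τ'(a')(o'); independence is an
    assumption of this sketch."""
    A1, O1, tau1 = env1
    A2, O2, tau2 = env2

    def tau(pair):
        a1, a2 = pair
        return {(o1, o2): q1 * q2
                for (o1, q1), (o2, q2) in product(tau1(a1).items(), tau2(a2).items())}

    return frozenset(product(A1, A2)), frozenset(product(O1, O2)), tau


STAR = "⋆"
EPSILON = (frozenset({STAR}), frozenset({STAR}), lambda a: {STAR: Fraction(1)})

# Hypothetical environments: a coin flip and a choice between two doors.
coin = (frozenset({"flip"}), frozenset({"H", "T"}),
        lambda a: {"H": Fraction(1, 2), "T": Fraction(1, 2)})
door = (frozenset({"left", "right"}), frozenset({"prize", "nothing"}),
        lambda a: {"prize": Fraction(1, 3), "nothing": Fraction(2, 3)} if a == "left"
        else {"prize": Fraction(1)})

actions, observations, tau = env_product(coin, door)
print(tau(("flip", "left"))[("H", "prize")])   # 1/6
```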
Example 4
We can further extend the previous example by incorporating rewards. Suppose a personal outcome is modelled by:
As before, we can define a monoid operation on the set P by running the two 'reward-augmented' environments p and p′ in parallel, where the individual simultaneously chooses actions and receives observations in both environments, except that now each environment also produces a reward. The rewards are summed and received by the individual. The identity element ϵ∈P is the trivial environment with a single action and a single observation, i.e. A=O={⋆} and τ(⋆) is the point distribution on (⋆,0)∈O×R.
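The reward-augmented version changes only the transition function: each environment now returns a distribution over (observation, reward) pairs, and the product adds the two rewards. As before, independence of the two environments is an assumption of this sketch, and the `bet` and `job` environments and all helper names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def reward_env_product(env1, env2):
    """Run two reward-augmented environments in parallel and sum their rewards.

    An environment is (actions, observations, tau), where tau(action) is a dict
    mapping (observation, reward) -> probability. Independence of the two
    environments' randomness is an assumption of this sketch."""
    A1, O1, tau1 = env1
    A2, O2, tau2 = env2

    def tau(pair):
        a1, a2 = pair
        out = defaultdict(Fraction)
        for ((o1, r1), q1), ((o2, r2), q2) in product(tau1(a1).items(), tau2(a2).items()):
            out[((o1, o2), r1 + r2)] += q1 * q2
        return dict(out)

    return frozenset(product(A1, A2)), frozenset(product(O1, O2)), tau


STAR = "⋆"
EPSILON = (frozenset({STAR}), frozenset({STAR}), lambda a: {(STAR, 0): Fraction(1)})

# Hypothetical environments: a fair bet and a job with a fixed wage.
bet = (frozenset({"bet"}), frozenset({"win", "lose"}),
       lambda a: {("win", 3): Fraction(1, 2), ("lose", -1): Fraction(1, 2)})
job = (frozenset({"work"}), frozenset({"paid"}), lambda a: {("paid", 2): Fraction(1)})

_, _, tau = reward_env_product(bet, job)
outcome = tau(("bet", "work"))
expected_reward = sum(q * r for (_, r), q in outcome.items())
print(expected_reward)   # 3, i.e. 1 from the bet plus 2 from the job
```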
4.2. Personal outcomes as convex space
We've seen how personal outcomes form a monoid, as required by LELO. Next let's turn to convex spaces, as required by HL. Recall that HL requires an interpolation operator E:Δ(P)→P. Equivalently, we seek a family of binary operators {+λ:P×P→P}λ∈(0,1) satisfying the axioms of a convex space, as discussed in section 2.2.
Example 5
The simplest way to model personal outcomes as a convex space is to take each outcome p to be a probability distribution over some fixed set of alternatives A, i.e. P:=Δ(A). For example, A might be the set of possible life histories, where a life history specifies all the relevant details of a person's life from birth to death, such as their physical and mental states, relationships, major life events, achievements, etc. A personal outcome is then a probability distribution over these possible life histories.
Indeed P:=Δ(A) has a convex structure. In fact, this is the free convex space over A, i.e. the 'least constrained' convex space containing A. The interpolation operators are given by the standard notion of interpolation of distributions. That is, if p:A→[0,1] and p′:A→[0,1] are two distributions, then their λ-interpolation p+λp′:A→[0,1] is the distribution defined by (p+λp′)(a)=λ⋅p(a)+(1−λ)⋅p′(a). This is the simplest type of convex space, and thus the natural starting point for modeling personal outcomes in the context of HL.
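The interpolation of distributions, and the corresponding n-ary operation E:Δ(P)→P, can be sketched as follows. Distributions are dicts with exact `Fraction` probabilities, and, as an assumption of this illustration, an element of Δ(P) is passed as a list of (weight, distribution) pairs because Python dicts are not hashable; the life histories and function names are placeholders.

```python
from collections import defaultdict
from fractions import Fraction


def interpolate(lam, p, q):
    """p +_λ p' = λ·p + (1 − λ)·p' for finitely supported distributions given as dicts."""
    if not 0 < lam < 1:
        raise ValueError("λ must lie strictly between 0 and 1")
    out = defaultdict(Fraction)
    for a, w in p.items():
        out[a] += lam * w
    for a, w in q.items():
        out[a] += (1 - lam) * w
    return {a: w for a, w in out.items() if w != 0}


def mixture(weighted):
    """E: Δ(P) -> P for P = Δ(A): given (λ_k, p_k) pairs, return Σ_k λ_k·p_k."""
    out = defaultdict(Fraction)
    for lam, p in weighted:
        for a, w in p.items():
            out[a] += lam * w
    return {a: w for a, w in out.items() if w != 0}


doctor = {"doctor": Fraction(1)}
uncertain = {"farmer": Fraction(1, 2), "teacher": Fraction(1, 2)}
print(interpolate(Fraction(1, 4), doctor, uncertain))
# {'doctor': Fraction(1, 4), 'farmer': Fraction(3, 8), 'teacher': Fraction(3, 8)}
```

The binary and n-ary forms agree: `mixture([(λ, p), (1 − λ, p′)])` equals `interpolate(λ, p, p′)`.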
Example 6
Again, this is a model of personal outcomes which lacks any notion of individual agency. Personal outcomes are simply probability distributions over a fixed set of alternatives, with no room for individuals to make choices that affect their outcomes. To incorporate individual agency, we will again model a personal outcome as an interactive environment consisting of an action set A, an observation set O, and a function τ:A→Δ(O) assigning to each action a probability distribution over observations.
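To define an interpolation operation on personal outcomes in this setting, we use the idea of stochastic case handling. Given two personal outcomes p=(A,O,τ) and p′=(A′,O′,τ′), define their λ-interpolation p+λp′ as the environment (A×A′,O+O′,τ̃), where the individual chooses an action in each environment, O+O′ is the disjoint union of the observation sets, and the transition function τ̃:A×A′→Δ(O+O′) produces an observation from τ(a) with probability λ and from τ′(a′) with probability 1−λ.[16]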
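The following sketch implements this stochastic case handling, following the footnote that defines τ̃. Environments are (actions, observations, tau) triples as before; the disjoint union O+O′ is encoded by tagging observations with 0 or 1, which is an implementation choice rather than part of the text, and the coin and door environments and function names are hypothetical.

```python
from fractions import Fraction


def case_interpolate(lam, env1, env2):
    """p +_λ p' by stochastic case handling.

    Environments are (actions, observations, tau) triples, with tau(action) a dict
    observation -> probability. The disjoint union O + O' is encoded by tagging
    observations with 0 (from p) or 1 (from p')."""
    A1, O1, tau1 = env1
    A2, O2, tau2 = env2
    actions = frozenset((a1, a2) for a1 in A1 for a2 in A2)
    observations = frozenset({(0, o) for o in O1} | {(1, o) for o in O2})

    def tau(pair):
        a1, a2 = pair
        # With probability λ the observation comes from p, otherwise from p'.
        out = {(0, o): lam * q for o, q in tau1(a1).items()}
        out.update({(1, o): (1 - lam) * q for o, q in tau2(a2).items()})
        return out

    return actions, observations, tau


# Hypothetical environments: a coin flip and a choice between two doors.
coin = (frozenset({"flip"}), frozenset({"H", "T"}),
        lambda a: {"H": Fraction(1, 2), "T": Fraction(1, 2)})
door = (frozenset({"left", "right"}), frozenset({"prize", "nothing"}),
        lambda a: {"prize": Fraction(1, 3), "nothing": Fraction(2, 3)} if a == "left"
        else {"prize": Fraction(1)})

actions, observations, tau = case_interpolate(Fraction(1, 4), coin, door)
dist_left = tau(("flip", "left"))
print(dist_left[(0, "H")], dist_left[(1, "prize")])   # 1/8 1/4
print(sum(dist_left.values()))                        # 1
```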
Example 7
We could imagine interpolating between personal outcomes in a more direct way. For example, if p is the personal outcome of winning £100, and p′ is the personal outcome of winning £1, then p+λp′ is the personal outcome of winning £(1+λ⋅99). However, it's unclear how to extend this interpolation to personal outcomes lacking an inherently probabilistic or quantitative structure. For instance, suppose p is the outcome of being happily married with two children and an unfulfilling career, while p′ is the outcome of being single and childless but having a fulfilling career. It's unclear how to meaningfully define an outcome "50% between them".
One approach is to represent personal outcomes as vectors in a high-dimensional real vector space such as Rd. Here d is some large number, potentially hundreds or thousands. The benefit of a vector representation is that the space of personal outcomes P inherits the natural convex structure of Rd. Concretely, for any two outcome vectors p,p′∈Rd and any weight λ∈(0,1), we define the λ-interpolation p+λp′ as the weighted average λp+(1−λ)p′.
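A minimal sketch of the vector representation's convex structure, with tuples standing in for points of Rd (here d=2) and made-up coordinates. `interpolate_vec` is p+λp′ and `weighted_average` is the n-ary version that an HL-style aggregation would use; both names are hypothetical.

```python
def interpolate_vec(lam, p, q):
    """p +_λ p' = λ·p + (1 − λ)·p' for outcome vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("outcome vectors must have the same dimension d")
    if not 0 < lam < 1:
        raise ValueError("λ must lie strictly between 0 and 1")
    return tuple(lam * x + (1 - lam) * y for x, y in zip(p, q))


def weighted_average(weighted):
    """Σ_k λ_k·p_k over (weight, vector) pairs; the weights are expected to sum to 1."""
    d = len(weighted[0][1])
    return tuple(sum(lam * p[k] for lam, p in weighted) for k in range(d))


p = (4.0, 8.0)   # hypothetical 2-dimensional outcome vectors
q = (0.0, 2.0)
print(interpolate_vec(0.5, p, q))                 # (2.0, 5.0)
print(weighted_average([(0.25, p), (0.75, q)]))   # (1.0, 3.5)
```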
Intuitively, if the dimensions of Rd correspond to relevant features of the outcome, then the interpolated outcome p+λp′ has intermediate feature values between those of p and p′. The relative influence of p and p′ is controlled by the weight λ. For the vector representation to be useful, it must encode all the important information about the outcome in a structured format (e.g. ensuring that similar outcomes map to similar vectors). This is a nontrivial challenge. Many important features, such as happiness, fulfilment, and relationships, are difficult to measure numerically.
One trick to obtain vector representations of personal outcomes could be to leverage the semantic knowledge embedded in a large pretrained language model like GPT-3. In particular, the activation space of a pretrained model can represent general semantic concepts, including personal outcomes, and comes equipped with a convex structure. Using this convex structure, we obtain the following aggregative principle: a social planner should make decisions as if they will face the average personal outcome across all individuals, where the averaging is performed in the activation space of the language model.[17]
Whether this aggregative principle is appropriate will depend on how personal outcomes are represented within the activations of GPT-3. In particular, we desire the monotonicity property. That is, if the interpolation λ1⋅p1+⋯+λn⋅pn is less desirable than the interpolation λ1⋅p′1+⋯+λn⋅p′n, then there exists some pi less desirable than p′i. Monotonicity would ensure that a social planner following this aggregative principle will, all else being equal, tend to choose policies that benefit individuals and avoid policies that harm individuals.
The dimensionality d of the latent space controls the level of detail captured about personal outcomes. The extreme cases are problematic:
4.3. Personal outcomes as semilattice
We've seen how personal outcomes form a monoid or convex space, as required by LELO and HL respectively. Next let's turn to semilattices, as required by ROI. Recall that ROI requires a fusion operator ⨁:P+f(P)→P. Equivalently, we seek a binary operator ⊕ satisfying the axioms of a semilattice, as discussed in section 2.3.
Example 8
The simplest way to model personal outcomes as a semilattice is to take each outcome p to be a nonempty finite subset of a fixed set of alternatives A, i.e. P:=P+f(A). A might be the set of possible life histories, specifying all the relevant details of a person's life from birth to death. A personal outcome p={a1,…,an}⊆A is a state where any of the alternatives a1,…,an are possible, without specifying their likelihoods or the mechanism that will select among them.
Indeed P:=P+f(A) has a semilattice structure. In fact, this is the free semilattice over A, i.e. the 'least constrained' semilattice containing A. The fusion operator is given by the standard union of sets. That is, if p⊆A and p′⊆A are two subsets, then their fusion p⊕p′⊆A is the subset defined by p⊕p′=p∪p′. This recovers the disjunctive reading of the fusion operator. For example, if p={vanilla,chocolate} represents the outcome of having either vanilla or chocolate ice-cream, and p′={chocolate,strawberry} represents the outcome of having either chocolate or strawberry ice-cream, then their fusion p⊕p′={vanilla,chocolate,strawberry} represents the outcome of having one of vanilla, chocolate or strawberry ice-cream. This is the simplest type of semilattice, and thus the natural starting point for modeling personal outcomes in the context of ROI.
Example 9
Alternatively, we could interpret fusion as conjunction rather than disjunction: if p is the outcome of playing tennis and p′ is the outcome of listening to Bach, then p⊕p′ is the outcome of simultaneously playing tennis and listening to Bach. In the conjunctive interpretation, we take the elements of A to be specifications or properties about personal outcomes. A personal outcome p is represented by a subset of A, where p contains exactly those specifications that the outcome satisfies. Fusion is still defined as set union, i.e. p⊕p′=p∪p′. The fused outcome p⊕p′ will satisfy a specification if and only if at least one of p or p′ satisfies it.
For the fusion operation to always yield a coherent personal outcome, we require any finite subset of specifications in A to be mutually consistent. This is a very strong assumption that rules out the vast majority of possible sets of specifications. For example, "has a PhD" and "has no higher education" cannot both be specifications in A. Moreover, even if we could represent personal outcomes with a space A of mutually consistent specifications, the resulting aggregative principle of social justice would likely fail to match our moral judgments. The problem is that the hypothetical prospect of "living every life simultaneously" is so alien that the social planner's preferences about it are unlikely to track anything normatively relevant.
Example 10
As discussed previously, we can represent personal outcomes as vectors in a high-dimensional real vector space such as Rd. The benefit of a vector representation is that the space of personal outcomes P inherits a natural semilattice structure from Rd. Concretely, for any two outcome vectors p,p′∈Rd we can define their fusion p⊕p′ by taking the coordinatewise maximum: (p⊕p′)i=max{pi,p′i} for i=1,…,d.
Intuitively, if the dimensions of Rd correspond to degrees or intensities of different attributes, then the fused outcome p⊕p′ has each attribute at the higher of the two degrees from p and p′. For example, consider feature vectors with dimensions for wealth, sickness, and number of children. Fusing two such vectors would yield an outcome with the wealth of the wealthier individual, the sickness of the sicker individual, and the greater number of children. Had the second dimension measured health rather than sickness, fusion would instead yield the health of the healthier individual. This example illustrates that the choice of vector representation substantively changes the resulting aggregative principle of social justice.
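The effect of the representation can be seen in a few lines. This sketch uses hypothetical feature vectors (wealth, sickness, number of children) and the coordinatewise maximum as ⊕; flipping the sign of one coordinate flips whose value the fused outcome inherits. The function name `fuse_max` is hypothetical.

```python
def fuse_max(p, q):
    """p ⊕ p': coordinatewise maximum of two outcome vectors."""
    if len(p) != len(q):
        raise ValueError("outcome vectors must have the same dimension")
    return tuple(max(x, y) for x, y in zip(p, q))


# Hypothetical feature vectors: (wealth, sickness, number of children).
alice = (50, 2, 1)
bob = (20, 7, 3)
print(fuse_max(alice, bob))       # (50, 7, 3): wealthier, sicker, more children

# Same people, but the second feature now records health = -sickness.
alice_h = (50, -2, 1)
bob_h = (20, -7, 3)
print(fuse_max(alice_h, bob_h))   # (50, -2, 3): now the healthier individual's health
```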
As discussed in the previous section, one approach to obtaining semantically meaningful vector representations of personal outcomes is to leverage the internal activations of a large language model like GPT-3. However, unlike the convex combination approach discussed earlier (which commutes with any change of basis), defining the fusion operator p⊕p′ via the coordinatewise maximum (p⊕p′)i=max{pi,p′i} has a limitation when applied to language model embeddings. Namely, this fusion operator is not rotation-invariant, so the resulting aggregative principle would depend on the choice of basis for the model's activation space. To address this issue, we might learn a change-of-basis transformation from the model's activation space to a new embedding space where the coordinatewise maximum yields an appropriate principle of social justice.
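The basis dependence is easy to exhibit. In this sketch (hypothetical helper names, exact `Fraction` arithmetic), rotating by 90 degrees before fusing gives a different answer from fusing and then rotating, whereas convex interpolation commutes with the rotation, since it commutes with every linear map.

```python
from fractions import Fraction


def fuse_max(p, q):
    """p ⊕ p': coordinatewise maximum."""
    return tuple(max(x, y) for x, y in zip(p, q))


def interpolate_vec(lam, p, q):
    """p +_λ p' = λ·p + (1 − λ)·p'."""
    return tuple(lam * x + (1 - lam) * y for x, y in zip(p, q))


def rotate90(v):
    """Rotate a 2-D vector by 90 degrees, (x, y) -> (-y, x): an orthonormal change of basis."""
    x, y = v
    return (-y, x)


p, q = (1, 0), (0, 1)
# Fusing then rotating versus rotating then fusing:
print(rotate90(fuse_max(p, q)), fuse_max(rotate90(p), rotate90(q)))   # (-1, 1) (0, 1)

# Interpolation commutes with the rotation, so averaging is basis-independent.
lam = Fraction(1, 3)
print(rotate90(interpolate_vec(lam, p, q)) == interpolate_vec(lam, rotate90(p), rotate90(q)))   # True
```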
Conclusion
In this article, we examined aggregative principles of social justice, i.e. principles stating that a social planner should make decisions as if they will face the aggregated personal outcomes of every individual in the population. We saw three well-known examples — Live Every Life Once (LELO), Harsanyi's Lottery (HL), and Rawls' Original Position (ROI). After introducing the mathematical concept of a monad, we constructed a general family of aggregative principles.
Finally, we explored several concrete examples of algebraic structures on personal outcomes, with natural interpretations as monoids, convex spaces, and semilattices. The generality of the framework allowed for the development of novel principles, beyond those already discussed in the literature. For instance, we considered modeling personal outcomes as:
In conclusion, aggregative principles offer a fruitful strategy for specifying principles of social justice. In my next article, I prove that, under natural conditions of human rationality, aggregative principles will approximate utilitarian principles. Therefore, even though aggregativism avoids the theoretical pitfalls of utilitarianism, we should nonetheless expect aggregativism to generate roughly-utilitarian recommendations in practical social contexts, and thereby retain the most appealing insights from utilitarianism.
See Appraising aggregativism and utilitarianism for a thorough defence.
The term LELO originates in Loren Fryxell (2024), "XU", which is where I first encountered the concept. I think Fryxell offers the first formal treatment of the LELO principle.
MacAskill (2022), "What We Owe the Future", says this thought experiment comes from Georgia Ray (2018), “The Funnel of Human Experience”, and that the short story Andy Weir (2009), "The Egg", shares a similar premise.
As Elliott Thornley notes, however, Roger Crisp attributes LELO to C.I. Lewis. This would predate both Ray and Weir, but I haven't traced the reference.
John C. Harsanyi "Cardinal Utility in Welfare Economics and in the Theory of Risk-Taking" (1953) and "Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility" (1955)
John Rawls (1971), "A Theory of Justice"
See https://plato.stanford.edu/entries/original-position/
John Harsanyi (1975) "Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls's Theory"
See https://ncatlab.org/nlab/show/list
See https://ncatlab.org/nlab/show/monoid
See https://ncatlab.org/nlab/show/convex+space
See https://ncatlab.org/nlab/show/semilattice
This is called the min-max principle in decision theory, and Murphy's law colloquially.
See https://ncatlab.org/nlab/show/list+monad
See https://ncatlab.org/nlab/show/distribution+monad
See https://ncatlab.org/nlab/show/free+commutative+monoid
Note that the half-open interval (0,0] is the empty set ∅, because there are no real numbers 0<t≤0, and for any set A there is exactly one function !A:∅→A which we call the empty function.
If τ:A→Δ(O) and τ′:A′→Δ(O′) are transition functions, considered as functions τ:A×O→[0,1] and τ′:A′×O′→[0,1], then τ̃:A×A′→Δ(O+O′) is defined by

$$\tilde\tau(a,a',o) = \begin{cases} \lambda\cdot\tau(a,o) & \text{if } o\in O, \\ (1-\lambda)\cdot\tau'(a',o) & \text{if } o\in O'. \end{cases}$$
Concretely, to assess a social outcome s, the social planner should take the following steps:
(1) Describe the personal outcome of each individual i∈I, e.g. "Alice lives a happy life as a successful doctor with a loving family."
(2) Run a forward pass of the language model on each prompt, without generating any new tokens, and extract the model's internal activations. The choice of which specific activation to extract would be a hyperparameter to tune, but one natural choice is a hidden state of the model's residual stream. Overall, this gives some function Nθ:T→Rd, where T is the space of prompts and θ denotes the trained parameters of the model.
(3) For each individual i, obtain a vector representation vi∈Rd of their personal outcome by applying the function Nθ to their prompt. Compute the social outcome vector v∗ as a weighted average of the individual outcome vectors: v∗=∑iλivi.
(4) Interpret the social outcome vector v∗ by finding a natural language prompt t∗ such that Nθ(t∗) is close to v∗. This is a nontrivial inverse problem and may require heuristics. One approach is to perform gradient descent over the space of prompts T to minimize a loss function L(t;v∗,θ)=||Nθ(t)−v∗||ρ−βlogP(t;θ). Here ||⋅||ρ is the lρ-norm, P(t;θ) is the probability of t under the language model, and β is a hyperparameter controlling the relative importance of the two terms. Intuitively, this finds a prompt that has a vector representation close to v∗ and is likely under the language model.
When assessing the social outcome s, the social planner should make decisions as if they will face the outcome described in t∗, obtained in the procedure above.
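The procedure in this footnote can be sketched as follows, under loud assumptions. `toy_embed` and `toy_log_prob` are hypothetical stand-ins for Nθ and log P(t;θ), not a real language model or library API, and step (4) is approximated by exhaustive search over a finite list of candidate prompts instead of the gradient descent described above. Only the weighted average v∗=∑iλivi and the loss L(t;v∗,θ) follow the text; all function names are hypothetical.

```python
from typing import Sequence


def lp_norm(v: Sequence[float], rho: float) -> float:
    """The l_rho norm ||v||_rho."""
    return sum(abs(x) ** rho for x in v) ** (1.0 / rho)


def social_vector(prompts, weights, embed):
    """Steps (2)-(3): v* = Σ_i λ_i N_θ(t_i)."""
    vectors = [embed(t) for t in prompts]
    d = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(d)]


def loss(t, v_star, embed, log_prob, rho=2.0, beta=0.1):
    """L(t; v*, θ) = ||N_θ(t) − v*||_rho − β·log P(t; θ)."""
    diff = [a - b for a, b in zip(embed(t), v_star)]
    return lp_norm(diff, rho) - beta * log_prob(t)


def decode(v_star, candidates, embed, log_prob, rho=2.0, beta=0.1):
    """Step (4), approximated by exhaustive search over a finite candidate list."""
    return min(candidates, key=lambda t: loss(t, v_star, embed, log_prob, rho, beta))


# Toy stand-ins for N_θ and log P(t; θ): NOT a language model, just keyword counts
# and a length penalty.
def toy_embed(t: str) -> list:
    return [float(t.count("happy")), float(t.count("sick"))]


def toy_log_prob(t: str) -> float:
    return -float(len(t.split()))


v_star = social_vector(["Alice is happy", "Bob is sick"], [0.75, 0.25], toy_embed)
print(v_star)   # [0.75, 0.25]
print(decode(v_star, ["happy", "sick", "happy but sick"], toy_embed, toy_log_prob))   # happy
```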