I'm working towards a toy model that will illustrate all the steps in the research agenda. It will start with some algorithmic stand-in for the "human", and proceed to create the UH, following all the steps in that research agenda. So I'll be posting a series of "toy model pieces", that will ultimately be combined into a full toy model. Along the way, I hope to get a better understanding of how to do the research agenda in practice, and maybe even modify that agenda based on insights from making the toy model.
For this post, I'll look in more detail into how to combine different types of (partial) preferences.
Short-distance, long-distance, and other preferences
I normally use population ethics as my go-to example for a tension between different types of preferences. You can get a lot of mileage by contrasting the repugnance of the repugnant conclusion with the seeming intuitiveness of the mere addition argument.
However, many people who read this will have strong opinions about population ethics, or at least some opinions. Since I'm not trying to convince anyone of my particular population ethics here, I thought it best to shift to another setting where we could see similar tensions at work, without the baggage.
Living in a world of smiles
Suppose you have two somewhat contradictory ethical intuitions. Or rather, in the formulation of my research agenda, two somewhat contradictory partial preferences.
The first is that any world would be better if people smiled more (P1). The second is that if almost everyone smiles all the time, it gets really creepy (P2).
Now, the proper way of resolving those preferences is to appeal to meta-preferences, or to cut them up into their web of connotations: why do we value smiles? Is it because people are happy? Why do we find universal smiling creepy? Is it because we fear that something unnatural is making them smile that way? That's the proper way of resolving those preferences.
However, let's pretend there are no meta-preferences, and no connotations, and just try to combine the preferences as given.
Smiles and worlds
Fix the population to a hundred people, and let W be the set of worlds. This set contains one hundred and one different worlds, described by w(n), where 0≤n≤100 is an integer denoting the number of people smiling in that world.
We can formalise the preferences as follows:
P1 = { w(n) ≤_1 w(m) ∣ n ≤ m }.
P2 = { w(n) ≤_2 w(m) ∣ n ≥ 95 and n ≥ m }.
These give rise to the following utility functions (for simplicity of the formula, I've translated the definition of U2; translations don't matter when combining utilities; I've also written Ui(w(n)) as Ui(n)):
U1(n)=2n−100.
U2(n)=2×min(94−n,0).
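To make this concrete, here is a minimal Python sketch of the two utility functions over the 101 worlds (the function names are mine, purely for illustration):

```python
# Worlds are indexed by n, the number of people smiling, with 0 <= n <= 100.
WORLDS = range(101)

def U1(n):
    """P1: more smiles are better (already translated to run from -100 to 100)."""
    return 2 * n - 100

def U2(n):
    """P2: worlds where 95 or more people smile are increasingly creepy."""
    return 2 * min(94 - n, 0)

# Quick check of the extremes:
assert U1(0) == -100 and U1(100) == 100
assert U2(94) == 0 and U2(95) == -2 and U2(100) == -12
```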
But before being combined, these preferences have to be normalised. There are multiple ways we could do this, and I'll somewhat arbitrarily choose the "mean-max" method, which normalises by the utility difference between the top world and the average world[1].
Given that normalisation, we have:
||U1||_mema = 100 − 0 = 100.
||U2||_mema = 0 − (−42/101) = 42/101 ≈ 0.42.
Thus we send the Ui to their normalised counterparts:
U1(n) → Û1(n) = n/50 − 1.
U2(n) → Û2(n) = (101/21) · min(94−n, 0).
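As a sanity check on these numbers, the mean-max norms and the normalised utilities can be computed directly. A sketch continuing the code above (exact fractions avoid floating-point fuzz):

```python
from fractions import Fraction

def mean_max_norm(U):
    """Mean-max norm: utility gap between the best world and the average world."""
    values = [Fraction(U(n)) for n in WORLDS]
    return max(values) - sum(values) / len(values)

norm1 = mean_max_norm(U1)            # 100
norm2 = mean_max_norm(U2)            # 42/101, roughly 0.42
assert norm1 == 100 and norm2 == Fraction(42, 101)

def U1_hat(n):
    return Fraction(U1(n)) / norm1   # equals n/50 - 1

def U2_hat(n):
    return Fraction(U2(n)) / norm2   # equals (101/21) * min(94 - n, 0)

assert U1_hat(0) == -1 and U1_hat(100) == 1
assert U2_hat(100) == Fraction(101, 21) * (94 - 100)
```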
Now consider what happens when we do the weighted sum of these utilities, weighted by the intensity of the human feeling on the subject:
U = w1·Û1 + w2·Û2.
If the weights w1 and w2 are equal, the utility of the world grows slowly with the number of smiles until it reaches its maximum at n=94, and then drops precipitously.
Thus U1 is dominant most of the time when comparing worlds, but U2 is very strong on the few worlds it really wants to avoid.
But what if U2 (a seemingly odd choice) is weighted less than U1 (a more "natural" choice)?
Well, setting w1=1 for the moment, if w2=21/5050, then the utility is the same for all worlds with n≥94.
Thus if w2>21/5050, Û2 will force the optimal n to be n≤94 (and Û1 will select n=94 from these options). If w2<21/5050, then Û1 will dominate completely, setting n=100.
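This threshold behaviour is easy to verify numerically. A sketch, again continuing the code above, with w1 fixed to 1:

```python
def combined(n, w2):
    """Weighted sum U = U1_hat + w2 * U2_hat for world w(n), with w1 = 1."""
    return U1_hat(n) + w2 * U2_hat(n)

def optimal_n(w2):
    """The n maximising the combined utility (first such n in case of ties)."""
    return max(WORLDS, key=lambda n: combined(n, w2))

threshold = Fraction(21, 5050)
assert optimal_n(2 * threshold) == 94    # w2 above the threshold: n = 94 wins
assert optimal_n(threshold / 2) == 100   # w2 below the threshold: n = 100 wins
# Exactly at the threshold, every world with n >= 94 gets the same utility.
assert len({combined(n, threshold) for n in range(94, 101)}) == 1
```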
This seems like it could be extended to address population ethics considerations in various ways (where U1 might be total utilitarianism, with U2 average utilitarianism or just a dislike of worlds where everyone is at very low utility). To go back to my old post about differential versus integral ethics, U1 is a differential constraint, U2 is an integral one, and n=94 is the compromise point between them.
Inverting the utilities
Things behave differently if we invert the utilities: suppose instead we had −U1 (smiles are bad) and −U2 (only lots of smiles are good)[2]. In mean-max, the norms of these would be:
||−U1||_mema = 100 − 0 = 100.
||−U2||_mema = 12 − 42/101 = 1170/101 ≈ 11.58.
So the normalised version of −U1 is just −Û1, but the normalised version of −U2 is quite different from −Û2.
Then, at equal weights, −U2 fails to have any influence on U, and n=0 is the optimum.
To get the break-even point, we need w2=585/303; there, n=0 and n=100 are equally valued.
For w2 greater than that, −U2 dominates completely, and forces n=100.
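The same kind of check works for the inverted pair. A sketch reusing mean_max_norm and the other definitions from the earlier code:

```python
def neg_U1(n):
    return -U1(n)    # smiles are bad

def neg_U2(n):
    return -U2(n)    # only lots of smiles are good

norm_neg1 = mean_max_norm(neg_U1)    # 100, same as for U1
norm_neg2 = mean_max_norm(neg_U2)    # 1170/101, about 11.58 -- not 42/101
assert norm_neg1 == 100 and norm_neg2 == Fraction(1170, 101)

def combined_inv(n, w2):
    """Weighted sum of the normalised -U1 and -U2, with w1 = 1."""
    return Fraction(neg_U1(n)) / norm_neg1 + w2 * Fraction(neg_U2(n)) / norm_neg2

break_even = Fraction(585, 303)
# At the break-even weight, n = 0 and n = 100 are valued equally.
assert combined_inv(0, break_even) == combined_inv(100, break_even)
# Below it, -U1 wins and n = 0 is optimal; above it, -U2 wins and n = 100 is optimal.
assert max(WORLDS, key=lambda n: combined_inv(n, break_even / 2)) == 0
assert max(WORLDS, key=lambda n: combined_inv(n, 2 * break_even)) == 100
```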
It's clear that U1 and U2 are less "antagonistic" than −U1 and −U2 are: the combined utility has a single peak in the first case, and two peaks in the second.
Why choose the mean-max normalisation? It has some nice formal properties, as the intertheoretic utility comparison post demonstrates. But it also, to some extent, boosts utility functions to the extent that they do not interfere much with other functions.
What do I mean by this? Well, consider two utility functions over n+1 different worlds. The first one, V1, ranks one world (W1) as above all others (the other ones being equal). The second one, V2, ranks one world (W2) as below all others (the other ones being equal).
Under the mean-max normalisation, V1(W1)=1 and V1(W)=−1/n for other W. Under the same normalisation, V2(W2)=−n while V2(W)=1 for other W.
Thus V2 has a much wider "spread" than V1, meaning that, in a normalised sum of utilities, V2 affects the outcome much more strongly than V1 ("outcome" meaning the outcome of maximising the summed utility). This is acceptable, even desirable: V2 dominating the outcome just rules out one universe (W2), while V1 dominating the outcome rules out all but one universe (it forces W1). So, in a sense, their ability to focus the outcome is comparable: V1 almost never focuses the outcome, but when it does, it narrows it down to a single universe; V2 almost always focuses the outcome, but barely narrows it down. ↩︎
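For the concrete numbers in this footnote, here is a small self-contained sketch (n plays the footnote's role of counting the "other" worlds; the helper name is mine):

```python
from fractions import Fraction

def mean_max_normalise(values):
    """Scale so that max minus mean equals 1, and translate so the mean is 0."""
    mean = sum(values) / len(values)
    norm = max(values) - mean
    return [(v - mean) / norm for v in values]

n = 10  # the footnote's n: there are n + 1 worlds in total
V1 = [Fraction(1)] + [Fraction(0)] * n    # one world ranked above all others
V2 = [Fraction(-1)] + [Fraction(0)] * n   # one world ranked below all others

V1_hat = mean_max_normalise(V1)
V2_hat = mean_max_normalise(V2)
assert V1_hat[0] == 1 and all(v == Fraction(-1, n) for v in V1_hat[1:])
assert V2_hat[0] == -n and all(v == 1 for v in V2_hat[1:])
```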
There is no point having the pairs be (U1,−U2) or (−U1,U2), since those pairs agree on the ordering of the worlds, up to ties. ↩︎