This is a result from the first MIRIx Cambridge workshop (coauthored with Janos and Jim).

One potential problem with bounded utility functions is: what happens when the bound is nearly reached? A bounded utility maximizer will become progressively more risk-averse as it gets closer to its bound. We decided to investigate what risks it might fear. We used a toy model with a bounded-utility chocolate maximizer, and considered what happens to its resource allocation in the limit as resources go to infinity.

We use "chocolate maximizer'' as conceptual shorthand meaning an agent that we model as though it has a single simple value with a positive long-run marginal resource cost, but only as a simplifying assumption. This is as opposed to a paperclip maximizer, where the inappropriate simplicity is implied to be part of the world, not just part of the model.

Conceptual uncertainty

We found that if a bounded utility function approaches its bound too fast, this has surprising pathological results when mixed with logical uncertainty. Consider a bounded-utility chocolate maximizer with philosophical uncertainty about what chocolate is. It has a central concept of chocolate $C_1$, and there are classes $C_i$ of mutated versions of the concept of chocolate at varying distances from the central concept, such that the probability that the true chocolate is in class $C_i$ is proportional to $i^{-\alpha}$ (i.e. following a power law).

Suppose also that utility is bounded using a sigmoid function $u(x)$, where $x$ is the amount of chocolate produced. In the limit as resources go to infinity, what fraction of those resources will be spent on the central class $C_1$? That depends on which sigmoid function is used, and in particular on how quickly it approaches the utility bound.

Example 1: exponential sigmoid

Suppose we allocate resources $r_i$ to class $C_i$, with $\sum_{i=1}^n r_i = r$ for total resources $r$. Let $u(x) = 1 - e^{-x}$.

Then the optimal resource allocation is

$$\underset{r_1,\dots,r_n}{\operatorname{argmax}} \;\sum_{i=1}^n i^{-\alpha}\left(1 - e^{-r_i}\right) \quad\text{subject to}\quad \sum_{i=1}^n r_i = r.$$

Using Lagrange multipliers, we obtain for all $i$,

$$i^{-\alpha} e^{-r_i} = \lambda, \qquad\text{i.e.}\qquad r_i = -\alpha \log i - \log \lambda.$$

Then, summing over $i$ and eliminating $\lambda$,

$$r_i = \frac{r}{n} + \alpha\left(\frac{1}{n}\sum_{j=1}^n \log j \;-\; \log i\right).$$

Thus, the resources will be evenly distributed among all the classes as $r$ increases. This is bad, because the resource fraction for the central class $C_1$ goes to 0 as we increase the number of classes.
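
As a sanity check on this closed form, here is a minimal numerical sketch (the parameter values are illustrative, and the scipy-based optimizer is not part of the original argument) comparing the Lagrange-multiplier solution with a direct constrained maximization of expected utility:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative parameters (assumed for this sketch, not from the post)
alpha, n, r = 1.0, 10, 100.0

i = np.arange(1, n + 1)
p = i ** (-alpha)
p /= p.sum()                      # power-law prior over classes C_1..C_n

def neg_expected_utility(ri):
    # exponential sigmoid u(x) = 1 - exp(-x)
    return -np.dot(p, 1.0 - np.exp(-ri))

# Closed form from the Lagrange-multiplier calculation:
#   r_i = r/n + alpha * (mean_j log j - log i)
logs = np.log(i)
closed_form = r / n + alpha * (logs.mean() - logs)

# Direct numerical optimization under the budget constraint sum_i r_i = r
res = minimize(
    neg_expected_utility,
    x0=np.full(n, r / n),
    bounds=[(0.0, None)] * n,
    constraints=[{"type": "eq", "fun": lambda ri: ri.sum() - r}],
)

print("closed form:", np.round(closed_form, 2))
print("numerical:  ", np.round(res.x, 2))
print("fraction to C_1:", round(closed_form[0] / r, 3))   # tends to 1/n as r grows
```

With these values the non-negativity bound is slack, so the two allocations should agree closely; the addendum below deals with the regime where some classes would otherwise get negative resources.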

EDITED: Addendum on asymptotics

Since we have both $r$ and $n$ going to infinity, we can specify their relationship more precisely. We assume that $n$ is the highest number of classes that are assigned nonnegative resources for a given value of $r$:

$$r_n = \frac{r}{n} + \alpha\left(\frac{1}{n}\sum_{j=1}^n \log j - \log n\right) \;\geq\; 0.$$

Thus,

$$r \;\geq\; \alpha\left(n\log n - \sum_{j=1}^n \log j\right) \;=\; \alpha \log\frac{n^n}{n!} \;\approx\; \alpha n \quad\text{(by Stirling's approximation)},$$

so the highest class index that gets nonnegative resources satisfies $n \lesssim r/\alpha$; that is, $n$ grows linearly with $r$.
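
A quick numerical sketch of this scaling (again only an illustration, using the closed-form allocation above): find the largest $n$ for which $r_n \geq 0$ and compare it with $r/\alpha$.

```python
import numpy as np

def max_nonneg_classes(r, alpha):
    # Largest n such that r_n >= 0, i.e. n*log(n) - log(n!) <= r/alpha
    n, log_fact = 1, 0.0                            # log(1!) = 0; n = 1 always qualifies
    while True:
        m = n + 1
        log_fact_m = log_fact + np.log(m)           # log(m!)
        if m * np.log(m) - log_fact_m > r / alpha:  # with m classes, r_m would be negative
            return n
        n, log_fact = m, log_fact_m

alpha = 1.0
for r in [10, 100, 1000, 10000]:
    n = max_nonneg_classes(r, alpha)
    print(f"r = {r:6}: n = {n:6}   n/(r/alpha) = {n * alpha / r:.3f}")
```

The ratio $n/(r/\alpha)$ approaches 1 from above as $r$ grows, consistent with the Stirling estimate.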

Example 2: arctan sigmoid

Let $u(x) = \arctan(x)$.

The optimal resource allocation is

$$\underset{r_1,\dots,r_n}{\operatorname{argmax}} \;\sum_{i=1}^n i^{-\alpha}\arctan(r_i) \quad\text{subject to}\quad \sum_{i=1}^n r_i = r.$$

Using Lagrange multipliers, we obtain for all $i$,

$$\frac{i^{-\alpha}}{1 + r_i^2} = \lambda, \qquad\text{i.e.}\qquad r_i = \sqrt{\frac{i^{-\alpha}}{\lambda} - 1} \;\approx\; \frac{i^{-\alpha/2}}{\sqrt{\lambda}} \quad\text{for large } r.$$

Then,

$$\frac{r_1}{r} = \frac{r_1}{\sum_{i=1}^n r_i} \;\to\; \frac{1}{\sum_{i=1}^\infty i^{-\alpha/2}} = \frac{1}{\zeta(\alpha/2)} \quad\text{as } r, n \to \infty.$$

Thus, for $\alpha > 2$ the limit of the resource fraction for the central class $C_1$ is finite and positive.
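
A small numerical sketch (parameters chosen only for illustration) solves the first-order condition above by bisection on $\lambda$, with allocations clamped at zero, and shows $r_1/r$ settling near the predicted positive limit (for $n$ classes, $1/\sum_{i=1}^n i^{-\alpha/2}$):

```python
import numpy as np

# Illustrative parameters (not from the post); alpha > 2 so the limiting
# fraction stays positive even as the number of classes grows.
alpha, n = 3.0, 200
idx = np.arange(1, n + 1)
p = idx ** (-alpha)                       # unnormalized power-law weights

def allocation(r):
    """Solve p_i / (1 + r_i^2) = lam (constants absorbed into lam), with r_i
    clamped at zero, choosing lam by bisection so the allocations sum to r."""
    def total(lam):
        return np.sqrt(np.maximum(p / lam - 1.0, 0.0)).sum()
    lo, hi = 1e-15, p[0]                  # total(hi) = 0, total(lo) is huge
    for _ in range(200):
        mid = np.sqrt(lo * hi)            # bisect on a log scale
        if total(mid) > r:
            lo = mid
        else:
            hi = mid
    lam = np.sqrt(lo * hi)
    return np.sqrt(np.maximum(p / lam - 1.0, 0.0))

limit = 1.0 / np.sum(idx ** (-alpha / 2.0))
for r in [10.0, 100.0, 1000.0, 10000.0]:
    r_alloc = allocation(r)
    print(f"r = {r:7.0f}:  r_1/r = {r_alloc[0] / r:.4f}   (predicted limit ~ {limit:.4f})")
```

Unlike the exponential case, the fraction stabilizes at a positive value rather than tending to $1/n$.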

Conclusion

The arctan sigmoid results in a better limiting resource allocation than the exponential sigmoid, because it has heavier tails: for sufficiently large $x$, $\frac{\pi}{2} - \arctan(x)$ decays only like $1/x$, whereas $e^{-x}$ decays exponentially. Thus, it matters which bounding sigmoid function you choose.

Comments

Thus, the resources will be evenly distributed among all the classes as r increases. This is bad, because the resource fraction for the central class C_1 goes to 0 as we increase the number of classes.

This sounds like perfectly sensible behavior to me. You're taking the limit as the amount of available resources goes to infinity, so the fact that the fraction of the resources that gets devoted to C_1 goes to 0 does not mean that there is a shortage of resources being devoted to C_1. In fact, as the available resources goes to infinity, the amount of resources devoted to C_1 also goes to infinity. So what's the problem?

AFAICT the problem is that if the space of chocolate concepts is significantly larger than the amount of resources available then the amount of resources spent on any one concept (inc the true concept) will be infinitesimal.

But that's a situation in which we have a vast number of things that might somewhat-plausibly turn out to be chocolate and severely limited resources. It's not obvious that we can do better.

"But we do OK if we use one sigmoid utility function and not if we use another!"

No, we do different things depending on our utility function. That isn't a problem; it's what utility functions are for. And what's "OK" depends on what the probabilities are, what your resources are, and how much you value different amounts of chocolate. Which, again, is not a problem but exactly how things should be.

But that's a situation in which we have a vast number of things that might somewhat-plausibly turn out to be chocolate and severely limited resources. It's not obvious that we can do better.

It could be a situation where we have very large resources and an exponentially large concept space.

No, we do different things depending on our utility function.

I think the justification for bounded utility functions might not be because they are the true utility functions, but because they avoid certain problems with utility functions that can tend to infinity. I could infinitely value infinite chocolate, and then the AI values all courses of action which have a nonzero probability of infinite chocolate (which is all of them) with infinite expected utility, and cannot make a choice. In this case the question is not to get the AI to follow our utility function, but to follow one similar enough to lead to a positive outcome, while avoiding problems to do with infinity.

Perhaps the problem here is that one is bounding utility but is allowing arbitrary large concept space. If concept space is bounded by setting all probabilities < epsilon equal to zero for the purposes of utility functions, then this problem would not arise, although I suspect that this approach may cause problems of its own.

It could be a situation where we have very large resources and an exponentially large concept space.

If we have enough uncertainty about what bit of concept space we're looking for to make a power-law distribution appropriate, then "very large" can still be "severely limited" (and indeed must be to make the amount of resources going to each kind of maybe-chocolate be small).

true utility functions [...] problems with utility functions that can tend to infinity.

Yes. But I wouldn't characterize this as giving the AI an approximation to our utility function that avoids problems to do with infinity -- because I don't think we have a utility function in a strong enough sense for this to be distinguishable from giving the AI our utility function. We have a vague hazy idea of utility that we can (unreliably, with great effort) be a little bit quantitative about in "small" easy cases; we don't truly either feel or behave according to any utility function; but we want to give the AI a utility function that will make it do things we approve of, even though its decisions may be influenced by looking at things far beyond our cognitive capacity.

It's not clear to me that that's a sensible project at all, but it certainly isn't anything so simple as taking something that Really Is our utility function but misbehaves "at infinity" and patching it to tame the misbehaviour :-).

I don't think we have a utility function in a strong enough sense

All the underlying axioms of expected utility theory (EUT) seem self-evident to me. The fact that most people don't shut up and multiply is something I would regard as more of their problem than a problem with EUT. Having said that, even if mapping emotions onto utility values makes sense from some abstract theoretical point of view, it's a lot harder in practice due to reasons such as the complex fragility of human values, which have been thoroughly discussed already.

Of course, the degree to which the average LWer approximates EUT in their feelings and behaviour is probably far greater than that of the average person. At non-LW philosophy meetups I have been told I am 'disturbingly analytical' for advocating EUT.

It's not clear to me that that's a sensible project at all, but it certainly isn't anything so simple as taking something that Really Is our utility function but misbehaves "at infinity" and patching it to tame the misbehaviour :-).

Well, I suppose there is the option of 'empathic AI'. Reverse engineering the brain and dialling compassion up to 11 is in many ways easier and more brute-force-able than creating de novo AI, and it avoids all these utility-function-definition problems, the Basilisk, and Löb's theorem. The downsides of course include far greater unpredictability, the fact that the AI would definitely be sentient, and (some would argue) the possibility of catastrophic failure during self-modification.

The fact that most people don't shut up and multiply is something I would regard as more of their problem than a problem with EUT.

I didn't say that we shouldn't have a utility function, I said we don't. Our actual preferences are incompletely defined, inconsistent, and generally a mess. I suspect this is true even for most LWers, and I'm pretty much certain it's true for almost all people, and (in so far as it's meaningful) for the human race as a whole.

Certainly given a utility function and a model, the best thing to do is what it is. The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.

Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don't know what kind of utility function is reasonable, and we're showing evidence that some of them give optima that aren't what we'd actually want if we were turning the world into chocolate/hedonium.

If it seems strange to you to consider representing what you want by a bounded utility function, a post about that will be forthcoming.

No, it doesn't seem strange to me to consider representing what I want by a bounded utility function. It seems strange to consider representing what I want by a utility function that converges exponentially fast towards its bound.

I'll repeat something I said in another comment:

You might say it's a suboptimal outcome even though it's a good one, but to make that claim it seems to me you have to do an actual expected-utility calculation. And we know what that expected-utility calculation says: it says that the resource allocation you're objecting to is, in fact, the optimal one.

Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.

I have some sympathy with that last option. A utility function that not only is bounded but converges exponentially fast towards its bound feels pretty counterintuitive. It's not a big surprise, surely, if such a counterintuitive choice of utility function yields wrong-looking resource allocations?

(Remark 1: the above is a comment that remarks that the optimum is the optimum but is visibly not missing the point by failing to appreciate that we might be constructing a utility function and trying to make it do good-looking things, rather than approximating a utility function we already have.)

(Remark 2: I think I can imagine situations in which we might consider making the relationship between chocolate and utility converge very fast -- in fact, taking "chocolate" literally rather than metaphorically might yield such a situation. But in those situations, I also think the results you get from your exponentially-converging utility function aren't obviously unreasonable.)

Cool. Regarding bounded utility functions, I didn't mean you personally, I meant the generic you; as you can see elsewhere in the thread, some people do find it rather strange to think of modelling what you actually want as a bounded utility function.

This is where I thought you were missing the point:

Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.

Sometimes we (seem to) have stronger intuitions about allocations than about the utility function itself, and parlaying that to identify what the utility function should be is what this post is about. This may seem like a non-step to you; in that case you've already got it. Cheers! I admit it's not a difficult point. Or if you always have stronger intuitions about the utility function than about resource allocation, then maybe this is useless to you.

I agree with you that there are some situations where the sublinear allocation (and exponentially-converging utility function) seems wrong and some where it seems fine; perhaps the post should initially have said "person-enjoying-chocolate-tronium" rather than chocolate.

The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.

You still haven't answered my question of why we don't want those properties. To me, they don't seem counter-intuitive at all.

For a sufficiently powerful superintelligence with infinite resources, all things are equally chocolate. Very zen. (As long as you model diminishing returns on investment as an exponential sigmoid.)

Your funky results may also have something to do with the use of the power law in formalizing conceptual uncertainty, since that kind of distribution tends to favor a strategy where you pay attention to exceptional cases. If nothing else, you're giving C_1 a fairly low prior chance of being correct, and a surprisingly serious consideration to the idea that 'real chocolate' is nothing like we think it is, which is an epistemologically confusing position to take. Is there perhaps a theoretical reason why you need to keep things scale-invariant?

Another problem just struck me. Looking at this:

[the post's formula for r_i in Example 1, as it originally appeared]

For n >> r, r_i is negative, which is absurd. Are you sure you haven't made a mistake in the algebra?

Thanks for spotting the algebra mistake - there was indeed a minus sign error (fixed).

r_i is in fact supposed to be non-negative.

The analysis didn't put any non-negativity constraints on the r_i. Without such constraints, there's nothing absurd about making some of them negative in order to make others more positive -- but indeed there is something absurd about making them all negative. Not least the fact that then they can't add up to the right thing.

My own (unreliable!) algebra gives opposite sign to the log(i) term. Its sign has to be opposite to the average-of-logs term so that they can cancel out when you add up the r's.

[EDITED to add: this doesn't change the conclusion that if you fix n and make r very large then you get approximately equal resources devoted to exploring each class. I have no idea why Vika regards that as a problem, though.]

[EDITED again to fix a sign error. Muphry's Law strikes again.]

The absence of non-negativity constraints on r_i is absurd - you can't allocate negative resources to things!

My own (unreliable!) algebra gives opposite sign to the average-of-log-i term.

Meaning that regardless of what values r and n take, the central concept has non-trivial resources dedicated to it. This fixes the problem, except that r_i is still negative for log-i > average-of-log-i +r/n.

if you fix n and make r very large then you get approximately equal resources devoted to exploring each class. I have no idea why Vika regards that as a problem, though.

I agree, this is not a problem.

I agree: not constraining the r_i to be non-negative is absurd. I apologize if I wasn't clear about that.

This fixes the problem, except that [...]

Or, to put it differently, it doesn't fix the problem :-). Roughly speaking avg log = log n - 1 so the point at which we start getting negative r's is somewhere near r=alpha.n.

If we clamp all the r's beyond r_k to zero then the optimum r-vector has the same formula as before but with k instead of n everywhere. The resulting utility is obviously an increasing function of k (because it's an optimum over a space that's an increasing function of k) so the best we can do is to choose the biggest k that makes r_k non-negative; that is, that makes r/k + alpha (avg log - log k) non-negative; that is, that makes k^k/k! <= exp(r/alpha). k^k/k! is fairly close to exp(k) so this is very roughly k <= r/alpha.

Or, to put it differently, it doesn't fix the problem :-).

Exactly.

If we clamp all the r's beyond r_k to zero ...

Intuitively, this does seem to be the right sort of approach - either you bound everything, with a maximum utility and a minimum probability, or you bound nothing.

Intuitively, this does seem to be the right sort of approach

It's provably the right approach.

Let the allocation I described (with whatever choice of k optimizes the result) be R. Suppose it isn't globally optimal, and let R' be strictly better. R' may have infinitely many nonzero r_j, but can in any case be approximated arbitrarily closely by an R'' with only finitely many nonzero r_j; do so, closely enough that R'' is still strictly better than R. Well, having only finitely many nonzero r_j, R'' is no better than one of my candidates and so in particular isn't better than R, contradiction.

It's provably the right approach.

I wasn't doubting your math, I was doubting the underlying assumption of a bounded utility function.

Of course, if we want to get technical, a finite computer can't store an infinite number of models of chocolate anyway.

I was doubting the underlying assumption of a bounded utility function.

I can defend that assumption: It is impossible for an expected utility maximizer to have an unbounded utility function, given only the assumption that the space of lotteries is complete. http://lesswrong.com/lw/gr6/vnm_agents_and_lotteries_involving_an_infinite/

I was doubting the underlying assumption

Oh, I see. OK.

a finite computer can't store an infinite number of models [...]

For sure. Nor, indeed, can our finite brains. (This is one reason why our actual utility functions, in so far as we have them, probably are bounded. Of course that isn't a good reason to use bounded utility functions in theoretical analyses unless all we're hoping to do is to understand the behaviour of a single human brain.)

First of all, the formula for r_i in the decaying-exponential case is wrong.

Thus, the resources will be evenly distributed among all the classes as r increases. This is bad, because the resource fraction for the central class C_1 goes to 0 as we increase the number of classes.

I don't think this makes sense, for much the same reasons as given by skeptical_lurker.

You only get the even-distribution conclusion by (something like) fixing the number of classes as you let the total resources go to infinity. (Otherwise, the terms involving log(i) can make a large contribution.) But in that situation, your utility goes exponentially fast towards its upper bound of 1 and it's hard to see how that can be viewed as a bad outcome.

You might say it's a suboptimal outcome even though it's a good one, but to make that claim it seems to me you have to do an actual expected-utility calculation. And we know what that expected-utility calculation says: it says that the resource allocation you're objecting to is, in fact, the optimal one.

Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.

I have some sympathy with that last option. A utility function that not only is bounded but converges exponentially fast towards its bound feels pretty counterintuitive. It's not a big surprise, surely, if such a counterintuitive choice of utility function yields wrong-looking resource allocations?

If both n and r get large, under what circumstances is it still true that the resource allocation is approximately uniform? I suppose that depends on how you define "approximately uniform", but let's try looking at the ratio of r_1 to r/n. If my scribbling is correct, this equals 1 + alpha log(n!)/r. When n is large this is (very crudely) of order (n/r) log n. So for any reasonable definition of "approximately uniform" this requires that r be growing at least proportionally to n. E.g., for the ratio to be much below log(n) we require r >= alpha n. And the expected utility is, if some more of my scribbling is correct, 1 - exp(-r/n) G/A where G, A are the geometric and arithmetic means of (i^-alpha). So, e.g., the expected utility is at least 1 - exp(-r/n) unconditionally (so at least 1 - exp(-alpha) if r >= alpha n), and if alpha is much bigger than 0 (so as to make it in any way unreasonable for the ratio to be far from uniform) then the expected utility is bigger because G/A is smaller.

Let's take a very concrete example. Take the Zipfian alpha=1, with n=100. So our probabilities for the classes are roughly 19%, 10%, 6%, 5%, ..., 0.2%. If we take r = alpha n = 100 then our resource allocation is 4.64, 3.94, 3.54, ..., 0.03. Perhaps that's "approximately uniform" but it certainly doesn't look shockingly so. The expected utilities conditional on the various classes are 0.99, 0.98, 0.97, ..., 0.032. The overall expected utility is 0.81.
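
For reference, a short script (assuming the corrected allocation formula from the post and the exponential sigmoid u(x) = 1 - exp(-x)) reproduces these numbers:

```python
import numpy as np

# Reproducing the example above (assuming the corrected allocation formula
# r_i = r/n + alpha*(mean_j log j - log i) and u(x) = 1 - exp(-x)).
alpha, n, r = 1.0, 100, 100.0

i = np.arange(1, n + 1)
p = i ** (-alpha)
p /= p.sum()                                  # 19%, 10%, 6%, 5%, ..., 0.2%

logs = np.log(i)
ri = r / n + alpha * (logs.mean() - logs)     # 4.64, 3.94, 3.54, ..., 0.03
cond_eu = 1.0 - np.exp(-ri)                   # 0.99, 0.98, 0.97, ..., 0.032

print("p:      ", np.round(p[:4], 3), "...", round(p[-1], 4))
print("r_i:    ", np.round(ri[:3], 2), "...", round(ri[-1], 2))
print("cond EU:", np.round(cond_eu[:3], 2), "...", round(cond_eu[-1], 3))
print("overall EU:", round(float(np.dot(p, cond_eu)), 2))    # ~ 0.81
```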

That doesn't seem to me like an unreasonable or counterintuitive outcome, all things considered.

Hm. For cases where the utility function really is an exponential sigmoid, though, the resources devoted to the central hypothesis growing sublinearly in total resources actually seems like a good idea.

This is bad, because the resource fraction for the true class goes to 0 as we increase the number of classes.

I think this is an important point you make, but... you seem to be looking at r/n as both n and r go to infinity, which makes my brain hurt. Whether this is a problem seems to depend on how fast r and n go to infinity: if, for instance, r = n^2, then as n goes to infinity, r_i/r goes to 0 and r_i goes to infinity. If an increase in physical resources leads to a greater than linear increase in concept space (due to having more processing power) then this would become problematic.

Also I am confused, because at first C_1 is described as the "central concept of chocolate", the concept most probable to represent actual chocolate, and later C_1 is described as the "true class", as if the mode of a probability distribution has probability 1.

Fixed the sloppy terminology about the central / true concept, thanks for pointing it out!

See Addendum on asymptotics in the post.