This post starts with a very simple and retrospectively obvious observation:
If we want an AI to give us an estimate of expected utility, it needs to be motivated to give us that estimate.
Once we have that in mind, and remember that any extra motivation involves trade-offs, the points of the previous posts on truth-seeking become clearer.
Convexity and AI-chosen outputs
Let u be a utility known to range within R⊆ℝ.
Let f be a twice differentiable function that is convex on R. For simplicity, we'll strengthen convexity by requiring that f′′ be strictly positive on R. Then define:
u#(r)=f(r)+(u−r)f′(r), where r∈R is the output of the AI at some future time t.
For this post, we'll assume that r is not known by anyone but the AI (in future posts we'll look more carefully at allowing r to be known to us).
Differentiating u# with respect to r gives:
f′(r)+(u−r)f′′(r)−f′(r)=f′′(r)(u−r).
The expectation of this is zero iff r=Et(u). Notice that the expected value of u# is twice differentiable at r=Et(u) (even if f′′′ is not defined there!), and its second derivative there is simply −f′′(r), which is negative on R. Moreover, the first derivative f′′(r)(Et(u)−r) is positive for r<Et(u) and negative for r>Et(u), so choosing r=Et(u) will maximise the AI's expected utility on R.
How much utility will the AI get? Since r will be set to the expectation of u at time t, clearly u#(r) will give utility f(Et(u)). At time 0, the AI's expectation of u# is therefore E0(f(Et(u))). If f were affine, this would simplify to f(E0(u)); but f is specifically not affine. Since f is convex, knowing more about the expected value of u can only (in expectation) increase the expectation of u#. The AI values information.
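As a quick numeric sanity check of both claims, here is a minimal Python sketch, using the illustrative choices f(r)=r² and a two-outcome distribution for u (both mine, purely for demonstration):

```python
import numpy as np

# A concrete convex f with f'' > 0 (an illustrative choice, not canonical).
f  = lambda r: r**2
fp = lambda r: 2 * r                     # f'

def u_sharp(u, r):
    """The AI's reward: u#(r) = f(r) + (u - r) f'(r)."""
    return f(r) + (u - r) * fp(r)

# Toy distribution for u: u = 1 w.p. 0.6, u = 0 w.p. 0.4, so E(u) = 0.6.
us, ps = np.array([1.0, 0.0]), np.array([0.6, 0.4])

# Expected u# as a function of the reported r is maximised at r = E(u).
rs = np.linspace(0, 1, 1001)
exp_usharp = [np.sum(ps * u_sharp(us, r)) for r in rs]
print(rs[np.argmax(exp_usharp)])         # 0.6 = E(u)

# Information value: learning the outcome before reporting yields
# E[f(u)] = 0.6 > f(E(u)) = 0.36, as convexity promises.
print(np.sum(ps * f(us)), f(np.sum(ps * us)))
```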
Cost and truth-seeking
Consider first the function f(r)=r² with R=[0,1]. That can be graphed as follows:
Here, we're imagining that E(u) would be 0.6 if the AI were a pure u-maximiser. In that situation, the expectation of u# is at least 0.36. Because of the convexity of f, however, an expectation of u of 0.6 can correspond to an expectation of u# of up to 0.6 as well (the red dotted line). For instance, if u=1 with probability 0.6 and u=0 with probability 0.4, and the AI was going to know which was which before time t, then E0(u#)=E0(f(Et(u)))=0.6.
The green line connects (0.6,0.36) with (0.36,0.36). If the expectation of u ever falls below 0.36, then the expectation of u# must as well. Therefore, whatever the background situation, if E0(u)=0.6 for a u-maximising AI, then E0(u)≥0.36 for a u#-maximising AI (as the AI can guarantee that much u# simply by being u-maximising). We've bounded the cost of the information the AI is giving us. The expectation of u must be in the range [0.36,0.6], at least initially (since a u#-maximising AI cannot maximise u better than a u-maximising one would).
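The chord bound behind this can also be checked directly: since f(r)=r² lies below the line y=r on [0,1], E0(u#)=E0(f(Et(u)))≤E0(u) in any background situation. A minimal sketch:

```python
import numpy as np

f = lambda r: r**2                  # convex and increasing on [0, 1]
rng = np.random.default_rng(0)

# For any distribution of the estimate Et(u) on [0, 1], f(x) <= x gives
# E0(u#) = E0(f(Et(u))) <= E0(Et(u)) = E0(u): the chord bound.
x = rng.uniform(0, 1, 100_000)      # an arbitrary distribution for Et(u)
assert f(x).mean() <= x.mean()

# Mimicking a u-maximiser guarantees E0(u#) >= f(0.6) = 0.36,
# so E0(u) for a u#-maximiser is pinned to [0.36, 0.6].
print(f(0.6))
```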
Let's have a look at another example, for f(r)=exp(−r):
The function f is still convex, but here it is decreasing, and hence the AI desires to minimise u. If the expectation of u under a u-minimising AI were 0.6, we can use the same trick as before: if the AI were instead u#-maximising, then the expectation of u won't be higher than the crossing point between the black line and the green dotted line. This occurs at roughly (0.71,0.55); thus the expectation of u is at most 0.71 if the AI were a u#-maximising agent rather than a u-minimising one. So the expectation of u is in the range [0.6,0.71].
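The quoted crossing point is easy to recompute (a minimal sketch; the 0.6 baseline is the example's assumption):

```python
import numpy as np

f = lambda r: np.exp(-r)        # convex but decreasing: the AI wants u low

# Black line: the chord from (0, f(0)) to (1, f(1)).
# Green dotted line: the level f(0.6) that mimicking a u-minimiser secures.
level = f(0.6)                  # ~0.549

# Solve chord(r) = f(0) + r*(f(1) - f(0)) = level for the crossing point.
r_cross = (level - f(0)) / (f(1) - f(0))
print(r_cross, level)           # ~0.714, ~0.549: the (0.71, 0.55) above
```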
But some choices of convex functions allow no such range restrictions. For instance, if we use the (increasing) function f(r)=−log(1−r), we have the following picture:
There is no black line connecting the ends of the curve here, because f(r) has a singularity at 1. The amount that a u#-maximiser can gain is infinite. The lottery 0.99(u=0)+0.01(u=1) has u-expectation of 0.01, but infinite u#-expectation. The AI is potentially capable of trading almost all the value of u away for its u#.
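To see the divergence numerically, let the report on the u=1 branch approach its limiting value r=1, where f blows up (a minimal sketch):

```python
import numpy as np

f = lambda r: -np.log(1 - r)

# The informed AI earns f(Et(u)) on each branch.  With the lottery
# 0.99 (u=0) + 0.01 (u=1), the u=1 branch contributes 0.01 * f(1) = infinity;
# r1 -> 1 below stands in for that branch's exact estimate of 1.
for r1 in [0.9, 0.999, 0.999999]:
    print(0.99 * f(0.0) + 0.01 * f(r1))    # grows without bound
```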
Another bad example is given by f(r)=0.5−r(0.75−r):
The problem here does not stem from any singularity, but from the fact that f has an interior minimum on R. The green dotted line intersects the black line nowhere on R. Therefore the u#-maximising AI has no restrictions on what values of E(u) can show up.
Note that this f is a translated version of f(r)=r², so it's clear the only reason that f(r)=r² is "good" is because that function is increasing on R, the range of u.
Bounding the cost of information
In the previous examples, I took E(u)=0.6 as a given, but it's much more interesting to bound the loss to u from being a u#-maximiser, without knowing what the expectation of u would be.
Define the cost of information as the maximum expected divergence between the expectation of u given that the agent is a u-maximiser, and the same expectation given that the agent is a u#-maximiser.
There is a second cost we might consider, the cost of inaccuracy. This is the cost to the AI of getting r wrong. In an ideal world, the AI would always get it right (especially as it has to calculate the expectation for the purposes of maximising u anyway), but it might be sloppier if there's little return to getting it correct. Locally, the second derivative f′′(r) gives the cost of inaccuracy.
Note that the cost of information is the cost to u of the AI not being a u-maximiser, while the cost of inaccuracy is the cost to u#(r) of outputting the wrong r. We want the first to be low while the second is high, but they will be in tension. This formalises the observation at the very beginning of this post.
There are three different cases to consider:
If u is bounded above AND below
If u is known to be bounded, then, by affine transformations, we may as well assume that R=[0,1]. Then consider the increasing convex function f(r)=ar²+(1−a)r. The curve of f will pass through (0,0) and (1,1); moreover, the maximum horizontal distance between this curve and the line joining those two end points is a/4. This maximal distance is achieved at r=0.5. Therefore, setting a to be small caps the absolute cost of information at a/4.
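A minimal numeric check of that a/4 gap, with the same a=0.05 as in the figure below:

```python
import numpy as np

a = 0.05
f = lambda r: a * r**2 + (1 - a) * r     # increasing convex on [0, 1]

# f passes through (0,0) and (1,1); its gap below the chord y = r is
# r - f(r) = a*r*(1-r), maximal at r = 0.5 with value a/4.
rs = np.linspace(0, 1, 100_001)
gap = rs - f(rs)
print(rs[np.argmax(gap)], gap.max(), a / 4)   # 0.5, 0.0125, 0.0125
```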
Of course, if a is too small, then the AI has no real interest in getting r accurate - the cost of inaccuracy is too small. This cost is f′′(r)=2a, which is constant. This is why we can't just put a=0.
Example for a=0.05:
If u is bounded above OR below
If u is bounded below, we can translate it to R=[0,∞). Then set the increasing convex function f(r)=r+a/(r+1), 0<a<1. The whole curve is squeezed between the lines y=x+a and y=x. Therefore the maximal absolute cost of information is a. See the following example for a=0.05:
The local cost of inaccuracy is f′′(r)=2a/(r+1)³, which tends to 0 as r→∞. Therefore, as the expectation of u rises, the AI is likely to become sloppier in giving the correct r.
There are no good global bounds, either. As long as the AI is willing to give up the a/(Et(u)+1) term in f, it can set r arbitrarily high while losing little.
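Both phenomena are easy to exhibit; here is a minimal sketch, with Et(u)=10 and the wildly wrong report r=1000 as arbitrary illustrative values:

```python
import numpy as np

a  = 0.05
f  = lambda r: r + a / (r + 1)
fp = lambda r: 1 - a / (r + 1) ** 2

u_sharp = lambda u, r: f(r) + (u - r) * fp(r)

# Squeeze: 0 < f(r) - r <= a, so the absolute cost of information is <= a...
rs = np.linspace(0, 100, 10_001)
print((f(rs) - rs).max())                    # a = 0.05, attained at r = 0

# ...but the cost of inaccuracy vanishes: with Et(u) = 10, even wildly
# over-reporting r = 1000 loses only about a/(10+1) of expected u#.
print(u_sharp(10, 10) - u_sharp(10, 1000))   # ~0.0044
```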
A function like f(r)=r² would control the cost of inaccuracy, but would increase the cost of information. It seems that functions like f(r)=r−a·log(r+1) or f(r)=r−a·√(r+1)·log(r+1) might control both the relative cost of information and the relative cost of inaccuracy - relative meaning as a proportion of Et(u).
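Taking the second candidate to be f(r)=r−a·√(r+1)·log(r+1) (the grouping is my reconstruction), one can verify convexity and watch the cost of information shrink as a proportion of r:

```python
import numpy as np

a = 0.05
f = lambda r: r - a * np.sqrt(r + 1) * np.log(r + 1)

# Differentiating twice by hand: f''(r) = a*log(r+1) / (4*(r+1)**1.5),
# which is >= 0 for r >= 0, so f is convex on [0, inf) (though f''(0) = 0).
f2 = lambda r: a * np.log(r + 1) / (4 * (r + 1) ** 1.5)
rs = np.array([1.0, 10.0, 100.0, 1000.0])
print(f2(rs))                     # nonnegative, tending to 0

# The gap below y = r grows only like sqrt(r)*log(r), so the cost of
# information shrinks relative to r:
print((rs - f(rs)) / rs)          # ~0.049, 0.040, 0.023, 0.011
```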
If u is bounded above, we can translate −u to R=[0,∞) and use the decreasing convex function f(r)=−r+a/(r+1), 0<a<1. Then the argument proceeds as before with y=−x and y=−x+a.
If u is unbounded
If u ranges over the whole of ℝ, we can no longer bound the cost of information. The rough argument is that the limit of the slope of f at +∞ must be strictly greater than the one at −∞. Therefore there exist a>b such that f(x)>ax for large enough x and f(x)>bx for large enough −x.
Then consider the lottery where x>0, and the events u=x and u=−x each have probability 0.5 (and the AI will learn which happens before time t). Then the u-expectation of this lottery is 0, but the u#-expectation is greater than (a−b)x/2 for large enough x. Letting x→+∞ means that the u#-expectation of this lottery is unbounded.
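A concrete instance, with the illustrative choice f(x)=x² on the whole of ℝ:

```python
f = lambda x: x**2                       # convex on all of R

# Lottery: u = +x or u = -x with probability 1/2 each, learned before t.
# Its u-expectation is 0, but the informed AI's expected u# is unbounded:
for x in [1, 10, 100]:
    print(0.5 * f(x) + 0.5 * f(-x))      # 1, 100, 10000

# So for any sure thing u = r (worth f(r)), some such lottery beats it:
print(0.5 * f(100) + 0.5 * f(-100) > f(50))   # True
```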
Thus for all r, there are situations where the AI would prefer a lottery of u-expectation 0, to a sure thing of u=r. Since the zero point of u isn't meaningful either, a u#-maximising agent has no constraint on what values of u it might expect to get.
To get some constraints, we would have to add extra conditions, such as the likelihood of various lotteries. But this involves guessing what is and isn't possible for an AI to achieve.