Back in the day, Eliezer proposed a method for measuring the optimization power (OP) of a system S. The idea is to get a measure of how small a target the system can hit:
You can quantify this, at least in theory, supposing you have (A) the agent or optimization process's preference ordering, and (B) a measure of the space of outcomes - which, for discrete outcomes in a finite space of possibilities, could just consist of counting them - then you can quantify how small a target is being hit, within how large a greater region.
Then we count the total number of states with equal or greater rank in the preference ordering to the outcome achieved, or integrate over the measure of states with equal or greater rank. Dividing this by the total size of the space gives you the relative smallness of the target - did you hit an outcome that was one in a million? One in a trillion?
Actually, most optimization processes produce "surprises" that are exponentially more improbable than this - you'd need to try far more than a trillion random reorderings of the letters in a book, to produce a play of quality equalling or exceeding Shakespeare. So we take the log base two of the reciprocal of the improbability, and that gives us optimization power in bits.
For example, assume there were eight equally likely possible states {X0, X1, ... , X7}, and S gives them utilities {0, 1, ... , 7}. Then if S can make X6 happen, there are two states better or equal to its achievement (X6 and X7), hence it has hit a target filling 1/4 of the total space. Hence its OP is log2 4 = 2. If the best S could manage is X4, then it has only hit half the total space, and has an OP of only log2 2 = 1. Conversely, if S reached the perfect X7, 1/8 of the total space, then it would have an OP of log2 8 = 3.
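The calculation above is just counting the probability mass of states at least as good as the achieved one. A minimal sketch in code (the function and variable names are mine, not from the original post):

```python
from math import log2

def optimization_power(utilities, achieved, probs=None):
    """OP in bits: log2 of 1 / (probability mass of outcomes
    at least as good as the achieved one)."""
    if probs is None:
        # assume equally likely states, as in the example
        probs = [1 / len(utilities)] * len(utilities)
    mass = sum(p for u, p in zip(utilities, probs)
               if u >= utilities[achieved])
    return log2(1 / mass)

utilities = [0, 1, 2, 3, 4, 5, 6, 7]
print(optimization_power(utilities, 6))  # X6 -> 2.0 bits
print(optimization_power(utilities, 4))  # X4 -> 1.0 bit
print(optimization_power(utilities, 7))  # X7 -> 3.0 bits
```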
The system, the whole system, and everything else in the universe
Notice that OP is defined in terms of the state that S achieved (for the moment this will be a pure world, but later we'll allow probabilistically mixed worlds to be S's "achievement"). So it gives us a measure of how powerful S is in practice in our model, not some platonic measure of how good S is in general situations. So an idiot king has more OP than a brilliant peasant; a naive search algorithm distributed across the internet has more OP than a much better program running on Colossus. This does not seem to be a drawback of OP: after all, we want to measure how powerful a system actually is, not how powerful it could be in other circumstances.
Similarly, OP measures the system's ability to achieve its very top goals, not how hard these goals are. A system that wants to compose a brilliant sonnet has more OP than exactly the same system that wants to compose a brilliant sonnet while embodied in the Andromeda galaxy, even though the second is plausibly more dangerous. So OP is a very imperfect measure of how powerful a system is.
We could maybe extend this to some sort of "opposed OP": what is the optimization power of S, given that humans want to stop it from achieving its goals? But even there, a highly powerful system with nearly unachievable goals will still have a very low opposed OP. Maybe the difference between the opposed OP and the standard OP is a better measure of power.
As pointed out by Tim Tyler, OP can also increase if we change the size of the solution space. Imagine an agent that has to print out a non-negative integer N, and whose utility is -N. The agent will obviously print 0, but if the printer is limited to ten digit numbers, its OP is smaller than if the printer is limited to twenty digit numbers: though the solution is just as easy and obvious, the number of ways it "could have been worse" is increased, increasing OP.
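Tim Tyler's printer example can be sketched the same way (the helper function here is hypothetical, just to illustrate the point): printing 0 is trivially optimal, yet doubling the digit limit doubles the OP.

```python
from math import log2

# An agent prints a non-negative integer N and receives utility -N,
# so printing 0 is trivially optimal.  Only one state (N = 0) is at
# least as good as the achieved state, so OP = log2(size of space).
def op_for_digit_limit(digits):
    space_size = 10 ** digits  # numbers with up to `digits` digits
    best_or_equal = 1          # only 0 itself
    return log2(space_size / best_or_equal)

print(op_for_digit_limit(10))  # ~33.2 bits
print(op_for_digit_limit(20))  # ~66.4 bits -- same trivial task, double the OP
```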
Is OP an entropy? Is it defined for mixed states?
In his post Eliezer makes a comparison between OP and entropy. And OP does have some of the properties of entropy: for instance if S is optimizing two separate independent processes (and its own utility treats them as independent), then its OP is the sum of the OP for each process. If for instance S hit an area of 1/4 in the first process (OP 2) and 1/8 in the second (OP 3), then it hits an area of 1/(4*8)=1/32 for the joint processes, for an OP of 5. This property, incidentally, is what allows us to talk about "the" entropy of an isolated system, without worrying about the rest of the universe.
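A quick sketch of this additivity property (assuming, as above, that OP is the log2 of the inverse of the fraction of the space hit):

```python
from math import log2

def op(target_fraction):
    """OP in bits given the fraction of outcome space at least as good."""
    return log2(1 / target_fraction)

# Two independent processes, treated independently by S's utility:
op_first = op(1/4)            # 2 bits
op_second = op(1/8)           # 3 bits
op_joint = op((1/4) * (1/8))  # fractions multiply for independent events

print(op_joint)               # 5.0
print(op_first + op_second)   # 5.0 -- OP is additive, like entropy
```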
But now imagine that our S in the first example can't be sure to hit a pure state, but has a 50% chance of hitting X7 and a 50% chance of hitting X4. If OP were an entropy, then we'd simply do a weighted sum 1/2(OP(X4)+OP(X7)) = 1/2(1+3) = 2, and then add one extra bit of entropy to represent our (binary) uncertainty as to what state we were in, giving a total OP of 3. But this is the same OP as X7 itself! And obviously a 50% chance of X7 and a 50% chance of something inferior cannot be as good as a certainty of the best possible state. So unlike entropy, mere uncertainty cannot increase OP.
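Here is that failed entropy-style calculation spelled out (the variable names are mine; the eight-state setup is the one from the first example):

```python
from math import log2

# Entropy-style bookkeeping for the mixed outcome 1/2 X4 + 1/2 X7,
# with 8 equally likely states of utilities 0..7:
op_x4 = log2(8 / 4)  # 4 states are >= X4 -> 1 bit
op_x7 = log2(8 / 1)  # only X7 itself    -> 3 bits

weighted = 0.5 * (op_x4 + op_x7)  # 2 bits
uncertainty = 1.0                 # one bit of binary uncertainty

print(weighted + uncertainty)     # 3.0 -- same as OP(X7) alone,
                                  # though the mixture is clearly worse
```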
So how should OP extend to mixed states? Can we write a simple distributive law:
OP(1/2 X4 + 1/2 X7) = 1/2(OP(X4) + OP(X7)) = 2?
It turns out we can't. Imagine that, without changing anything else, the utility of X7 is suddenly set to ten trillion, rather than 7. The OP of X7 is still 3 - it's still the best option, still with probability 1/8. And yet 1/2 X4 + 1/2 X7 is now obviously much, much better than X6, which has an OP of 2. But now let's reset X6 to being ten trillion minus 1. Then it still has an OP of 2, and yet is now much, much better than 1/2 X4 + 1/2 X7.
But I may have been unfair in those examples. After all, we're looking at mixed states, and X6 need not have a fixed OP of 2 in the space of mixed states. Maybe if we looked at the simplex formed by all mixed states made up of {X0, X1, ... , X7}, we could get these results to work? Since all Xi are equally likely, we'd simply put a uniform measure on that simplex. But now we run into another problem: the OP of X7 has suddenly shot up to infinity! After all, X7 is now an event of probability zero, better than any other outcome; the log2 of the inverse of its probability is infinity. Even if we just restrict to a tiny, non-zero area around X7, we get arbitrarily high OP - it's not a fluke or a calculation error. Which means that if we followed the distributive law, Q = (1 - 10^-1000) X0 + 10^-1000 X7 must have a much larger OP than X6 - despite the fact that nearly every possible outcome is better than Q.
So it seems that unlike entropy, OP cannot have anything resembling a distributive law. The set of possible outcomes that you started with - including any possible mixed outcomes that S could cause - is what you're going to have to use. This sits uncomfortably with the whole Bayesian philosophy - after all, there, mixed states should represent nothing but uncertainty between pure states. They shouldn't be listed as separate outcomes.
Measures and coarse-graining
In the previous section, we moved from using a finite set of equally likely outcomes, to a measure over a simplex of mixed outcomes. This is the natural generalisation of OP: simply compute the probability measure of the states better than what S achieves, and use the log2 of the inverse of this measure as OP.
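A minimal sketch of this generalisation, assuming for illustration that the achieved outcome's cumulative probability under the unoptimized measure is what we observe:

```python
from math import log2

# Continuous generalisation: OP = log2 of 1 / (measure of the set of
# outcomes at least as good as what S achieves).  E.g. if achieving
# utility at cumulative probability u leaves measure (1 - u) of
# better-or-equal outcomes:
def op_continuous(u):
    return log2(1 / (1 - u))

print(op_continuous(0.75))   # 2.0 bits -- hit the top quarter
print(op_continuous(0.999))  # ~10 bits -- hit the top 0.1%
```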
Some of you may have spotted the massive elephant in the room, whose mass twists space and underlines and undermines the definition of OP. What does this probability measure actually represent? Eliezer saw it in his original post:
The quantity we're measuring tells us how improbable this event is, in the absence of optimization, relative to some prior measure that describes the unoptimized probabilities.
Or how could I write "there were eight equally likely possible states" and "S can make X6 happen"? Well, obviously, what I meant was that if S didn't exist, then it would be equally likely that X7 and X6 and X5 and X4 and...
But wait! These Xi's are final states of the world - so they include the information as to whether S existed in them or not. So what I'm actually saying is that {X0(¬S), X1(¬S), ... , X7(¬S)} (the worlds with no S) are equally likely, whereas Xi(S) (the worlds with S) are impossible for i≠6. But what has allowed me to identify X0(¬S) with X0(S)? I'm claiming they're the same world "apart from S", but what does this mean? After all, S can have huge impacts, and X0(S) is actually an impossible world! So I'm saying "these two worlds are strictly the same, apart from the fact that S exists in one of them - but then again, S would never allow that world to happen if it did exist, so, hum..."
Thus it seems that we need to use some sort of coarse-graining to identify Xi(¬S) with Xi(S), similar to those I speculated on in the reduced impact post.
A concept I've played with, coming off of Eliezer's initial take on the problem of formulating optimization power, is: Suppose something generated N options randomly and then chose the best. Given the observed choice, what is the likelihood function for N?
For continuously distributed utilities, this can be computed directly using beta distributions. Beta(N, 1) is the probability density for the highest of N uniformly distributed unit random numbers. This includes numbers which are cumulative probabilities for a continuous distribution at values drawn from that distribution, and therefore numbers which are cumulative probabilities at the goodness of an observed choice. (N doesn't have to be an integer, because beta distributions are defined for non-integer parameters.)
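A minimal sketch of this, assuming a continuous utility distribution so that the observed choice sits at some cumulative probability q (the function names are mine); the maximum-likelihood N even falls out in closed form:

```python
from math import log

# The max of N uniform random numbers has density Beta(N, 1), so the
# likelihood of "chose the best of N options", given an observed
# choice at cumulative probability q of the unoptimized distribution:
def likelihood(n, q):
    return n * q ** (n - 1)  # Beta(N, 1) density at q

def mle_n(q):
    # maximizing log-likelihood log(N) + (N-1)*log(q) over N
    # gives N = -1 / log(q)
    return -1 / log(q)

print(mle_n(0.9))    # observed choice in the top 10% -> N ~ 9.5
print(mle_n(0.999))  # top 0.1% -> N ~ 1000
```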
(The second step of this construction, where you attach a beta distribution to another distribution's CDF, I had to work out by myself; it's not directly mentioned in any discussions of extreme value statistics that I could find. The Mathworld page on order statistics, one step of generalization away, uses the formula for a beta CDF transformed by another CDF, but it still doesn't refer to beta distributions by name.)
If the utilities are discretely distributed, you have to integrate the beta density over the interval of cumulative probabilities that invert to the observed utility.
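A sketch of that discrete correction (reusing the earlier eight-state example for illustration): integrating the Beta(N, 1) density N·q^(N-1) over [q_lo, q_hi] just gives q_hi^N - q_lo^N.

```python
# Discrete case: the observed utility corresponds to a whole interval
# [q_lo, q_hi] of cumulative probabilities; integrate the Beta(N, 1)
# density over it.
def discrete_likelihood(n, q_lo, q_hi):
    return q_hi ** n - q_lo ** n

# The best of 8 equally likely states occupies the top 1/8
# of cumulative probability:
print(discrete_likelihood(1, 7/8, 1.0))  # 0.125 -- one unoptimized draw
print(discrete_likelihood(5, 7/8, 1.0))  # ~0.487 -- best of five draws
```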
To handle choices of mixtures, I guess you could modify this slightly, and ask about the likelihood function for N given the observed outcome, marginalizing over (as User:Kindly also suggests) the (unobserved) choice of option. This requires a distribution over options and a conditional distribution over observations given options. This would also cover situations with composite options where you only observe one of the aspects of the chosen option.
Opposed optimization might be very crudely modeled by increasing the number on the opposite side of the beta distribution from N. Somewhat related to this is Warren Smith's "Fixed Point for Negamaxing Probability Distributions on Regular Trees", which examines the distributions of position values that result when two opponents take turns choosing the worst option for each other.
Alternatively, instead of a likelihood function for N, you could have a likelihood function for an exponential weighting λ on the expected utility of the option:
Pr(A was chosen)/Pr(B was chosen) ∝ exp(λ(U(A)-U(B))).
Higher values of λ would be hypotheses in which better options were more strongly randomly selected over worse ones. (This is something like a logit-response model, for which λ (or "β", or "1/μ") would be the "rationality" parameter. It might be more familiar as a Gibbs distribution.) But this would fail when the expected utility from the null distribution was heavy-tailed, because then for some λ≠0 the distribution of optimized expected utilities would be impossible to normalize. Better would be for the cumulative probability at the expected utility of the chosen option to be what was exponentially weighted by λ. In that case, in the limit λ = (N-1) >> 1, the two models give the same distribution.
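A sketch of this exponential weighting for a finite option set, where the normalization problem above doesn't arise (function and variable names are mine):

```python
from math import exp

# Logit-response / Gibbs sketch: option i is chosen with probability
# proportional to exp(lambda * U_i).  lambda = 0 is uniformly random
# choice; large lambda approaches a pure utility maximizer.
def choice_probs(utilities, lam):
    weights = [exp(lam * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

utilities = [0, 1, 2, 3, 4, 5, 6, 7]
print(choice_probs(utilities, 0.0))  # uniform: 0.125 each
print(choice_probs(utilities, 5.0))  # nearly all mass on utility 7
```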
All of these statistics, as well as Eliezer's original formulation, end up encoding equivalent information in the limit where the utility of each option is an independent sum of many identical light-tailed-distributed components and you're predicting a marginal distribution of utilities for one of those components. In this limit you can safely convert everything to a statistical mechanics paradigm and back again.
Of course, the real criterion for a good formulation of optimization power is whether it helps people who use it in an argument about things that might be optimizers, or who hear it used in such an argument, to come to truthful conclusions.
In this respect, likelihood functions can have the problem that most people won't want to use them: they're hard to compute with, or communicate, unless they belong to a low-dimensional family. The likelihood functions I suggested won't do that except under very simple conditions. I'm not sure what the best way would be to simplify them to something lower-dimensional. I guess you could just communicate a maximum-likelihood estimate and precision for the optimization power parameter. Or, if you chose a reference prior over optimization power, you could communicate its posterior mean and variance.
All of this presupposes that the problems with the unoptimized probability measure can be dealt with. Maybe it would work better to describe the optimization power of a system in terms of a series of levels of simpler systems leading up to that system, where each level's new amount of optimization was only characterized approximately, and only relative to something like a distribution of outputs from the previous level. (This would at least patch the problem where, if thermodynamics is somehow involved in the results of an action, that action can count as very powerful relative to the uniform measure over the system's microstates.) If optimizer B is sort of like choosing the best of N options generated by optimizer A, and optimizer C is sort of like choosing the best of M options generated by optimizer B, that might not have to mean that optimizer C is much like choosing the best of N*M options generated by optimizer A.