Can you make sense of Shane Legg's objection, then?
I would say that the simple algorithm he describes has immense optimisation power. If there were a competitive situation, with other competent agents trying to derail its goal, then its optimisation power would drop close to zero. If your objection is that it's wrong to define a single "optimisation power" floating platonically above the agent, then I agree.
I think your objection shows that you failed to read (or appreciate) this bit:
You can quantify this, at least in theory, supposing you have (A) the agent or optimization process's preference ordering, and (B) a measure of the space of outcomes - which, for discrete outcomes in a finite space of possibilities, could just consist of counting them - then you can quantify how small a target is being hit, within how large a greater region.
His very next paragraph is:
Then we count the total number of states with equal or greater rank in the preference ordering to the outcome achieved, or integrate over the measure of states with equal or greater rank. Dividing this by the total size of the space gives you the relative smallness of the target - did you hit an outcome that was one in a million? One in a trillion?
"outcome achieved". Hence the optimisation is measuring how effective the agent is at implementing its agenda. An agent that didn't have the ressources to think well or fast enough would score low, because it wouldn't implement anything.
As every school child knows, an advanced AI can be seen as an optimisation process - something that hits a very narrow target in the space of possibilities. The Less Wrong wiki entry proposes some measure of optimisation power:
This doesn't seem like a fully rigorous definition - what exactly is meant by a million random tries? It also measures how hard it would be to come up with that solution, not how good that solution is. An AI that comes up with a solution that is ten thousand bits harder to find, but only a tiny bit better than the human solution, is not one to fear.
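To make the worry concrete, here is a toy comparison (the numbers are invented for illustration): a rarity-style measure can put an enormous gap in bits between two solutions whose actual quality is almost identical.

```python
from math import log2

# Toy numbers: 'search_bits' is a rarity-style score (log2 of how many random
# tries you'd expect before finding something this good); 'utility' is how good
# the solution actually is. Both are illustrative, not from the post.
human_solution = {"search_bits": log2(1_000_000), "utility": 100.0}   # ~20 bits
ai_solution    = {"search_bits": 10_020.0,        "utility": 100.1}   # vastly 'rarer'

print(ai_solution["search_bits"] - human_solution["search_bits"])  # ~10,000 bits apart
print(ai_solution["utility"] - human_solution["utility"])          # but only 0.1 better
```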
Other potential measurements could be obtained by taking any of the metrics I suggested in the reduced impact post and using them in reverse: to measure large deviations from the status quo rather than small ones.
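As one stand-in for such a reversed metric (my own illustration, not a metric taken from the reduced impact post): compare the probability distribution over some coarse set of world-states with the agent acting against the counterfactual distribution without it, and treat a large divergence as a large departure from the status quo.

```python
from math import log2

def impact_score(p_with_agent, p_without_agent):
    # KL divergence (in bits) between the outcome distribution with the agent
    # acting and the counterfactual distribution without it. Small values mean
    # the status quo is roughly preserved; large values mean the agent has
    # pushed the world far from where it would otherwise have been.
    return sum(p * log2(p / q)
               for p, q in zip(p_with_agent, p_without_agent)
               if p > 0)

# Toy example over three coarse world-states:
status_quo = [0.70, 0.25, 0.05]
with_agent = [0.05, 0.15, 0.80]   # agent steers hard towards the third state
print(impact_score(with_agent, status_quo))   # ~2.9 bits: a big deviation
```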
Anyway, before I reinvent the coloured wheel, I just wanted to check whether there was a fully defined, agreed-upon measure of optimisation power.