How to measure optimisation power

Stuart_Armstrong

As every school child knows, an advanced AI can be seen as an optimisation process - something that hits a very narrow target in the space of possibilities. The Less Wrong wiki entry proposes a measure of optimisation power:

One way to think mathematically about optimization, like evidence, is in information-theoretic bits. We take the base-two logarithm of the reciprocal of the probability of the result. A one-in-a-million solution (a solution so good relative to your preference ordering that it would take a million random tries to find something that good or better) can be said to have log₂(1,000,000) = 19.9 bits of optimization.
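The arithmetic in that quote is just log₂(1/p), where p is the probability that a random try does at least as well. Here is a minimal Python sketch. The function name `optimisation_bits` is my own, not anything from the wiki:

```python
import math

def optimisation_bits(p: float) -> float:
    """Bits of optimisation for an outcome whose 'equally good or better' region has probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Base-two logarithm of the reciprocal of the probability.
    return math.log2(1.0 / p)

print(round(optimisation_bits(1 / 1_000_000), 1))  # 19.9, the one-in-a-million case
print(optimisation_bits(1.0))                      # 0.0, any outcome at all is this good
```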

This doesn't seem a fully rigorous definition - what exactly is meant by a million random tries? Also, it measures how hard it would be to come up with that solution, but not how good that solution is. An AI that comes up with a solution that is ten thousand bits more complicated to find, but that is only a tiny bit better than the human solution, is not one to fear.
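A toy illustration of the "hard to find but barely better" problem, using 20 bits instead of ten thousand so it can actually be computed. The state space size, the saturating utility function and the choice of which states count as the "human" and "AI" outcomes are all hypothetical assumptions made for the example:

```python
import math

N = 1_000_000  # hypothetical number of possible outcomes

def utility(i: int) -> float:
    # Hypothetical utility: strictly increasing in i, but saturating towards 1.
    return i / (i + 1)

def bits(i: int) -> float:
    # Utility is strictly increasing, so exactly N - i states are at least as good as state i.
    return math.log2(N / (N - i))

human, ai = 999, N - 1  # hypothetical human-found and AI-found outcomes
print(f"human: {bits(human):.4f} bits, utility {utility(human):.6f}")
print(f"AI:    {bits(ai):.4f} bits, utility {utility(ai):.6f}")
```

The AI's outcome is almost 20 bits harder to find, yet its utility is less than 0.001 higher: the bit count tracks rarity, not how much better the outcome is.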

Another option would be to take any of the metrics I suggested in the reduced impact post and use them in reverse: to measure large deviations from the status quo, not small ones.

Anyway, before I reinvent the coloured wheel, I just wanted to check whether there was a fully defined, agreed-upon measure of optimisation power.


Then we count the total number of states with equal or greater rank in the preference ordering to the outcome achieved, or integrate over the measure of states with equal or greater rank. Dividing this by the total size of the space gives you the relative smallness of the target - did you hit an outcome that was one in a million? One in a trillion?
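The counting procedure can be sketched as follows, assuming a finite state space where every state has equal weight. So this covers only the "count" branch, not the "integrate over the measure" branch. A utility function stands in for the preference ordering, since only the ranking it induces is used. The 16-bit example space is hypothetical:

```python
import math
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")

def optimisation_power(states: Sequence[State],
                       utility: Callable[[State], float],
                       achieved: State) -> float:
    """Bits of optimisation for `achieved` over a finite, uniformly weighted state space."""
    target = utility(achieved)
    # Count every state ranked equal to or above the outcome achieved.
    at_least_as_good = sum(1 for s in states if utility(s) >= target)
    # Relative smallness of the target is at_least_as_good / len(states); convert it to bits.
    return math.log2(len(states) / at_least_as_good)

# Hypothetical example: states are 16-bit strings, preferred by their number of 1 bits.
states = range(2 ** 16)

def ones(s: int) -> int:
    return bin(s).count("1")

print(optimisation_power(states, ones, 0xFFFF))  # 16.0: only one state is that good
print(optimisation_power(states, ones, 0))       # 0.0: every state is at least as good
```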

The key phrase is "outcome achieved". Hence the measure captures how effective the agent is at implementing its agenda. An agent that didn't have the resources to think well or fast enough would score low, because it wouldn't implement anything.

The article talks about "preference ordering", but there's no mention of how long it takes to evaluate the preference ordering. Resource constraints are missing from the whole article: it measures optimisation power without any consideration of resource limits. Exhaustive search wins that contest, with the highest possible score.

Even if you factor in resource constraints (for example by imposing time and resource limits), this is still a per-problem metric, while the term "optimisation power" suggests a more general capability.

"outcome achieved", "did you hit an outcome", "optimization processes produce surprises", "relative improbability of 'equally good or better' outcomes" - he's talking about the outcome produced (and then using the preference orderings to measure optimisation power given that that outcome was produced).

The time taken is not modelled explicitly, but it is modelled indirectly: exhaustive search only wins if the agent really has all the time in the world to implement its plans. An AI due to get smashed in a year if it doesn't produce anything will have an optimisation of zero if it uses exhaustive search.
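A toy model of that deadline argument, with several assumptions. The search budget counts how many states an agent can examine before the deadline. An agent that produces nothing leaves the status quo, and I model that as the worst state, which scores zero bits. The exhaustive searcher only commits once it has examined everything. The random-sampling agent is a hypothetical comparison of my own, not something from the discussion above:

```python
import math
import random
from typing import Optional, Sequence

def bits_for_outcome(utilities: Sequence[float], achieved: float) -> float:
    """Bits of optimisation: log2(number of states / states at least as good as `achieved`)."""
    at_least_as_good = sum(1 for u in utilities if u >= achieved)
    return math.log2(len(utilities) / at_least_as_good)

def exhaustive_search(utilities: Sequence[float], budget: int) -> Optional[float]:
    """Commits only after examining every state; returns None if the budget runs out first."""
    if budget < len(utilities):
        return None
    return max(utilities)

def sampling_search(utilities: Sequence[float], budget: int, rng: random.Random) -> float:
    """Examines `budget` randomly chosen states and returns the best one it saw."""
    return max(rng.choice(utilities) for _ in range(budget))

def score(utilities: Sequence[float], produced: Optional[float]) -> float:
    # Assumption: producing nothing before the deadline leaves the status quo,
    # modelled here as the worst state, which scores 0 bits.
    achieved = min(utilities) if produced is None else produced
    return bits_for_outcome(utilities, achieved)

utilities = list(range(1024))  # hypothetical space of 1024 states, ranked by utility
rng = random.Random(0)

print(score(utilities, exhaustive_search(utilities, budget=100)))   # 0.0: deadline missed
print(score(utilities, sampling_search(utilities, budget=100, rng=rng)))
print(score(utilities, exhaustive_search(utilities, budget=1024)))  # 10.0: the maximum possible
```

With a tight budget the exhaustive searcher scores zero while the sampler scores several bits. Exhaustive search only reaches the top score when the budget covers the whole space.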
