This is excellent, it feels way better as a definition of optimization than past attempts :) Thanks in particular for the academic style, specifically relating it to previous work, it made it much more accessible for me.
Let's try to build up some core AI alignment arguments with this definition.
Task: A task is simply an “environment” along with a target configuration set. Whenever I talk about a “task” below, assume that I mean an “interesting” task, i.e. something like “build a chair”, as opposed to “have the air molecules be in one of these particular configurations”.
Solving a task: An object O solves a task T if adding O to T’s environment transforms it into an optimizing system for the T’s target configuration set.
Performance on the task: If O solves task T, its performance is quantified by how quickly it reaches the target configuration set, and how robust it is to perturbations.
Generality of intelligence: The generality of O’s intelligence is a function of the number and diversity of tasks T that it can solve, as well as its performance on those tasks.
Optimizing AI: A computer prog...
I shared this essay with a colleague where I work (Johns Hopkins University Applied Physics Lab). Here are her comments, which she asked me to share:
This essay proposes a very interesting definition of optimization as the manifestation of a particular behavior of a closed, physical system. I haven’t finished thinking this over, but I suspect it will be (as is suggested in the essay) a useful construct. The reasoning leading to the definition is clearly laid out (thank you!), with examples that are very useful in understanding the concept. The downside of being clearly laid out, however, is that it makes critique easier. I have a few thoughts about the reasoning in the essay.
The first thing I will note is that the essay gives three definitions for an optimizing system. These definitions are close, but not exactly equivalent. The nuances can be important. For example, that the target configuration set and the basin of attraction cannot be equal is obvious; that is made explicit in definition 3, but only implied in definitions 1 and 2. A bigger issue is that there are no criteria or rationale for their extent and relative size.
For example, the essay offers two reasons why the ...
In this post, the author proposes a semiformal definition of the concept of "optimization". This is potentially valuable since "optimization" is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.
The key paragraph, which summarizes the definition itself, is the following:
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
In fact, "continues to exhibit this tendency with respect to the same target configuration set despite perturbations" is redundant: clearly as long as the perturbation doesn't push the system out of the basin, the tendency must continue.
This is what is known as "attractor" in dynamical systems the...
My biggest objection to this definition is that it inherently requires time. At a bare minimum, there needs to be an "initial state" and a "final state" within the same state space, so we can talk about the system going from outside the target set to inside the target set.
One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization. For instance, I could write a convex function optimizer which works by symbolically differentiating the objective function and then algebraically solving for a point at which the gradient is zero.
Is there an argument that I should not consider this to be an optimizer?
My biggest objection to this definition is that it inherently requires time
Fascinating - but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?
One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization.
Yes this is a fascinating case! I'd like to write a whole post about it. Here are my thoughts:
This seems great, I'll read and comment more thoroughly later. Two quick comments:
It didn't seem like you defined what it meant to evolve towards the target configuration set. So it seems like either you need to commit to the system actually reaching one of the target configurations to call it an optimiser, or you need some sort of metric over the configuration space to tell whether it's getting closer to or further away from the target configuration set. But if you're ranking all configurations anyway, then I'm not sure it adds anything to draw a binary distinction between target configurations and all the others. In other words, can't you keep the definition in terms of a utility function, but just add perturbations?
Also, you don't cite Dennett here, but his definition has some important similarities. In particular, he defines several different types of perturbation (such as random perturbations, adversarial perturbations, etc) and says that a system is more agentic when it can withstand more types of perturbations. Can't remember exactly where this is from - perhaps The Intentional Stance?
Curated. Come on dude, stop writing so many awesome posts so quickly, it's too much.
This is a central question in the science of agency and optimization. The proposal is simple, you connected it to other ideas from Drexler and Demski+Garrabrant, and you gave a ton of examples of how to apply the idea. I generally get scared by the academic style, worried that the authors will fill out the text and make it really hard to read, but this was all highly readable, and set its own context (re-explaining the basic ideas at the start). I'm looking forward to you discussing it in the comments with Ricraz, Rohin and John.
Please keep writing these posts!
Thanks for the post, this is my favourite formalisation of optimisation so far!
One concern I haven't seen raised so far, is that the definition seems very sensitive to the choice of configuration space. As an extreme example, for any given system, I can always augment the configuration space with an arbitrary number of dummy dimensions, and choose the dynamics such that these dummy dimensions always get set to all zero after each time step. Now, I can make the basin of attraction arbitrarily large, while the target configuration set remains a fixed size. This can then make any such dynamical system seem to be an arbitrarily powerful optimiser.
This could perhaps be solved by demanding the configuration space be selected according to Occam's razor, but I think the outcome still ends up being prior dependent. It'd be nice for two observers who model optimising systems in a systematically different way to always agree within some constant factor, akin to Kolmogorov complexity's invariance theorem, although this may well be impossible.
As a less facetious example, consider a computer program that repeatedly sets a variable to 0. It seems again we can make the optimisi...
Two examples which I'd be interested in your comments on:
1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy? (Credit for the example goes to Ramana Kumar).
2. Imagine that I have the goal of travelling as fast as possible. However, there is no set of states which you can point to as the "target states", since whatever state I'm in, I'll try to go even faster. This is another argument for, as I argue below, defining an optimising system in terms of increasing some utility function (rather than moving towards target states).
This seems like a good definition of optimization for algorithmic systems, but I don't see how it works for physical systems. Going by the primary definition,
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction.
But in the physical world, there are literally zero closed systems with this property. Entropy always increases*, and the target configuration set will never be smaller than the basin of attraction. The dirt-plus-seed-plus-sunlight system has a vastly smaller configuration space than the dirt-plus-tree-plus-heat system. Perhaps one could object that one should discount the incoming sunlight and outgoing heat since the system isn't really closed, but then consider a very similar system consisting of only dirt, air, and fungal spores. Surely if a growing tree is an optimizing system, then a growing mushroom in a closed system is an optimizer too. But the entropy increase in the latter case is unambiguous: the number of ways to arrange atoms into a fully grown m...
I think this is great.
I would want to relate it to a few key points out which I tried to address in a few earlier posts. Principally, I discussed selection versus control, which is about the difference between what optimization does externally, and how it uses models and testing. This related strongly to your conception of an optimizing system, but focused on how much of the optimization process occurs in the system versus in the agent itself. This is principally important because of how it relates to misalignment and Goodharting of various types.
I had hopes to further apply that conceptual model to meas-optimization, but I was a bit unsure how to think about it, and have been working on other projects. At this point, I think your discussion is probably a better conceptual model than the one I was trying to build there - it just needs to be slightly extended to cover the points I was trying to work out in those posts. I'd like to think about how it relates to mesa-optimization as well, but I'm unlikely to actually work on that
This is excellent! Very well done, I would love to see more work like this.
I have a whole bunch of things to say along separate directions so I'll break them into separate comments. This first one is just a couple minor notes:
Planned summary for the Alignment Newsletter:
Many arguments about AI risk depend on the notion of “optimizing”, but so far it has eluded a good definition. One natural approach is to say that an optimizer causes the world to have higher values according to some reasonable utility function, but this seems insufficient, as then a <@bottle cap would be an optimizer@>(@Bottle Caps Aren't Optimisers@) for keeping water in the bottle.
This post provides a new definition of optimization, by taking a page from <@Embedded Agents@> and...
This post reminds me of thinking from 1950s when people taking inspiration from Wiener's work on cybernetics tried to operationalize "purposeful behavior" in terms of robust convergence to a goal state:
> When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.
I appreciate the direct attention to this process a...
The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
...
A tree is an optimizing system but not a goal-directed agent system.
I'm not sure this is true, at least not in the sense that we usually think about "goal-directed agent systems".
You make a case that there's no distinct subsystem of the tree which is "doing the optimizing", but this isn't obviously relevant to whether the tree is agenty. For instance, the tree presumably still needs to model its environment to some extent, a
...At first I particularly liked the idea of identifying systems with "an optimizer" as those which are robust to changes in the object of optimization, but brittle with respect to changes in the engine of optimization.
On reflection, it seems like a useful heuristic but not a reliable definition. A counterexample: suppose we do manage to build a robust AI which maximizes some utility function. One desirable property of such an AI is that it's robust to e.g. one of its servers going down or corrupted data on a hard drive; the AI itself should be robust to as m
...An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
First, I want to say that I think your definition says something important.
That said, I'm concerned that the above definition would have some potentially problematic false negatives. I...
Truly a joy to read! Thank you.
To what extent can we identify subsets of the system corresponding to "that which is being optimized" and "that which is doing the optimization"?
The information theoretic measure of individuality attempts to answer exactly this type of question.
From this view, a set of components (the system) is decomposed into two subsets (subsystem + environment). The proposed subsystem is assigned a degree of individuality by measuring the amount of information it shares with its future state, optionally conditioned o...
systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.
When determining whether a system "optimizes" in practice, the heavy lifting is done by the degree to which the set of states that the system evolves toward -- the suspected "target set" -- feels like it forms a natural class to the observer.
The issue here is that what the observer considers "a natural class" is informed by the data-distribution that the observer has previously been exposed to.
Is a metal bar an optimizer? Looking at the temperature distribution, there is a clear set of target states (states of uniform temperature) with a much larger basin of attraction (all temperature distributions that don't vaporize the bar).
I suppose we could consider the second law of thermodynamics to be the true optimizer in this case. The consequence is that any* closed physical system is trivially an optimizing system towards higher entropy.
In general, it seems like this optimization criterion is very easy to satisfy if we don't specify what...
But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.
FYI, it seems pretty clear to me that a liver should be considered an optimiser: as an organ in the human body, it performs various tasks mostly reliably, achieves homeostasis, etc. The question I was rhetorically asking was whether it is an optimiser of one's income, and the answer (I claim) is 'no'.
the exact same answer it would have output without the perturbation.
It always gives the same answer for the last digit?
An interesting point about the agency-as-retargetable-optimisation idea is that it seems like you can make the perturbation in various places upstream of the agent's decision-making, but not downstream, i.e. you can retarget an agent by perturbing its sensors more easily than its actuators.
For example, to change a thermostat-controlled heating system to optimise for a higher temperature, the most natural perturbation might be to turn the temperature dial up, but you could also tamper with its thermistor so that it reports lower temperatures. On the other h...
You said your definition would not classify a bottle cap with water in it as an optimizer. This might be really nit-picky, but I'm not sure it's generally true.
I say this because the water in the bottle cap could evaporate. Thus, supposing there is no rain, from a wide range of possible states of the bottle cap, it would tend towards no longer having water in it.
I know you said you make an exception for tendencies towards increased entropy being considered optimizers. However, this does not increase the entropy of the bottlecap, It could potentially increa...
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
If I'm reasoning correctly, I think this definition could classify just about anything as an optimizer.
Consider inanimate biological substances, like a leaf. From a wide range of initial ...
Great post.
I'm not keen on the requirement that the basin of attraction be strictly larger than the target configuration set. I don't think this buys you much, and seems to needlessly rule out goals based on narrow maintenance of some status-quo. Switching to a utility function as suggested by others improves things, I think.
For example: a highly capable AI whose only goal is to maintain a chess set in a particular position for as long as possible, but not to care about it after it's disturbed.
Here the target set is identical to the basin of...
This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching of the whole community.
What is optimization? What is the relationship between a computational optimization process — say, a computer program solving an optimization problem — and a physical optimization process — say, a team of humans building a house?
We propose the concept of an optimizing system as a physically closed system containing both that which is being optimized and that which is doing the optimizing, and defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system. We compare our definition to that proposed by Yudkowsky, and place our work in the context of work by Demski and Garrabrant’s Embedded Agency, and Drexler’s Comprehensive AI Services. We show that our definition resolves difficult cases proposed by Daniel Filan. We work through numerous examples of biological, computational, and simple physical systems showing how our definition relates to each.
Introduction
In the field of computer science, an optimization algorithm is a computer program that outputs the solution, or an approximation thereof, to an optimization problem. An optimization problem consists of an objective function to be maximized or minimized, and a feasible region within which to search for a solution. For example we might take the objective function (x2−2)2 as a minimization problem and the whole real number line as the feasible region. The solution then would be x=√2 and a working optimization algorithm for this problem is one that outputs a close approximation to this value.
In the field of operations research and engineering more broadly, optimization involves improving some process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements. For example, we might choose to measure a nail factory by the rate at which it outputs nails, relative to the cost of production inputs. We can view this as a kind of objective function, with the factory as the object of optimization just as the variable x was the object of optimization in the previous example.
There is clearly a connection between optimizing the factory and optimizing for x, but what exactly is this connection? What is it that identifies an algorithm as an optimization algorithm? What is it that identifies a process as an optimization process?
The answer proposed in this essay is: an optimizing system is a physical process in which the configuration of some part of the universe moves predictably towards a small set of target configurations from any point in a broad basin of optimization, despite perturbations during the optimization process.
We do not imagine that there is some engine or agent or mind performing optimization, separately from that which is being optimized. We consider the whole system jointly — engine and object of optimization — and ask whether it exhibits a tendency to evolve towards a predictable target configuration. If so, then we call it an optimizing system. If the basin of attraction is deep and wide then we say that this is a robust optimizing system.
An optimizing system as defined in this essay is known in dynamical systems theory as a dynamical system with one or more attractors. In this essay we show how this framework can help to understand optimization as manifested in physically closed systems containing both engine and object of optimization.
In this way we find that optimizing systems are not something that are designed but are discovered. The configuration space of the world contains countless pockets shaped like small and large basins, such that if the world should crest the rim of one of these pockets then it will naturally evolve towards the bottom of the basin. We care about them because we can use our own agency to tip the world into such a basin and then let go, knowing that from here on things will evolve towards the target region.
All optimization basins have a finite extent. A ball may roll to the center of a valley if initially placed anywhere within the valley, but if it is placed outside the valley then it will roll somewhere else entirely, or perhaps will not roll at all. Similarly, even a very robust optimizing system has an outer rim to its basin of attraction, such that if the configuration of the system is perturbed beyond that rim then the system no longer evolves towards the target that it once did. When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.
Example: computing the square root of two
Say I ask my computer to compute the square root of two, for example by opening a python interpreter and typing:
The value printed here is actually calculated by solving an optimization problem. It works roughly as follows. First we set up an objective function that has as its minimum value the square root of two. One function we could use is y=(x2−2)2
Next we pick an initial estimate for the square root of two, which can be any number whatsoever. Let’s take 1.0 as our initial guess. Then we take a gradient step in the direction indicated by computing the slope of the objective function at our initial estimate:
Then we repeat this process of computing the slope and updating our estimate over and over, and our optimization algorithm quickly converges to the square root of two:
This is gradient descent, and it can be implemented in a few lines of python code:
But this program has the following unusual property: we can modify the variable that holds the current estimate of the square root of two at any point while the program is running, and the algorithm will still converge to the square root of two. That is, while the code above is running, if I drop in with a debugger and overwrite the current estimate while the loop is still executing, what will happen is that the next gradient step will start correcting for this perturbation, pushing the estimate back towards the square root of two:
If we give the algorithm time to converge to within machine precision of the actual square root of two then the final output will be bit-for-bit identical to the result we would have gotten without the perturbation.
Consider this for a moment. For most kinds of computer code, overwriting a variable while the code is running will either have no effect because the variable isn’t used, or it will have a catastrophic effect and the code will crash, or it will simply cause the code to output the wrong answer. If I use a debugger to drop in on a webserver servicing an http request and I overwrite some variable with an arbitrary value just as the code is performing a loop in which this variable is used in a central way, bad things are likely to happen! Most computer code is not robust to arbitrary in-flight data modifications.
But this code that computes the square root of two is robust to in-flight data modifications, or at least the "current estimate" variable is. It’s not that our perturbation has no effect: if we change the value, the next iteration of the algorithm will compute the objective function and its slope at a completely different point, and each iteration after that will be different to how it would have been if we hadn’t intervened. The perturbation may change the total number of iterations before convergence is reached. But ultimately the algorithm will still output an estimate of the square root of two, and, given time to fully converge, it will output the exact same answer it would have output without the perturbation. This is an unusual breed of computer program indeed!
What is happening here is that we have constructed a physical system consisting of a computer and a python program that computes the square root of two, such that:
for a set of starting configurations (in this case the set of configurations in which the "current estimate" variable is set to each representable floating point number),
the system exhibits a tendency to evolve towards a small set of target configurations (in this case just the single configuration in which the "current estimate" variable is set to the square root of two),
and this tendency is robust to in-flight perturbations to the system’s configuration (in this case robustness is limited to just the dimensions corresponding to changes in the "current estimate" variable).
In this essay I argue that systems that converge to some target configuration, and will do so despite perturbations to the system, are the systems we should rightly call "optimizing systems".
Example: building a house
Consider a group of humans building a house. Let us consider the humans together with the building materials and construction site as a single physical system. Let us imagine that we assemble this system inside a completely closed chamber, including food and sleeping quarters for the humans, lighting, a power source, construction materials, construction blueprint, as well as the physical humans with appropriate instructions and incentives to build the house. If we just put these physical elements together we get a system that has a tendency to evolve under the natural laws of physics towards a configuration in which there is a house matching the blueprint.
We could perturb the system while the house is being built — say by dropping in at night and removing some walls or moving some construction materials about — and this physical system will recover. The team of humans will come in the next day and find the construction materials that were moved, put in new walls to replace the ones that were removed, and so on.
Just like the square root of two example, here is a physical system with:
A basin of attraction (all the possible arrangements of viable humans and building materials)
A target configuration set that is small relative to the basin of attraction (those in which the building materials have been arranged into a house matching the design)
A tendency to evolve towards the target configurations when starting from any point within the basin of attraction, despite in-flight perturbations to the system
Now this system is not infinitely robust. If we really scramble the arrangement of atoms within this system then we’ll quickly wind up with a configuration that does not contain any humans, or in which the building materials are irrevocably destroyed, and then we will have a system without the tendency to evolve towards any small set of final configurations.
In the physical world we are not surprised to find systems that have this tendency to evolve towards a small set of target configurations. If I pick up my dog while he is sleeping and move him by a few inches, he still finds his way to his water bowl when he wakes up. If I pull a piece of bark off a tree, the tree continues to grow in the same upward direction. If I make a noise that surprises a friend working on some math homework, the math homework still gets done. Systems that contain living beings regularly exhibit this tendency to evolve towards target configurations, and tend to do so in a way that is robust to in-flight perturbations. As a result we are familiar with physical systems that have this property, and we are not surprised when they arise in our lives.
But physical systems in general do not have the tendency to evolve towards target configurations. If I move a billiard ball a few inches to the left while a bunch of billiard balls are energetically bouncing around a billiard table, the balls are likely to come to rest in a very different position than if I had not moved the ball. If I change the trajectory of a satellite a little bit, the satellite does not have any tendency to move back into its old orbit.
The computer systems that we have built are still, by and large, more primitive than the living systems that we inhabit, and most computer systems do not have the tendency to evolve robustly towards some set of target configurations, so optimization algorithms as discussed in the previous section, which do have this property, are somewhat unusual.
Defining optimization
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
Some systems may have a single target configuration towards which they inevitably evolve. Examples are a ball in a steep valley with a single local minimum, and a computer computing the square root of two. Other systems may have a set of target configurations and perturbing the system may cause it to evolve towards a different member of this set. Examples are a ball in a valley with multiple local minima, or a tree growing upwards (perturbing the tree by, for example, cutting off some branches while it is growing will probably change its final shape, but will not change its tendency to grow towards one of the configurations in which it has reached its maximum size).
We can quantify optimizing systems in the following ways.
Robustness. Along how many dimensions can we perturb the system without altering its tendency to evolve towards the target configuration set? What magnitude perturbation can the system absorb along these dimensions? A self-driving car navigating through a city may be robust to perturbations that involve physically moving the car to a different position on the road in the city, but not to perturbations that involve changing the state of physical memory registers that contain critical bits of computer code in the car’s internal computer.
Duality. To what extent can we identify subsets of the system corresponding to "that which is being optimized" and "that which is doing the optimization"? Between engine and object of optimization; between agent and world. Highly dualistic systems may be robust to perturbations of the object of optimization, but brittle with respect to perturbations of the engine of optimization. For example, a system containing a 2020s-era robot moving a vase around is a dualistic optimizing system: there is a clear subset of the system that is the engine of optimization (the robot), and object of optimization (the vase). Furthermore, the robot may be able to deal with a wide variety of perturbations to the environment and to the vase, but there are likely to be numerous small perturbations to the robot itself that will render it inert. In contrast, a tree is a non-dualistic optimizing system: the tree does grow towards a set of target configurations, but it makes no sense to ask which part of the tree is "doing" the optimization and which part is "being" optimized. This latter example is discussed further below.
Retargetability. Is it possible, using only a microscopic perturbation to the system, to change the system such that it is still an optimizing system but with a different target configuration set? A system containing a robot with the goal of moving a vase to a certain location can be modified by making just a small number of microscopic perturbations to key memory registers such that the robot holds the goal of moving the vase to a different location and the whole vase/robot system now exhibits a tendency to evolve towards a different target configuration. In contrast, a system containing a ball rolling towards the bottom of a valley cannot generally be modified by any microscopic perturbation such that the ball will roll to a different target location. A tree is an intermediate example: to cause the tree to evolve towards a different target configuration set — say, one in which its leaves were of a different shape — one would have to modify the genetic code simultaneously in all of the tree’s cells.
Relationship to Yudkowsky’s definition of optimization
In Measuring Optimization Power, Eliezer Yudkowsky defines optimization as a process in which some part of the world ends up in a configuration that is high in an agent’s preference ordering, yet has low probability of arising spontaneously. Yudkowsky’s definition asks us to look at a patch of the world that has already undergone optimization by an agent or mind, and draw conclusions about the power or intelligence of that mind by asking how unlikely it would be for a configuration of equal or greater utility (to the agent) to arise spontaneously.
Our definition differs from this in the following ways:
We look at whole systems that evolve naturally under physical laws. We do not assume that we can decompose these systems into some engine and object of optimization, or into mind and environment. We do not look at systems that are "being optimized" by some external entity but rather at "optimizing systems" that exhibit a natural tendency to evolve towards a target configuration set. These optimizing systems may contain subsystems that have the properties of agents, but as we will see there are many instances of optimizing systems that do not contain dualistic agentic subsystems.
When discerning the boundary between optimization and non-optimization, we look principally at robustness — whether the system will continue to evolve towards its target configuration set in the face of perturbations — whereas Yudkowsky looks at the improbability of the final configuration.
Relationship to Drexler’s Comprehensive AI Services
Eric Drexler has written about the need to consider AI systems that are not goal-directed agents. He points out that the most economically important AI systems today are not constructed within the agent paradigm, and that in fact agents represent just a tiny fraction of the design space of intelligent systems. For example, a system that identifies faces in images would be an intelligent system but not an agent according to Drexler’s taxonomy. This perspective is highly relevant to our discussion here since we seek to go beyond the narrow agent model in which intelligent systems are conceived of as unitary entities that receive observations from the environment, send actions back into the environment, but are otherwise separate from the environment.
Our perspective is that there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
Figure: relationship between our optimizing system concept and Drexler’s taxonomy of AI systems
Examples of systems that lie in each of these three tiers are as follows:
A system that identifies faces in images by evaluating a feed-forward neural network is an AI system but not an optimizing system.
A tree is an optimizing system but not a goal-directed agent system (see section below analyzing a tree as an optimizing system).
A robot with the goal of moving a ball to a specific destination is a goal-directed agent system.
Relationship to Garrabrant and Demski’s Embedded Agency
Scott Garrabrant and Abram Demski have written about the many ways that a dualistic view of agency in which one conceives of a hard separation between agent and environment fails to capture the reality of agents that are reducible to the same basic building-blocks as the environments in which they are embedded. They show that if one starts from a dualistic view of agency then it is difficult to design agents capable of reflecting on and making improvements to their own cognitive processes, since the dualistic view of agency rests on a unitary agent whose cognition does not affect the world except via explicit actions. They also show that reasoning about counterfactuals becomes nonsensical if starting from a dualistic view of agency, since the agent’s cognitive processes are governed by the same physical laws as those that govern the environment, and the agent can come to notice this fact, leading to confusion when considering the consequences actions that are different from the actions that the agent will, in fact, output.
One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the "optimizer" concept as the starting point for designing intelligent systems, rather than "optimizing system" as we propose here. The present work is strongly inspired by Garrabrant and Demski’s work. Our hope is to point the way to a view of optimization and agency that captures reality sufficiently well to avoid the logical pitfalls identified in the Embedded Agency work.
Example: ball in a valley
Consider a physical ball rolling around in a small valley. According to our definition of optimization, this is an optimizing system:
Configuration space. The system we are studying consists of the physical valley plus the ball
Basin of attraction. The ball could initially be placed anywhere in the valley (these are the configurations comprising the basin of attraction)
Target configuration set. The ball will roll until it ends up at the bottom of the valley (the set of local minima are the target configurations)
We can perturb the ball while it is "in flight", say by changing its position or velocity, and the ball will still ultimately end up at one of the target configurations. This system is robust to perturbations along dimensions corresponding to the spatial position and velocity of the ball, but there are many more dimensions along which this system is not robust. If we change the shape of the ball to a cube, for example, then the ball will not continue rolling to the bottom of the valley.
Example: ball in valley with robot
Consider now a ball in a valley as above, but this time with the addition of an intelligent robot holding the goal of ensuring that the ball reaches the bottom of the valley.
Configuration space. The system we are studying now consists of the physical valley, the ball, and the robot. We consider the evolution of and perturbations to this whole joint system.
Target configuration set. As before, the target configuration is the ball being at the bottom of the valley
Basin of attraction. As before, the basin of attraction consists of all the possible spatial locations that the ball could be placed in the valley.
We can now perturb the system along many more dimensions than in the case where there was no robot. For example, we could introduce a barrier that prevents the ball from rolling downhill past a certain point, and we can then expect a sufficiently intelligent robot to move the ball over the barrier. We can expect a sufficiently well-designed robot to be able to overcome a wide variety of hurdles that gravity would not overcome on its own. Therefore we say that this system is more robust than the system without the robot.
There is a sequence of systems spanning the gap between a ball rolling in a valley, which is robust to a narrow set of perturbations and therefore we say exhibits a weak degree of optimization, up to a robot with a goal of moving a ball around in a valley, which is robust to a much wider set of perturbations, and therefore we say exhibits a stronger degree of optimization. Therefore the difference between systems that do and do not undergo optimization is not a binary distinction but a continuous gradient of increasing robustness to perturbations.
By introducing the robot to the system we have also introduced new dimensions along which the system is fragile: the dimensions corresponding to modifications to the robot itself, and in particular the dimensions corresponding to modifications to the code running on the robot (i.e. physical perturbations to the configuration of the memory cells in which the code is stored). There are two types of perturbation we might consider:
Perturbations that destroy the robot. There are numerous ways we could cut wires or scramble computer code that would leave the robot completely non-operational. Many of these would be physically microscopic, such as flipping a single bit in a memory cell containing some critical computer code. In fact there are now more ways to break the system via microscopic perturbations compared to when we were considering a ball in a valley without a robot, since there are few ways to cause a ball not to reach the bottom of a valley by making only a microscopic perturbation to the system, but there are many ways to break modern computer systems via a microscopic perturbation.
Perturbations that change the target configurations. We could also make physically microscopic perturbations to this system that change the robot’s goal. For example we might flip the sign on some critical computations in the robot’s code such that the robot works to place the ball at the highest point rather than the lowest. This is still a physical perturbation to the valley/ball/robot system: it is one that affects the configuration of the memory cells containing the robot’s computer code. These kinds of perturbations may point to a concept with some similarity to that of an agent. If we have a system that can be perturbed in a way that preserves the robustness of the basin of convergence but changes the target configuration towards which the system tends to evolve, and if we can find perturbations that cause the target configurations to match our own goals, then we have a way to navigate between convergence basins.
Example: computer performing gradient descent
Consider now a computer running an iterative gradient descent algorithm in order to solve an optimization problem. For concreteness let us imagine that the objective function being optimized is globally convex, in which case the algorithm will certainly reach the global optimum given sufficient time. Let us further imagine that the computer stores its current best estimate of the location of the global optimum (which we will henceforth call the "optimizand") at some known memory location, and updates this after every iteration of gradient descent.
Since this is a purely computational process, it may be tempting to define the configuration space at the computational level — for example by taking the configuration space to be the domain of the objective function. However, it is of utmost importance when analyzing any optimizing system to ground our analysis in a physical system evolving according to the physical laws of nature, just as we have for all previous examples. The reason this is important is to ensure that we always study complete systems, not just some inert part of the system that is "being optimized" by something external to the system. Therefore we analyze this system as follows.
Configuration space. The system consists of a physical computer running some code that performs gradient descent. The configurations of the system are the physical configurations of the atoms comprising the computer.
Target-configuration set. The target configuration set consists of the set of physical configurations of the computer in which the memory cells that store the current optimized state contain the true location of the global optimum (or the closest floating point representation of it).
Basin of attraction. The basin of attraction consists of the set of physical configurations in which there is a viable computer and it is running the gradient descent algorithm.
Example: billiard balls
Let us now examine a system that is not an optimizing system according to our definition. Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration. Is this an optimizing system?
In order to qualify as an optimizing system, a system must (1) have a tendency to evolve towards a set of target configurations that are small relative to the basin of attraction, and (2) continue to evolve towards the same set of target configurations if perturbed.
If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations. A system does not need to be robust along all dimensions in order to be an optimizing system, but a billiard table exhibits no such robust dimensions at all, so it is not an optimizing system.
Example: satellite in orbit
Consider a second example of a system that is not an optimizing system: a satellite in orbit around Earth. Unlike the billiard balls, there is no chaotic tendency for small perturbations to lead to large deviations in the system’s evolution, but neither is there any tendency for the system to come back to some target configuration when perturbed. If we perturb the satellite’s velocity or position, then from that point on it is in a different orbit and has no tendency to return to its previous orbit. There is no set of target configurations towards which the system evolves despite perturbations, so this is not an optimizing system.
Example: a tree
Consider a patch of fertile ground with a tree growing in it. Is this an optimizing system?
Configuration space. For the sake of concreteness let us take a region of space that is sealed off from the outside world — say 100m x 100m x 100m. This region is filled at the bottom with fertile soil and at the top with an atmosphere conducive to the tree’s growth. Let us say that the region contains a single tree.
We will analyze this system in terms of the arrangement of atoms inside this region of space. Out of all the possible configurations of these atoms, the vast majority consist of a uniform hazy gas. An astronomically tiny fraction of configurations contain a non-trivial mass of complex biological nutrients making up soil. An even tinier fraction of configurations contain a viable tree.
Target-configuration set. A tree has a tendency to grow taller over time, to sprout more branches and leaves, and so on. Furthermore, trees can only grow so tall due to the physics of transporting sugars up and down the trunk. So we can identify a set of target configurations in which the atoms in our region of space are arranged into a tree that has grown to its maximum size (has sprouted as many branches and leaves as it can support given the atmosphere, the soil that it is growing in, and the constraints of its own biology). There are many topologies in which the tree’s branches could divide, many positions that leaves could sprout in, and so on, so there are many configurations within the target configuration set. But this set is still tiny compared to all the ways that the same atoms could be arranged without the constraint of forming a viable tree.
Basin of convergence. This system will evolve towards the target configuration set starting from any configuration in which there is a viable tree. This includes configurations in which there is just a seed in the ground, as well as configurations in which there is a tree of small, medium, or large size. Starting from any of these configurations, if we leave the system to evolve under the natural laws of physics then the tree will grow towards its maximum size, at which point the system will be in one of the target configurations.
Robustness to perturbations. This system is highly robust to perturbations. Consider perturbing the system in any of the following ways:
Moving soil from one place to another
Removing some leaves from the tree
Cutting a branch off the tree
These perturbations might change which particular target configuration is eventually reached — the particular arrangement of branches and leaves in the tree once it reaches its maximum size — but they will not stop the tree from growing taller and evolving towards a target configuration. In fact we could cut the tree right at the base of the trunk and it would continue to evolve towards a target configuration by sprouting a new trunk and growing a whole new tree.
Duality. A tree is a non-dualistic optimizing system. There is no subsystem that is responsible for "doing" the optimization, separately from that which is "being" optimized. Yet the tree does exhibit a tendency to evolve towards a set of target configurations, and can overcome a wide variety of perturbations in order to do so. There are no man-made systems in existence today that are capable of gathering and utilizing resources so flexibly as a tree, from so broad a variety of environments, and there are certainly no man-made systems that can recover from being physically dismembered to such an extent that a tree can recover from being cut at the trunk.
At this point it may be tempting to say that the engine of optimization is natural selection. But recall that we are studying just a single tree growing from seed to maximum size. Can you identify a physical subset of our 100m x 100m x 100m region of space that is this engine of optimization, analogous to how we identified a physical subset of the robot-and-ball system as the engine of optimization (i.e. the physical robot)? Natural selection might be the process by which the initial system came into existence, but it is not the process that drives the growth of the tree towards a target configuration.
It may then be tempting to say that it is the tree’s DNA that is the engine of optimization. It is true that the tree’s DNA exhibits some characteristics of an engine of optimization: it remains unchanged throughout the life of the tree, and physically microscopic perturbations to it can disable the tree. But a tree replicates its DNA in each of its cells, and perturbing just one or a small number of these is not likely to affect the tree’s overall growth trajectory. More importantly, a single strand of DNA does not really have agency on its own: it requires the molecular machinery of the whole cell to synthesize proteins based on the genetic code in the DNA, and the physical machinery of the whole tree to collect and deploy energy, water, and nutrients. Just as it would be incorrect to identify the memory registers containing computer code within a robot as the "true" engine of optimization separate from the rest of the computing and physical machinery that brings this code to life, it is not quite accurate to identify DNA as an engine of optimization. A tree simply does not decompose into engine and object of optimization.
It may also be tempting to ask whether the tree can "really" be said to be undergoing optimization in the absence of any "intention" to reach one of the target configurations. But this expectation of a centralized mind with centralized intentions is really an artifact of us projecting our view of our self onto the world: we believe that we have a centralized mind with centralized intentions, so we focus our attention on optimizing systems with a similar structure. But this turns out to be misguided on two counts: first, the vast majority of optimizing systems do not contain centralized minds, and second, our own minds are actually far less centralized than we think! For now we put this question of whether optimization requires intentions and instead just work within our definition of optimizing systems, which a tree definitely satisfies.
Example: bottle cap
Daniel Filan has pointed out that some definitions of optimization would nonsensically classify a bottle cap as an optimizer, since a bottle cap causes water molecules in a bottle to stay inside the bottle, and the set of configurations in which the molecules are inside a bottle is much smaller than the set of configurations in which the molecules are each allowed to take a position either inside or outside the bottle.
In our framework we have the following:
The system consists of a bottle, a bottle cap, and water molecules. The configuration space consists of all the possible spatial arrangements of water molecules, either inside or outside the bottle.
The basin of attraction is the set of configurations in which the water molecules are inside the bottle
The target configuration set is the same as the basin of attraction
This is not an optimizing system for two reasons.
First, the target configuration set is no smaller than the basin of attraction. To be an optimizing system there must be a tendency to evolve from any configuration within a basin of attraction towards a smaller target configuration set, but in this case the system merely remains within the set of configurations in which the water molecules are inside the bottle. This is no different from a rock sitting on a beach: due to basic chemistry there is a tendency to remain within the set of configurations in which the molecules comprising the rock are physically bound to one another, but it has no tendency to evolve from a wide basin of attraction towards a small set of target configuration.
Second, the bottle cap system is not robust to perturbations since if we perturb the position of a single water molecule so that it is outside the bottle, there is no tendency for it to move back inside the bottle. This is really just the first point above restated, since if there were a tendency for water molecules moved outside the bottle to evolve back towards a configuration in which all the water molecules were inside the bottle, then we would have a basin of attraction larger than the target configuration set.
Example: the human liver
Filan also asks whether one’s liver should be considered an optimizer. Suppose we observe a human working to make money. If this person were deprived of a liver, or if their liver stopped functioning, they would presumably be unable to make money. So are we then to view the liver as an optimizer working towards the goal of making money? Filan asks this question as a challenge to Yudkowsky’s definition of optimization, since it seems absurd to view one’s liver as an optimizer working towards the goal of making money, yet Yudkowsky’s definition of optimization might classify it as such.
In our framework we have the following:
The system consists of a human working to make money, together with the whole human economy and world.
The basin of attraction consists of the configurations in which there is a healthy human (with a healthy liver) having the goal of making money
The target configurations are those in which this person’s bank balance is high. (Interestingly there is no upper bound here, so there is no fixed point but rather a continuous gradient.)
We can expect that this person is capable of overcoming a reasonably broad variety of obstacles in pursuit of making money, so we recognize that this overall system (the human together with the whole economy) is an optimizing system. But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.
In general we cannot expect to decompose optimizing systems into an engine of optimization and object of optimization. We can see that the system has the characteristics of an optimizing system, and we may identify parts, including in this case the person’s liver, that are necessary for these characteristics to exist, but we cannot in general identify any crisp subset of the system as that which is doing the optimization. And picking various subcomponents of the system (such as the person’s liver) and asking "is this the part that is doing the optimization?" does not in general have an answer.
By analogy, suppose we looked at a planet orbiting a star and asked: "which part here is doing the orbiting?" Is it the planet or the star that is the "engine of orbiting"? Or suppose we looked at a car and noticed that the fuel pump is a complex piece of machinery without which the car’s locomotion would cease. We might ask: is this fuel pump the true "engine of locomotion"? These questions don’t have answers because they mistakenly presuppose that we can identify a subsystem that is uniquely responsible for the orbiting of the planet or the locomotion of the car. Asking whether a human liver is an "optimizer" is similarly mistaken: we can see that the liver is a complex piece of machinery that is necessary in order for the overall system to exhibit the characteristics of an optimizing system (robust evolution towards a target configuration set), but beyond this it makes no more sense to ask whether the liver is a true "locus of optimization".
So rather than answering Filan’s question in either the positive or the negative, the appropriate move is to dissolve the concept of an optimizer, and instead ask whether the overall system is an optimizing system.
Example: the universe as a whole
Consider the whole physical universe as a single closed system. Is this an optimizing system?
The second law of thermodynamics tells us that the universe is evolving towards a maximally disordered thermodynamic equilibrium in which it cycles through various maxentropy configuration. We might then imagine that the universe is an optimizing system in which the basin of attraction is all possible configurations of matter and energy, and the target configuration set consists of the maxentropy configurations.
However, this is not quite accurate. Out of all possible configurations of the universe, the vast majority of configurations are at or close to maximum entropy. That is, if we sample a configuration of the universe at random, we have only an astronomically tiny chance of finding anything other than a close-to-uniform gas of basic particles. If we define the basin of attraction as all possible configurations of matter in the universe and the target configuration set as the set of maxentropy configurations, then the target configuration set actually contains almost the entirety of the basin of attraction, with the only configurations that are in the basin of attraction but not the target configuration set being the highly unusual configurations of matter containing stars, galaxies, and so on.
For this reason the universe as a whole does not qualify as an optimizing system under our definition. (Or perhaps it would be more accurate to say that it qualifies as an extremely weak optimizing system.)
Power sources and entropy
The second law of thermodynamics tells us that any closed system will eventually tend towards a maximally disordered state in which matter and energy is spread approximately uniformly through space. So if we were to isolate one of the systems explore above inside a sealed chamber and leave it for a very long period then eventually whatever power source we put inside the sealed chamber would become depleted, and then eventually after that every complex material or compound in the system would degrade into its base products, and then finally we would be left with a chamber filled with a uniform gaseous mixture of whatever base elements we originally put in.
So in this sense there are no optimizing systems at all, since any of the systems above evolve towards their target configuration sets only for a finite period of time, after which they degrade and evolve towards a maxentropy configuration.
This is not a very serious challenge to our definition of optimization since it is common throughout physics and computer science to study various "steady-state" or "fixed point" systems even though the same objection could be made about any of them. We say that a thermometer can be used to build a heat regulator that will keep the temperature of a house within a desired range, and we do not usually need to add the caveat that eventually the house and regulator will degrade into a uniform gaseous mixture due to the heat death of the universe.
Nevertheless, two possible ways to refine our definition are:
We could stipulate that some power source is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source.
We could specify a finite time horizon and say that "a system is an optimizing system if it tends towards a target configuration set up to time T".
Connection to dynamical systems theory
The concept of "optimizing system" in this essay is very close to that of a dynamical system with one or more attractors. We offer the following remarks on this connection.
A general dynamical system is any system with a state that evolves over time as a function of the state itself. This encompasses a very broad range of systems indeed!
In dynamical system theory, an attractor is the term used for what we have called the target configuration set. A fixed point attractor is, in our language, a target configuration set with just one element, such as when computing the square root of two. A limit cycle is, in our language, a system that eventually stably loops through a sequence of states all of which are in the target configuration set, such as a satellite in orbit.
We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.
There is a concept of "well-posedness" in dynamical systems theory that justifies the identification of a mathematical model with a physical system. The conditions for a model to be well-posed are (1) that a solution exists (i.e. the model is not self-contradictory), (2) that there is a unique solution (i.e. the model contains enough information to pick out a single system trajectory), and (3) that the solution changes continuously with the initial conditions (the behavior of the system is not too chaotic). This third condition may present an interesting avenue for future investigation as it seems related to but not quite equivalent to our notion of robustness since robustness as we define it additionally requires that the system continue to evolve towards the same attractor state despite perturbations. Exploring this connection may present an interesting avenue for future investigation.
Conclusion
We have proposed a concept that we call "optimizing systems" to describe systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.
We have analyzed optimizing systems along three dimensions:
Robustness, which measures the number of dimensions along which the system is robust to perturbations, and the magnitude of perturbation along these dimensions that the system can withstand.
Duality, which measures the extent to which an approximate "engine of optimization" subsystem can be identified.
Retargetability, which measures the extent to which the system can be transformed via microscopic perturbations into an equally robust optimizing system but with a different target configuration set.
We have argued that the "optimizer" concept rests on an assumption that optimizing systems can be decomposed into engine and object of optimization (or agent and environment, or mind and world). We have described systems that do exhibit optimization yet cannot be decomposed this way, such as the tree example. We have also pointed out that, even among those systems that can be decomposed approximately into engine and object of optimization (for example, a robot moving a ball around), we will not in general be able to meaningfully answer the question of whether arbitrary subcomponents of the agent are an optimizer not (c.f. the human liver example).
Therefore, while the "optimizer" concept clearly still has much utility in designing intelligent systems, we should be cautious about taking it as a primitive in our understanding of the world. In particular we should not expect questions of the form "is X an optimizer?" to always have answers.