A Straightforward Explanation of the Good Regulator Theorem

Alfred Harwood

This post was written during the agent foundations fellowship with Alex Altair funded by the LTFF. Thanks to Alex, Jose, Daniel, Cole, and Einar for reading and commenting on a draft.

The Good Regulator Theorem, as published by Conant and Ashby in their 1970 paper (cited over 1700 times!) claims to show that 'every good regulator of a system must be a model of that system', though it is a subject of debate as to whether this is actually what the paper shows. It is a fairly simple mathematical result which is worth knowing about for people who care about agent foundations and selection theorems. You might have heard about the Good Regulator Theorem in the context of John Wentworth's 'Gooder Regulator' theorem and his other improvements on the result.

Unfortunately, the original 1970 paper is notoriously unfriendly to readers. It makes misleading claims, doesn't clearly state what exactly it shows and uses strange non-standard notation and cybernetics jargon ('coenetic variables' anyone?). If you want to understand the theorem without reading the paper, there are a few options. John Wentworth's post has a nice high-level summary but refers to the original paper for the proof. John Baez's blogpost is quite good but is very much written in the spirit of trying to work out what the paper is saying, rather than explaining it intuitively. I couldn't find an explanation in any control theory textbooks (admittedly my search was not exhaustive). A five year-old stackexchange question, asking for a rigorous proof, goes unanswered. The best explainer I could find was Daniel L. Scholten's 'A Primer for Conant and Ashby's Good-Regulator Theorem' from the mysterious, now-defunct 'GoodRegulatorProject.org' (link to archived website). This primer is nice, but really verbose (44 pages!). It is also aimed at approximately high-school (?) level, spending the first 15 pages explaining the concept of 'mappings' and conditional probability.

Partly to test my understanding of the theorem and partly to attempt to fill this gap in the market for a medium-length, entry-level explainer of the original Good Regulator Theorem, I decided to write this post.

Despite all the criticism, the actual result is pretty neat and the math is not complicated. If you have a very basic familiarity with Shannon entropy and conditional probability, you should be able to understand the Good Regulator Theorem.

This post will just discuss the original Good Regulator Theorem, not any of John Wentworth's additions. I'll also leave aside discussion of how to interpret the theorem (questions such as 'what counts as a model?' etc.) and just focus on what is (as far as I can tell) the main mathematical result in the paper.

Let's begin!

The Setup

Conant and Ashby's paper studies a setup which can be visualised using the following causal Bayes net:

If you are not familiar with Bayes nets you can just think of the arrows as meaning 'affects'. So means 'variable $A$ affects the outcome of variable $B$ '. This way of thinking isn't perfect or rigorous, but it does the job.

Just to be confusing , the paper discusses a couple of different setups and draws a few different diagrams, but you can ignore them. This is the setup they study and prove things about. This is the only setup we will use in this post.

The broad idea of a setup like this is that the outcome $Z$ is affected by a system variable $S$ and a regulator variable $R$ . The system variable is random. The regulator variable might be random and independent of $S$ but most of the time we are interested in cases where it depends on the value of $S$ . By changing the way that $R$ depends on $S$ , the distribution over outcomes $Z$ can be changed. As control theorists who wish to impose our will on the uncooperative universe, we are interested in the problem of 'how do we design a regulator which can steer $Z$ towards an outcome we desire, in spite of the randomness introduced by $S$ ?'

The archetypal example for this is something like a thermostat. The variable $S$ represents random external temperature fluctuations. The regulator $R$ is the thermostat, which measures these fluctuations and takes an action (such as putting on heating or air conditioning) based on the information it takes in. The outcome $Z$ is the resulting temperature of the room, which depends both on the action taken by the regulator, and the external temperature.

Each node in the Bayes net is a random variable. The 'system' is represented by a random variable $S$ , which can take values from the set ${s_{1}, s_{2}, . . . s_{d_{S}}}$ . It takes these values with probabilities $P (s_{1}), P (s_{2})$ etc. Think of the system as an 'environment' which contains randomness.

The variable $R$ represents a 'regulator'- a random variable which can take values from the set ${r_{1}, r_{2}, . . . r_{d_{R}}}$ . As the diagram above shows, the regulator can be affected by the system state and is therefore described by a conditional probability distribution $P (R | S)$ . Conditional probabilities tell you what the probability of $R$ is, given that $S$ has taken a particular value. For example, the equation $P (R = r_{2} | S = s_{5}) = 0.9$ tells us that if $S$ takes the value $s_{5}$ , then the probability that $R$ takes the value $r_{2}$ is $0.9$ . When we discuss making a good regulator, we are primarily concerned with choosing the right conditional probability distribution $P (R | S)$ which helps us achieve our goals (more on exactly what constitutes 'goals' in the next section). One important assumption made in the paper is that the regulator has perfect information about the system, so $R$ can 'see' exactly what value $S$ takes. This is one of the assumptions which is relaxed by John Wentworth, but since we are discussing the original proof, we will keep this assumption for now.

Finally, the variable $Z$ represents the 'outcome' - a random variable which can take values from the set ${z_{1}, z_{2}, . . . z_{d_{Z}}}$ . The variable $Z$ is entirely determined by the values of $R$ and $S$ so we can write it as a deterministic function of the regulator state and the system state. Following Conant and Ashby, we use $ψ$ to represent this function, allowing us to write $Z = ψ (R, S)$ . Note that it is possible to imagine cases where $Z$ is related to $R$ and $S$ in a non-deterministic way but Conant and Ashby do not consider cases like this so we will ignore them here (this is another one of the extensions proved by John Wentworth - I hope to write about these at a later date!).

What makes a regulator 'good'?

Conant and Ashby are interested in the question: 'what properties should $R$ have in order for a regulator to be good?' In particular, we are interested in what properties the conditional probability distribution $P (R | S)$ should have, so that $R$ is effective at steering $Z$ towards states that we want.

One way that a regulator can be good is if the Shannon entropy of the random variable $Z$ is low. The Shannon entropy is given by

H (Z) = \sum i P (Z = z_{i}) log \frac{1}{P (Z = z_{i})} .

The Shannon entropy tells us how 'spread out' the distribution on $Z$ is. A good regulator will make $H (Z)$ as small as possible, steering $Z$ towards a low-uncertainty probability distribution. Often, in practice, a producing a low entropy outcome is not on its own sufficient for a regulator to be useful. Scholten gives the evocative example of a thermostat which steers the temperature of a room to 350°F with a probability close to certainty. The entropy of the final distribution over room temperatures would be very low, so in this sense the regulator is still 'good', even though the temperature it achieves is too high for it to be useful as a domestic thermostat. Going forward, we will use low outcome entropy as a criterion for a good regulator, but its better to think of this as a necessary and/or desirable condition rather than sufficient condition for a good regulator.

The second criterion for a good regulator, according to Conant and Ashby, is that the regulator is not 'unnecessarily complex'. What they mean by this is that if two regulators achieve the same output entropy, but one of the regulators uses a policy involving some randomness and the other policy is deterministic, the policy that uses randomness is unnecessarily complex, so is less 'good' than the deterministic policy.

For example, imagine we have a setup where $ψ (r_{1}, s_{2}) = ψ (r_{2}, s_{2}) = z_{1}$ . Then, when the regulator is presented with system state $s_{2}$ , it could choose from between the following policies:

Pick $r_{1}$ with probability 1 whenever $S = s_{2}$ . So $P (R = r_{1} | S = s_{2}) = 1$ and $P (R = r_{2} | S = s_{2}) = 0$
Pick $r_{2}$ with probability 1 whenever $S = s_{2}$ . So $P (R = r_{1} | S = s_{2}) = 0$ and $P (R = r_{2} | S = s_{2}) = 1$
Toss a coin an pick $r_{1}$ if it lands heads and pick $r_{2}$ if it lands tails. So $P (R = r_{1} | S = s_{2}) = \frac{1}{2}$ and $P (R = r_{2} | S = s_{2}) = \frac{1}{2}$ .

All three of these policies achieve the same result (the outcome will always be $z_{1}$ whenever $S = s_{2}$ ), and the same output entropy, but the third option is 'unnecessarily complex', so is not a good regulator. Argue amongst yourselves about whether you find this criterion convincing. Nonetheless, it is the criterion Conant and Ashby use, so we will use it as well.

To recap: a good regulator is one which satisfies the following criteria:

It minimizes the entropy of the outcome variable $Z$ .
It is not unnecessarily complex, in the sense described above.

The Theorem Statement

The theorem statement can be written as follows:

If a regulator is 'good' (in the sense described by the two criteria in the previous section), then the variable R can be described as a deterministic function of S .

Another way of saying that ' $R$ can be described as a deterministic function of $S$ ' is to say that for every $r_{i}$ and $s_{j}$ , then $P (R = r_{i} | S = s_{j})$ either equals 0 or 1. This means that $R$ can be written as $R = f (S)$ for some mapping $f$ .

We are now almost ready the prove the theorem. But first, it is worth introducing a basic concept about entropy, from which the rest of the Good Regulator Theorem flows straightforwardly.

Concavity of Entropy

Conant and Ashby write:

One of the useful and fundamental properties of the entropy function is that any such increase in imbalance in $p (Z)$ necessarily decreases $H (Z)$ .

This is probably pretty intuitive if you are familiar with Shannon Entropy. Here is what it means. Suppose we have a probability distribution $P (Z)$ which assigns probabilities $P (Z = z_{a})$ and $P (Z = z_{b})$ for two different outcomes with $P (Z = z_{a}) \geq P (Z = z_{b})$ (and other probabilities to other $z$ -values). Now suppose we increase the probability of outcome $z_{a}$ (which was already as likely or more likely than $z_{b}$ ) and decrease $P (Z = z_{b})$ the same amount while keeping the rest of the distribution the same. The resulting distribution will end up with a lower entropy than the original distribution. If you are happy with this claim you can skip the rest of this section and move on to the next section. If you are unsure, this section will provide a little more clarification of this idea.

One way to prove this property is to explicitly calculate the entropy of a general distribution where one of the probabilities is $p_{a} + δ$ and another is $p_{b} - δ$ (where $p_{a} \geq p_{b}$ and $δ > 0$ ) . Then, you can differentiate the expression for entropy with respect to $δ$ and show that $\frac{d H}{d δ} < 0$ ie. $H$ is a decreasing function of $δ$ . This is fine and do-able if you don't mind doing a little calculus. Scholten has a nice walk-through of this approach in the section of his primer titled 'A Useful and Fundamental Property of the Entropy Function'.

Here is another way to think about it. Consider a random variable $Z^{'}$ with only two outcomes $z_{a}$ and $z_{b}$ . Outcome $z_{a}$ occurs with probability $q$ and $z_{b}$ occurs with probability $1 - q .$ The entropy of this variable is

H (Z^{'}) = q log \frac{1}{q} + (1 - q) log \frac{1}{1 - q} .

This is a concave function of $q$ . When plotted (using base 2 logarithms) it looks like this:

If $q \leq 0.5$ , decreasing $q$ decreases the entropy and if $q \geq 0.5$ , increasing $q$ decreases the entropy. So 'increasing the imbalance' of a 2-outcome probability distribution will always decrease entropy. Is this still true if we increase the imbalance between two outcomes within a larger probabilities distribution with more outcomes? The answer is yes.

Suppose our outcomes $z_{a}$ and $z_{b}$ are situated within a larger probability distribution. We can view this larger probability distribution as a mixture of our 2-outcome variable $Z^{'}$ and another variable which we can call $Y$ which captures all other outcomes. We can write $Z$ (which we can think of as the 'total' random variable) as

Z = λ Z^{'} + (1 - λ) Y .

With probability $λ = P (Z = z_{a}) + P (Z = z_{b})$ , the variable $Z$ takes a value determined by $Z^{'}$ and with probability $1 - λ$ , the value of $Z$ is determined by random variable $Y$ .

It turns out the entropy of such variable, generated by mixing non-overlapping random variables, can be expressed as follows:

H (Z) = λ H (Z^{'}) + (1 - λ) H (Y) + g (λ)

where $g (λ) = λ log \frac{1}{λ} + (1 - λ) log \frac{1}{1 - λ}$ is the binary entropy (see eg. this Stackexchange answer for a derivation). Increasing the relative 'imbalance' of $P (Z = z_{a})$ and $P (Z = z_{b}$ ) while keeping their sum constant does not change $λ$ or $H (Y),$ but does reduce $H (Z^{'})$ , thus reducing the total entropy $H (Z)$ .

This is a fairly basic property of entropy but understanding it is one of the only conceptual pre-requisites for understanding the good regulator theorem. Hopefully it is clear now if it wasn't before.

On to the main event!

The Main Lemma

Conant and Ashby's proof consists of one lemma and one theorem. In this section we will discuss the lemma. I'm going to state the lemma in a way that makes sense to me and is (I'm pretty sure) equivalent to the lemma in the paper.

Lemma:

Suppose a regulator is 'good' in the sense that it leads to Z having the lowest possible entropy. Then P(R|S) must have been chosen so that Z is a deterministic function of S ie. H(Z|S)=0.

Here is an alternative phrasing, closer to what Conant and Ashby write:

Suppose a regulator is 'good' in the sense that it leads to $Z$ having the lowest possible entropy. Suppose also that, for a system state $s_{j}$ , this regulator has a non-zero probability of producing states $r_{i}$ and $r_{k}$ , ie. $P (R = r_{i} | S = s_{j}) > 0$ and $P (R = r_{k} | S = s_{j}) > 0$ . Then, it must be the case that $ψ (r_{i}, s_{j}) = ψ (r_{k}, s_{j})$ , otherwise, the regulator would not be producing the lowest possible output entropy.

Here is another alternative phrasing:

Suppose a regulator is 'good' in the sense that it leads to $Z$ having the lowest possible entropy. If, for a given system state, multiple regulator states have non-zero probability, then all of these regulator states lead to the same output state when combined with that system state through $ψ$ . If this was not the case, we could find another regulator which lead to $Z$ having a lower entropy.

This is one of those claims which is kind of awkward to state in words but is pretty intuitive once you understand what it's getting at.

Imagine there is a regulator which, when presented with a system state $s_{j}$ , produces state $r_{i}$ with probability $P (R = r_{i} | S = s_{j}) \neq 0$ and produces state $r_{k}$ with probability $P (R = r_{k} | S = s_{j}) \neq 0$ . Furthermore, suppose that $ψ$ is such that $ψ (r_{i}, s_{j}) = z_{a}$ and $ψ (r_{k}, s_{j}) = z_{b}$ . This means that, when presented with system state $s_{j}$ , the regulator sometimes acts such that it produces an outcome state $z_{a}$ and other times acts so as to produce an outcome state $z_{b}$ . This means $Z$ is not a deterministic function of $S$ . Is it possible that this regulator produces the lowest possible output entropy? From considering the previous section, you might already be able to see that the answer is no, but I'll spell it out a bit more.

The total probability that $Z = z_{a}$ will be given by the sum of the probability that $Z = z_{a}$ , when $S$ is not $s_{j}$ and the probability that $R$ is $r_{i}$ when $S$ equals $s_{j}$ :

P (z_{a}) = P (Z = z_{a} | S \neq s_{j}) P (S \neq s_{j}) + P (R = r_{i} | S = s_{j}) P (S = s_{j})

Similarly the probably that $Z = z_{b}$ is given by:

P (z_{b}) = P (Z = z_{b} | S \neq s_{j}) P (S \neq s_{j}) + P (R = r_{k} | S = s_{j}) P (S = s_{j}) .

Suppose $P (z_{a}) \geq P (z_{b})$ , then, as we saw in the previous section, we can reduce the entropy of $Z$ by increasing $P (z_{a})$ and decreasing $P (z_{b})$ by the same amount. This can be achieved by changing the regulator so that $P (R = r_{i} | S = s_{j})$ is increased and $P (R = r_{k} | S = s_{j})$ is decreased by the same amount. Therefore, a regulator which with nonzero probability produces two different $R$ values when presented with the same $S$ -value cannot be optimal if those two $R$ -values lead to different $Z$ -values. We can always find a regulator which consistently picks $r_{i}$ 100% of the time which leads to a lower output entropy. (A symmetric argument can be made if we instead assume $P (z_{b}) \geq P (z_{a})$ .)

However, if $ψ$ was such that $ψ (r_{i}, s_{j}) = ψ (r_{k}, s_{j}) = z_{a}$ , then it would not matter whether the regulator picked $r_{i}$ or $r_{k}$ or tossed a coin to decide between them when presented with $s_{j}$ , because both choices would lead to the same $Z$ -value. In such a case, even though $R$ contains randomness, the overall effect would be that $Z$ is still a deterministic function of $S$ .

The Theorem

90% of the meat of the theorem is contained in the above lemma, we just need to tie up a couple of loose ends. To recap: we have showed that a regulator which achieves the lowest possible output entropy must use a conditional distribution $P (R | S)$ which leads to $Z$ being a deterministic function of $S$ . For each system state $s_{j}$ , the regulator must only choose $R$ -values which lead to a single $Z$ -value. This still leaves open the possibility that the regulator can pick a random $R$ -value from some set of candidates, provided that all of those candidates result in the same $Z$ -value. In our example from the previous section, this would mean that the regulator could toss a coin to choose between $r_{i}$ and $r_{k}$ when presented with system state $s_{j}$ and this regulator could still achieve the minimum possible entropy.

This is where the 'unnecessary complexity' requirement comes in. Conant and Ashby argue that one of the requirements for a 'good' regulator is that it does not contain any unnecessary complexity. A regulator which randomises its $R$ value would be considered unnecessarily complex compared to a regulator which produced the same output state distribution without using randomness. Therefore, for a regulator to be 'good' in the Conant and Ashby sense, it can only pick a single $R$ -value with 100% probability when presented with each $S$ -value. And the main lemma tells us that this condition does not prevent us from minimizing the output entropy.

This means that in the conditional probability distribution $P (R | S)$ , for each $S$ -value, the probability of any one $R$ -value is either zero or one. To put it another way, $R$ can be described as a deterministic function of $S$ . In a good regulator, knowing $S$ allows you to predict exactly what value $R$ will take. Also, since $Z$ is a deterministic function of $R$ and $S$ , this means that $Z$ , when being regulated by a good regulator, will be a deterministic function of $S$ .

Thus, we have proved that a good regulator $R$ must be a deterministic function of the system state $S$ .

Note that the argument makes no assumptions about the probability distribution over $S$ . Though changing the probability distribution over $S$ will change the final output entropy, it will not change the properties of a good regulator.

Example

Consider the following example, where $R$ , $S$ , and $Z$ have three possible states and the 'dynamics' function $ψ$ is characterised by the following table:

$ψ$	$s_{1}$	$s_{2}$	$s_{3}$
$r_{1}$	$z_{1}$	$z_{2}$	$z_{3}$
$r_{2}$	$z_{3}$	$z_{1}$	$z_{2}$
$r_{3}$	$z_{2}$	$z_{1}$	$z_{1}$

First, consider a regulator which violates the main condition of the main lemma, by randomizing between $r_{1}$ and $r_{2}$ when presented with $s_{1}$ , even though they lead to different $Z$ -values. Here is the conditional probability table for such a regulator:

$P (R \| S)$	$s_{1}$	$s_{2}$	$s_{3}$
$r_{1}$	$0.5$	$0$	$0$
$r_{2}$	$0.5$	$1$	$0$
$r_{3}$	$0$	$0$	$1$

If $S$ has a maximum entropy distribution so $P (s_{1}) = P (s_{2}) = P (s_{3}) = \frac{1}{3}$ , then this regulator will produce outcome $z_{1}$ with probability $\frac{5}{6}$ . Outcome $z_{2}$ will have probability $P (z_{2}) = 0$ and outcome $z_{3}$ will have $P (z_{3}) = \frac{1}{6}$ . This output distribution will therefore have entropy

H (Z) = \frac{5}{6} log \frac{6}{5} + \frac{1}{6} log \frac{6}{1} \approx 0.65

(using base 2 logarithms). According to the lemma, we can achieve a better (lower) output entropy by ensuring that $P (R | S)$ is such that the regulator chooses whichever $R$ -value corresponds to the $Z$ -value which already has a higher probability. In this case, $z_{1}$ has a higher probability than $z_{3}$ , so 'increasing the imbalance' means increasing the probability of $z_{1}$ , at the expense of $z_{3}$ as much as we can. This can be done by increasing $P (r_{1} | s_{1})$ to 1 and decreasing $P (r_{2} | s_{1})$ to zero (while keeping the rest of the distribution the same).

This results in a $Z$ -distribution with an entropy of zero, since, regardless of the $S$ -value, $Z$ always ends up in state $z_{1}$ . Since this entropy cannot be improved upon and the regulator does not have any unnecessary noise/complexity, the Good Regulator Theorem predicts that this regulator should be a deterministic function of $S$ . Lo and behold, it is! Each $S$ -value gets mapped to exactly one $R$ -value:

$P (R \| S)$	$s_{1}$	$s_{2}$	$s_{3}$
$r_{1}$	$1$	$0$	$0$
$r_{2}$	$0$	$1$	$0$
$r_{3}$	$0$	$0$	$1$

Consider another regulator for the same system, as characterised by the following conditional probability table:

$P (R \| S)$	$s_{1}$	$s_{2}$	$s_{3}$
$r_{1}$	$1$	$0$	$0$
$r_{2}$	$0$	$0.5$	$0$
$r_{3}$	$0$	$0.5$	$1$

Referring back to the table for $ψ$ , we can see that this regulator also achieves an output entropy of zero, even though it randomizes between $r_{2}$ and $r_{3}$ when presented with $s_{2}$ . Since $ψ (r_{2}, s_{2}) = ψ (r_{3}, s_{2}) = z_{1}$ , this isn't a problem from the point of view of minimizing entropy, but it is 'unnecessarily complex', so doesn't meet the criteria of a good regulator as Conant and Ashby define it. There are two ways to make this regulator 'good'. We could either make $P (r_{2} | s_{2}) = 1$ and $P (r_{3} | s_{2}) = 0$ , making the regulator the same as our previous example, or we could set $P (r_{2} | s_{2}) = 0$ and $P (r_{3} | s_{2}) = 1$ .

Both possibilities would be 'good regulators' in the sense that they achieve the minimum possible entropy and are not unnecessarily complex. They are also both regulators where $R$ is a deterministic function of $S$ , validating the prediction of the theorem.

Conclusion

One thing that Conant and Ashby claim about this theorem is that it shows that a good regulator must be 'modelling' the system. This is a bit misleading. As I hope I have shown, the Good Regulator Theorem shows that a good regulator (for a certain definition of 'good') must depend on the system in a particular way. But the way in which a good regulator must depend on the system does not correspond to what we might normally think of as a 'model'. The regulator must have a policy where its state deterministically depends on the system state. That's it! If we were being very generous, we might want to say something like: 'this is a necessary but not sufficient condition for a regulator that does model its environment (when the word model is use in a more normal sense)'. When Conant and Ashby say that a good regulator 'is a model of the system', they might mean that looking at $R$ tells you information about $S$ and in that sense, is a model of $S$ . When $R$ is a deterministic function of $S$ , this is sometimes true (for example when $R$ is a bijective or injective function of $S$ ). However, in some setups, the 'good' regulator $R$ might be a deterministic function of $S$ which takes the same value, regardless of the value of $S$ . I don't think its sensible to interpret such a regulator as being a model of $S$ .

Personally, I don't think that it is useful to think about the Good Regulator Theorem as a result about models. It's a pretty neat theorem about random variables and entropy (and that's ok!), but on its own, it doesn't say much about models. As with most things in this post, John Wentworth has discussed how you could modify the theorem to say something about models.

After writing this piece, the good regulator theorem is a lot clearer to me. I hope it is clearer to you as well. Notable by its absence in this post is any discussion of John Wentworth's improvements to the theorem. Time permitting, I hope to cover these at a later date.

[-]Alex_Altair6mo62

Thank you for writing this! Your description in the beginning about trying to read about the GRT and coming across a sequence of resources, each of which didn't do quite what you wanted, is a precise description of the path I also followed. I gave up at the end, wishing that someone would write an explainer, and you have written exactly the explainer that I wanted!

[-]Richard_Kennaway6mo20

350F

35°F, surely?

[-]Alfred Harwood6mo10

It is meant to read 350°F. The point is that the temperature is too high to be a useful domestic thermostat. I have changed the sentence to make this clear (and added a ° symbol ). The passage now reads:

Scholten gives the evocative example of a thermostat which steers the temperature of a room to 350°F with a probability close to certainty. The entropy of the final distribution over room temperatures would be very low, so in this sense the regulator is still 'good', even though the temperature it achieves is too high for it to be useful as a domestic thermostat.

(Edit: I've just realised that 35°F would also be inappropriate for a domestic thermostat by virtue of being too cold so either works for the purpose of the example. Scholten does use 350, so I've stuck with that. Sorry, I'm unfamiliar with Fahrenheit!)

LESSWRONG
LW

36