In the last post I introduced a potential measure for optimization, and applied it to a very simple system. In this post I will show how it applies to some more complex systems. My five takeaways so far are:
We can recover an intuitive measure of optimization
Even around a stable equilibrium, Op(A;n,m) can be negative
Our measures throw up issues in some cases
Our measures are very messy in chaotic environments
Op seems to be defined even in chaotic systems
It's good to be precise with our language, so let's be precise. Remember our model system which looks like this:
In this network, each node is represented by a real number. We'll use superscript notation to denote the value of a node: $w^n$ is the value of node $n$ in the world $W$.
The heart of this is a quantity I'll call Comp, which is:
$$\mathrm{Comp}(A;n,m)=\lim_{x^n\to w^n}\left[\frac{x^m-w^m}{y^m-w^m}\right]$$
Which is equivalent to:
$$\mathrm{Comp}(A;n,m)=\left.\frac{\partial s^m}{\partial s^n}\right|_{A\text{ varies}}\bigg/\left.\frac{\partial s^m}{\partial s^n}\right|_{A\text{ constant}}$$
($s^n$ is the generic version of $w^n$, $x^n$, $y^n$.)
Our current measure for optimization is the following value:
$$\mathrm{Op}(A;n,m)=\lim_{x^n\to w^n}\left[-\log\left|\mathrm{Comp}(A;n,m)\right|\right]$$
Op is positive when the nodes in A are doing something optimizer-ish towards the node $m$. This corresponds to $|\mathrm{Comp}|<1$: when A is allowed to vary in response to changes in $s^n$, the change that propagates forward to $s^m$ is smaller.
Op is negative when the nodes in A are doing something more like "amplification" of the variance in $m$. Specifically, we say that A optimizes $m$ with respect to $n$ around the specific trajectory $W$ by a number of nats equal to $\mathrm{Op}(A;n,m)$. We'll investigate this measure in a few different systems.
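To make this concrete, here is a minimal sketch of how Comp and Op can be estimated by finite differences (in Python, since that's what I'm plotting with). The helper names are just illustrative, and "A constant" is implemented here as pinning the nodes in A to their values from the baseline world $W$, which is my reading of the setup from the last post:

```python
import numpy as np

def estimate_comp_op(step_free, step_clamped, w0, perturb_n, read_m, n_steps, eps=1e-6):
    """Finite-difference sketch of Comp(A; n, m) and Op(A; n, m).

    step_free(state)             -- one time step with the nodes in A free to vary
    step_clamped(state, w_state) -- one time step with the nodes in A pinned to their
                                    values in the baseline world W at that time
    perturb_n(state, eps)        -- a copy of the state with node n shifted by eps
    read_m(state)                -- the value of node m in a state
    """
    # World W: unperturbed baseline trajectory.
    w_traj = [w0]
    for _ in range(n_steps):
        w_traj.append(step_free(w_traj[-1]))

    # World X: node n perturbed, A allowed to respond.
    x = perturb_n(w0, eps)
    for _ in range(n_steps):
        x = step_free(x)

    # World Y: node n perturbed, A clamped to the W trajectory.
    y = perturb_n(w0, eps)
    for t in range(n_steps):
        y = step_clamped(y, w_traj[t])

    w_m = read_m(w_traj[-1])
    comp = (read_m(x) - w_m) / (read_m(y) - w_m)
    return comp, -np.log(abs(comp))
```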
A Better Thermostat Model
Our old thermostat was not a particularly good model of a thermostat. Realistically a thermostat cannot apply infinite heating or cooling to a system. For a better model let's consider the function
$$\mathrm{Therm}(\theta,p;s^T)=\begin{cases}p & \theta\le s^T\\ \frac{p}{\theta}s^T & -\theta<s^T<\theta\\ -p & s^T\le-\theta\end{cases}$$
Now imagine we redefine our continuous thermostat like this:
$$s^T_{t+\delta t}=s^R_t-d$$
$$s^R_{t+\delta t}=s^R_t-\delta t\times\mathrm{Therm}(\theta,p;s^T_t)$$
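A sketch of this update in code (here $d$ plays the role of the set point, which the optimizing region below suggests is 25, and the step size $\delta t = 0.01$ is an arbitrary choice of mine):

```python
def therm(theta, p, s_T):
    """Saturating response: linear inside the basin, capped at +/- p outside it."""
    if s_T >= theta:
        return p
    if s_T <= -theta:
        return -p
    return (p / theta) * s_T

def thermostat_step(s_R, s_T, d=25.0, theta=1.0, p=1.0, dt=0.01):
    """One discrete-time update of the room temperature s_R and the thermostat reading s_T."""
    return s_R - dt * therm(theta, p, s_T), s_R - d
```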
Within the narrow "basin" of $-\theta\le s^T\le\theta$, it behaves like before. But outside it, the change in temperature over time is constant. This looks like the following:
When we look at our optimizing measure, we can see that while $s^R$ remains in the linearly decreasing region, $\mathrm{Op}=0$. It only increases once $s^R$ reaches the exponentially decreasing region.
Now we might want to ask ourselves another question: for what values of $s^R_0$ is $\mathrm{Op}(T;s^R_0,s^R_t)$ positive for a given value of $t$, say $t=10$? Let's set $p=1$, $\theta=1$ and the initial $s^T_0=0$. The graph of this looks like the following:

Every initial $s^R_0$ whose trajectory leads into the "optimizing region" between the temperatures of 24 and 26 gets optimized a bit. The maximum Op values are for trajectories which start in this region.
Point 1: We can Recover an Intuitive Measure of Optimization
What we might want to do is measure the "amount" of optimization in this region, between the points $s^R_0=5$ and $s^R_0=45$, with respect to $s^R_{10}$. If we choose this measure to be the integral of $1-\mathrm{Comp}$, we get some nice properties.
It (almost) no longer depends on θ, but depends linearly on p.
θ=1, p=1 gives an integral of 19.961
θ=0.5, p=1 gives an integral of 19.611
θ=1, p=0.5 gives an integral of 9.980
As $\theta\to 0$, our integral remains (pretty much) the same. This is good because it means we can assign some "optimizing power" to a thermostat which acts in the "standard" way, i.e. applying a change of $+p$ each time unit to the temperature if it's below the set point, and a change of $-p$ each time unit if it's above the set point. And it's no coincidence that this power is equal to $2pt$.
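For reference, the integral above can be computed with something like the following sketch (it reuses the `therm` function from earlier; the set point $d=25$, the perturbation size, the step size, and the grid resolution are all arbitrary choices of mine):

```python
import numpy as np

def comp_at(s_R0, d=25.0, theta=1.0, p=1.0, dt=0.01, T=10.0, eps=1e-6):
    """Comp(T; s_R0, s_R at time T) for the thermostat, by finite differences."""
    n_steps = int(T / dt)

    def run(start, clamp_to=None):
        # clamp_to: baseline thermostat readings to pin s_T to (world Y), or None (worlds W, X).
        s_R, s_T, readings = start, 0.0, []
        for t in range(n_steps):
            readings.append(s_T)
            s_T_used = clamp_to[t] if clamp_to is not None else s_T
            s_R, s_T = s_R - dt * therm(theta, p, s_T_used), s_R - d
        return s_R, readings

    w_R, w_readings = run(s_R0)                    # world W: baseline
    x_R, _ = run(s_R0 + eps)                       # world X: perturbed, thermostat free
    y_R, _ = run(s_R0 + eps, clamp_to=w_readings)  # world Y: perturbed, thermostat clamped to W
    return (x_R - w_R) / (y_R - w_R)

# Trapezoid-rule estimate of the integral of (1 - Comp) over initial temperatures 5..45.
grid = np.linspace(5.0, 45.0, 401)
print(np.trapz([1.0 - comp_at(s) for s in grid], grid))
```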
Let's take a step back to consider what we've done here. If we consider the following differential equation:
$$\frac{dT}{dt}=\begin{cases}-p & T>T_\text{set}\\ 0 & T=T_\text{set}\\ p & T<T_\text{set}\end{cases}$$
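Two temperatures sitting on opposite sides of the set point then approach each other at a fixed rate:

$$T_1>T_\text{set}>T_2\ \Rightarrow\ \frac{d}{dt}(T_1-T_2)=(-p)-(+p)=-2p,$$

so an interval straddling $T_\text{set}$ shrinks by $2p$ per unit time, i.e. by $2pt$ after time $t$ (until its endpoints reach the set point).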
It certainly looks like $T$ values are being compressed about $T_\text{set}$ by $2p$ per time unit, but saying so requires a somewhat awkward manoeuvre: we have to identify our metric on the space of $T$ at $t=10$ with our metric on the space of $T$ at $t=0$. For temperatures this can be done in a natural way, but it doesn't necessarily extend to other systems. It also doesn't stack up well with systems which naturally compress themselves along some axis, for example water going into a plughole.
We've managed to recreate this using what I consider to be a much more flexible, well-defined, and natural measure. This is a good sign for our measure.
The Lorenz System
This is a famed system defined by the differential equations:
$$\frac{da}{dt}=\sigma(b-a)$$
$$\frac{db}{dt}=a(\rho-c)-b$$
$$\frac{dc}{dt}=ab-\beta c$$
(I have made the notational change from the "standard" x,y,z→a,b,c in order to avoid collision with my own notation)
These can fairly easily and relatively accurately be converted to discrete time. We'll keep $\sigma=10$, $\beta=8/3$ as constant values. For values of $\rho<1$ we have a single stable equilibrium point. For values $1<\rho<24.74$ there are three equilibria (two of them stable), and for values $\rho>24.74$ we have a chaotic system. We'll investigate the first and third cases.
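Here's roughly how I'd do the discretisation (a plain forward-Euler sketch; the step size is an arbitrary choice, and a Runge-Kutta step would be more accurate):

```python
import numpy as np

SIGMA, BETA = 10.0, 8.0 / 3.0

def lorenz_step(state, rho, dt=0.01):
    """One forward-Euler step of the (renamed) Lorenz system."""
    a, b, c = state
    return np.array([a + dt * SIGMA * (b - a),
                     b + dt * (a * (rho - c) - b),
                     c + dt * (a * b - BETA * c)])
```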
The most natural choice for A is all of the values of any one of $a$, $b$, or $c$. We could also equally validly choose A to be a pair of them, although this might cause some issues. A reasonable choice for $n$ would be the initial value of one of the two variables which aren't chosen for A.
Point 2: Even Around a Stable Equilibrium, Op(A;n,m) Can be Negative
Let's choose $\rho=0.8$, which means we have a single stable point at $a=0$, $b=0$, $c=0$. Here are plots for the choice of $a$ as the set A, and $s^b_0$ as the axis along which to measure optimization. (So we're changing the value of $s^b_0$ and looking at how future values of $s^b_t$ and $s^c_t$ change, depending on whether or not the values of $s^a_t$ are allowed to change.)
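The three trajectories being compared here come from something like the following sketch (built on `lorenz_step` above; the baseline initial condition and the perturbation size are arbitrary choices of mine, and pinning $a$ to its W-values is how I'm implementing "A constant"):

```python
def run_worlds(state0, rho, n_steps, eps=1e-3, dt=0.01):
    """Return the baseline world W, the perturbed-free world X, and the perturbed-clamped world Y."""
    # World W: baseline trajectory.
    w = [np.array(state0, dtype=float)]
    for _ in range(n_steps):
        w.append(lorenz_step(w[-1], rho, dt))

    # World X: perturb b_0, let a vary freely.
    x = [w[0] + np.array([0.0, eps, 0.0])]
    for _ in range(n_steps):
        x.append(lorenz_step(x[-1], rho, dt))

    # World Y: perturb b_0, but pin a to its W-values at every step.
    y = [w[0] + np.array([0.0, eps, 0.0])]
    for t in range(n_steps):
        nxt = lorenz_step(y[-1], rho, dt)
        nxt[0] = w[t + 1][0]   # a is in A, so it is held to the baseline trajectory
        y.append(nxt)

    return np.array(w), np.array(x), np.array(y)

w, x, y = run_worlds([1.0, 1.0, 1.0], rho=0.8, n_steps=2000)
op_b = -np.log(np.abs((x[:, 1] - w[:, 1]) / (y[:, 1] - w[:, 1])))  # Op(a; b_0, b_t) over time
```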
Due to my poor matplotlib abilities, those all look like one graph. This indicates that we are not in the chaotic region of the Lorenz system. The variables a, b, and c approach zero in all cases.
As we can see, the difference $y^b_t-w^b_t$ is greater than the difference $x^b_t-w^b_t$. The mathematics of this are difficult to interpret meaningfully, so I'll settle for the idea that changes in $a$, $b$, and $c$ in some way compound on one another over time, even as all three approach zero. When we plot values for Op we get this:
The values for $\mathrm{Op}(a;s^b_0,s^b_t)$ and $\mathrm{Op}(a;s^b_0,s^c_t)$ are negative, as expected. This is actually really important: it means our measure captures the fact that, even though the future is being "compressed" (in the sense that future values of $a$, $b$, and $c$ approach zero as $t\to\infty$), it's not necessarily the case that these variables (which are the only variables in the system) are optimizing each other.
Point 3: Our Measures Throw Up Issues in Some Cases
Now what about variation along the axis $s^c_0$?
We run into a bit of an issue! For a small chunk of time, the differences $x^c-w^c$ and $y^c-w^c$ have different signs. This causes Op to be complex valued, whoops!
Point 4: Our Measures are Very Messy in Chaotic Environments
When we choose $\rho=28$, it's a different story. Here we are with $a$ as A, and $s^b_0$ as the axis of optimization:
Now the variations are huge! And they're wild and fluctuating.
Huge variations across everything. This is basically what it means to have a chaotic system. But interestingly there is a trend towards Op becoming negative in most cases, which should tell us something, namely that these things are spreading one another out.
What happens if we define A as the values of $a$ for $t\le 10$? This means that for $s^a_t$ values with $t>10$ we allow a difference between the $w^a$ and $y^a$ values. We get graphs that look like this:
This is actually a good sign. Since A only has a finite amount of influence, we'd expect that it can only de-optimize b and c by a finite degree into the future.
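In code this only changes the world-Y loop in the sketch above: $a$ is pinned to the baseline only while it is still part of A (a fragment, with the same variables and $dt=0.01$ as before):

```python
t_cutoff = int(10.0 / dt)            # A = {a_t : t <= 10}, in units of steps
y = [w[0] + np.array([0.0, eps, 0.0])]
for t in range(n_steps):
    nxt = lorenz_step(y[-1], rho, dt)
    if t + 1 <= t_cutoff:
        nxt[0] = w[t + 1][0]         # clamp a only while it is still part of A
    y.append(nxt)
```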
Point 5: Op Seems to be Defined Even in Chaotic Systems
It's also worth noting that we're only looking at an approximation of Op here. What happens when we shrink the perturbation $\delta b=x^b_0-w^b_0$? In our other cases we got the same answer regardless. Let's just consider the effect on $s^b$.
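The check itself is just re-running the comparison with smaller and smaller perturbations and seeing whether the Op curve stops moving (a sketch on top of `run_worlds` above):

```python
for eps in [1e-2, 1e-3, 1e-4, 1e-5]:
    w, x, y = run_worlds([1.0, 1.0, 1.0], rho=28.0, n_steps=2000, eps=eps)
    op_b = -np.log(np.abs((x[:, 1] - w[:, 1]) / (y[:, 1] - w[:, 1])))
    print(eps, op_b[-1])  # does Op(a; b_0, b_t) at the final time settle as eps shrinks?
```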
That works for a shorter simulation; what about a longer one?
This seems to be working mostly fine.
Conclusions and Next Steps
Looks like our system is working reasonably well. I'd like to apply it to some even more complex models but I don't particularly know which ones to use yet! I'd also like to look at landscapes of Op and Comp values for the Lorenz system, the same way I looked at landscapes of the thermostat system. The aim is to be able to apply this analysis to a neural network.