Introduction

Recently we (Elizabeth Van Nostrand and Alex Altair) started a project investigating chaos theory as an example of field formation.[1] The number one question you get when you tell people you are studying the history of chaos theory is “does that matter in any way?”[2] Books and articles will list applications, but the same few come up again and again, and when you dig in, “application” often means “wrote some papers about it” rather than “achieved commercial success”.

In this post we checked a few commonly cited applications to see if they pan out. We didn’t do deep dives to prove the mathematical dependencies, just sanity checks.

Our findings: Big Chaos has a very good PR team, but the hype isn’t unmerited either. Most of the commonly touted applications never received wide usage, but chaos was at least instrumental in several important applications that are barely mentioned on Wikipedia. And it was as important for weather as you think it is.

Applications

Cryptography and random number generators- Strong No (Alex)

The Wikipedia page for chaos theory has a prominent section on cryptography. This sounds plausible; you certainly want your encryption algorithm to display sensitive dependence on initial conditions in the sense that changing a bit of your input randomizes the bits of your output. Similarly, one could imagine using the sequence of states of a chaotic system as a random number generator. However, a quick Google search makes me (Alex) think this is not a serious application.

I’ve seen it claimed[3] that one of the earliest pseudo-random number generators used the logistic map, but I was unable to find a primary reference to this from a quick search.
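To make the claim concrete, here is a minimal Python sketch of what a logistic-map based generator could look like. It is purely illustrative: it is not a reconstruction of any particular historical generator, and it is emphatically not cryptographically secure.

```python
# Toy pseudo-random bit generator built on the logistic map x -> 4x(1-x).
# Illustrative only: not a reconstruction of any historical generator,
# and not suitable for cryptography.

def logistic_bits(seed: float, n_bits: int, burn_in: int = 100) -> list[int]:
    x = seed  # seed must be in (0, 1) and not a fixed point like 0.5
    for _ in range(burn_in):              # discard the initial transient
        x = 4.0 * x * (1.0 - x)
    bits = []
    for _ in range(n_bits):
        x = 4.0 * x * (1.0 - x)
        bits.append(1 if x > 0.5 else 0)  # threshold the state to extract a bit
    return bits

print(logistic_bits(0.123456789, 32))
```

Bits extracted this way look random at a glance, but they are correlated and the internal state can be inferred from the output stream, which is one reason schemes like this don’t show up in serious cryptographic use.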

Some random number generators use physical entropy from outside the computer (rather than a pseudo-random mathematical computation). There are some proposals to do this by taking measurements from a physical chaotic system, such as an electronic circuit or lasers. This seems to be backward, and not actually used in practice. The idea is somewhat roasted in the Springer volume “Open Problems in Mathematics and Computational Science” 2014, chapter “True Random Number Generators” by Mario Stipčević and Çetin Kaya Koç.

Other sources that caused me to doubt the genuine application of chaos to crypto include this Crypto StackExchange question, and my friend who has done cryptography research professionally.

As a final false positive example, a use of lava lamps as a source of randomness once gained some publicity. Though this was patented under an explicit reference to chaotic systems, it was only used to generate a random seed, which doesn’t really make use of the chaotic dynamics. It sounds to me like it’s just a novelty, and off-the-shelf crypto libraries would have been just fine.

Anesthesia, Fetal Monitoring, and Approximate Entropy- No (Elizabeth)

Approximate Entropy (ApEn) is a measurement designed to assess how regular and predictable a system is, a simplification of Kolmogorov-Sinai entropy. ApEn was originally invented for analyzing medical data, such as brain waves under anesthesia or fetal heart rate. It has several descendants, including Sample Entropy; for purposes of this article I’m going to refer to them all as ApEn. Researchers have since applied the hammer of ApEn and its children to many nails, but as far as I (Elizabeth) can tell it has never reached widespread usage.
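For the curious, here is a minimal sketch of the ApEn calculation in the spirit of Pincus’s original definition; the parameter choices (m = 2, tolerance r = 0.2 standard deviations) are common conventions, not anything specific to the medical products discussed below.

```python
import numpy as np

def approximate_entropy(u, m=2, r=None):
    """Approximate entropy ApEn(m, r) of a 1-D series.
    Lower values indicate a more regular, predictable signal."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    if r is None:
        r = 0.2 * u.std()  # common convention: tolerance = 0.2 * standard deviation

    def phi(m):
        # embed the series as overlapping length-m windows
        x = np.array([u[i:i + m] for i in range(N - m + 1)])
        # Chebyshev distance between every pair of windows
        d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
        # for each window, the fraction of windows within tolerance r (self-matches included)
        C = (d <= r).mean(axis=1)
        return np.log(C).mean()

    return phi(m) - phi(m + 1)

# A regular signal scores lower than noise:
t = np.linspace(0, 20 * np.pi, 1000)
print(approximate_entropy(np.sin(t)))                                   # small
print(approximate_entropy(np.random.default_rng(0).normal(size=1000)))  # larger
```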

ApEn’s original application was real-time fetal heart monitoring; however, as far as I can tell, it never achieved commercial success, and modern doctors use simpler algorithms to evaluate fetal monitoring data.

ApEn has also been extensively investigated for monitoring brain waves under anesthesia. However, commercially available products only offer Spectral Entropy (based purely on information theory, no chaos) and the Bispectral Index.

ApEn has been tried out in other fields, including posture, neurological issues, finance, and weather. I was unable to find any evidence that any of these made it into practice, although if some day trader were making money with ApEn I wouldn’t expect them to tell me.

Empirical Dynamical Modeling– Unproven (Elizabeth)

EDM is a framework for modeling chaotic systems without attempting to specify parametric equations. It was first created by George Sugihara and Robert May (a prominent early advocate and developer of chaos theory), but Stephen Munch is the scientist doing the most to put the tool into practice. Munch has an excellent-looking experiment in which he applies EDM to wild shrimp management (fisheries being one of two places you can make money with theoretical ecology[4]) and compares his output with other models. Alas, his results will not be available until 2041. At least they’ll be thorough.

Sugihara himself applied the framework across numerous fields (including a stint as a quant manager at Deutsche Bank); however, his website for his consulting practice only mentions systems he’s modeled, not instances where his work was put into practice. His work as an investment quant sounds like exactly the kind of thing that could show a decisive success, except there’s no evidence he was successful and mild evidence he wasn’t.

Process note: one of the reasons I believed in the story of Chaos Theory as told in the classic Gleick book was that I (Elizabeth) studied theoretical ecology in college, and distinctly remembered learning chaos theory in that context. This let me confirm a lot of Gleick’s claims about ecology, which made me trust his claims about other fields more. I recently talked to the professor who taught me and learned that in the mid 00s he was one of only 2 or 3 ecologists taking chaos really seriously. If I’d gone to almost any other university at the time, I would not have walked out respecting chaos theory as a tool for ecology. 

Weather forecasting- Yes (Alex)

Weather forecasting seems to be a domain where ideas from chaos theory had substantial causal impact. That said, it is still unclear to me (Alex) how much this impact depended on the exact mathematical content of chaos theory; it’s not like current weather modeling software is importing a library called chaos.cpp. I think I can imagine a world where people realized early on that weather was pretty complicated, and that predicting it required techniques that didn’t rely on common simplifying assumptions, like locally linear approximations, or using maximum likelihood estimates.

Here is a brief historical narrative, to give you a sense of the entanglement between these two fields. Most of the below can be found in “Roots of Ensemble Forecasting” (Lewis 2004), although I have seen much of it corroborated across many other sources.

In the 1940s, weather forecasting was still being done manually, and there was not much ability to predict very far into the future. As large electronic computers were being developed, it became clear that they could provide substantially more computation for this purpose, perhaps making longer predictions feasible. John von Neumann was especially vocally optimistic on this front.

Initially, people assumed that we would make useful weather predictions by doing the following: 1) formulate a dynamical model of the weather based on our knowledge of physics, 2) program that model into the computer, 3) take measurements of current conditions, and 4) feed those measurements into the computer to extrapolate a prediction for a reasonable timespan into the future. People knew this would be very challenging, and they expected to have to crank up the amount of compute, the number of measurements, and the accuracy of their model in order to improve their forecasts. These efforts began to acquire resources and dedicated government bodies to give the idea a serious go. Researchers developed simple models, which had systematic errors, and then attempted to find corrections for those errors. It sounds like these efforts were very much in the spirit of pragmatism, even when the models were not entirely consistent with known physical principles (like conservation of energy).

After a decade or so, various scientists began to suggest that there was something missing from the above scheme. Perhaps, instead of using our best-guess deterministic model run on our best-guess set of observations, we should instead run multiple forecasts, with variations in the models and input data. In case our best guess failed to predict some key phenomenon like a storm, this “ensemble” strategy might at least show the storm in one of its outputs. That would at least let us know to start paying attention to that possibility.

It sounds like there was some amount of resistance to this, though not a huge amount. Further work was done to make estimates of the limits of predictability based on the growth rate of errors (Philip Thompson, E. Novikov) and construct more physically principled models.

Around this point (the mid 1950s) enters Edward Lorenz, now known as one of the founding fathers of chaos theory. The oft-related anecdote is that he accidentally noticed sensitive dependence on initial conditions while doing computer simulations of weather. But in addition to this discovery, he was actively trying to convince people in weather forecasting that their simplifying assumptions were problematic. He impacted the field both by producing much good work and by being an active proponent of these new ideas. It is especially notable that the Lorenz system, a paradigmatic chaotic system, came from his deliberate attempt to take a real weather model (of convection cells in a temperature differential) and simplify it down to the smallest possible system that maintained both the chaotic behavior and the reflection of reality. By cutting it down to three dimensions, he allowed people to see how a deterministic system could display chaotic behavior, with spectacular visuals.
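For readers who want to see that divergence concretely, here is a minimal sketch (illustrative only, using the standard textbook parameters rather than anything from Lorenz’s own work) that integrates his three-variable system from two initial conditions differing by one part in a hundred million and prints how far apart they drift:

```python
import numpy as np

# The Lorenz system with the classic chaotic parameter values.
def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate(s, dt=0.01, steps=3000):
    """Fixed-step RK4 integration; returns the whole trajectory."""
    traj = [s]
    for _ in range(steps):
        k1 = lorenz(s)
        k2 = lorenz(s + 0.5 * dt * k1)
        k3 = lorenz(s + 0.5 * dt * k2)
        k4 = lorenz(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(s)
    return np.array(traj)

a = integrate(np.array([1.0, 1.0, 1.0]))
b = integrate(np.array([1.0, 1.0, 1.0 + 1e-8]))  # tiny perturbation in z
for step in (0, 500, 1000, 2000, 3000):
    # separation grows roughly exponentially until it saturates at the size of the attractor
    print(step, np.linalg.norm(a[step] - b[step]))
```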

Through continued work (especially Edward Epstein’s 1969 paper “Stochastic Dynamic Prediction”) people became convinced that weather forecasting needed to be done with some kind of ensemble method (i.e. not just using one predicted outcome). However, unlike the Lorenz system, useful weather models are very complicated. It is not feasible to use a strategy where, for example, you input a prior probability distribution over your high-dimensional observation vector and then analytically calculate out the mean and standard deviation etc. of each of the desired future observations. Instead, you need to use a technique like Monte Carlo, where you randomly sample from the prior distribution, and run each of those individual data points through the model, producing a distribution of outputs.
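As a toy illustration of the Monte Carlo idea (using the little Lorenz system from the sketch above as a stand-in for a real weather model, which is of course an enormous simplification), you sample initial conditions from a distribution representing observation error, run each member forward, and report the spread of outcomes rather than a single best guess:

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(s, dt=0.01):
    k1 = lorenz(s)
    k2 = lorenz(s + 0.5 * dt * k1)
    k3 = lorenz(s + 0.5 * dt * k2)
    k4 = lorenz(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
observation = np.array([1.0, 1.0, 1.0])
# 50 ensemble members = the observation plus Gaussian "measurement error"
members = observation + rng.normal(scale=0.01, size=(50, 3))

for _ in range(1500):                 # run every member forward in time
    members = np.array([rk4_step(m) for m in members])

x = members[:, 0]
print("ensemble mean of x:   ", x.mean())
print("ensemble spread of x: ", x.std())   # the spread is the forecast uncertainty
```

Real ensemble systems differ in almost every detail, but the basic structure (perturb, propagate, summarize the spread) is the same.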

But now we have another problem: instead of calculating one prediction, you are calculating many. There is an inherent trade-off in how to use your limited compute budget. So for something like two decades, people continued to use the one-best-guess method while computing got faster, cheaper, and more parallelized. During this wait, researchers worked on technical issues, like just how much uncertainty they should expect from specific weather models, and how exactly to choose the ensemble members. (It turns out that people do not even use the “ideal” Monte Carlo method mentioned above, and instead use heuristic techniques involving things like “singular vectors” and “breeding vectors”.)

In the early 1990s, the major national weather forecast agencies finally switched to delivering probabilistic forecasts from ensemble prediction systems. The usefulness of these improved predictions is universally recognized; they are critical not just for deciding whether to pack an extra jacket, but also for evacuation planning, deciding when to harvest crops, and staging military operations.

Fractals- Yes (Elizabeth)

Fractals have been credited for a number of advancements, including better mapping software, better antennas, and Nassim Taleb’s investing strategy. I (Elizabeth) am unclear how much the mathematics of fractals was strictly necessary for these developments (and would bet against it for that last one), but they might well be on the causal path in practice.

Mandelbrot’s work on phone line errors is more upstream than downstream of fractals, but produced legible economic value by demonstrating that phone companies couldn’t solve errors via their existing path of more and more powerful phone lines. Instead, they needed redundancy to compensate for the errors that would inevitably occur. Again I feel like it doesn’t take a specific mathematical theory to consider redundancy as a solution, but that may be because I grew up in a post-fractal world where the idea was in the water supply. And then I learned the details of TCP/IP where redundancy is baked in. 

Final thoughts

Every five hours we spent on this, we changed our minds about how important chaos theory was. Elizabeth discovered the fractals applications after she was officially done and waiting for Alex to finish his part.

We both find the whole brand of chaos confusing. The Wikipedia page on fractals devotes many misleading paragraphs to applications that never made it into practice. But nowhere does it mention fractal antennas, which first created economic value 30 years ago and now power cell phones and Wi-Fi. It’s almost like unproductive fields rush to invoke chaos to improve their PR, while productive applications don’t bother. It’s not that they hide it; they just don’t go out of their way to promote themselves and chaos.

Another major thread that came up was that there are a number of cases that benefited from the concepts of uncertainty and unpredictability, but didn’t use any actual chaos math. I have a hunch that chaos may have provided cover to many projects whose funders and bosses would otherwise have demanded an impossible amount of predictability. Formal chaos shouldn’t have been necessary for this, but working around human stupidity is an application. 

Acknowledgements

Thank you to Lightspeed Grants and Elizabeth’s Patreon patrons for supporting her part of this work. Did you know it’s pledge drive week at AcesoUnderGlass.com?

  1. ^

    This is a follow up to Elizabeth’s 2022 work on plate tectonics

  2. ^

    The second most popular is “oh you should read that book by Gleick”

  3. ^

    In Chaos: a very short introduction, page 44, and in this youtube video

  4. ^

    The other is epidemiology


Hi! I seem to run into chaos relatively often in practice. It's extremely useful and not likely to have flagship applications because it mostly serves to rule out solutions. The workflow looks like 

"I have an idea! It is very brilliant"

"Your idea is wonderful, but it's probably fucked by chaos. Calculate a Lyapunov exponent"

calculates Lyapunov exponent

"fuck"

But this is of course much better than trying the idea for weeks or months without a concept of why it's impossible

Thanks, this is very useful, I have many follow up questions. 

How possible is it to genuinely rule out a solution? How much does chaos beat a rock that says "most new things fail"? Is there any cache of details I could cite? 

The good news is that chaos theory can rule out solutions with extreme prejudice, and because it's a formal theory, it lets you be very clear about whether it's ruling out a solution absolutely (FluttershAI and Clippy combined aren't going to be able to predict the weather a decade in advance) vs. ruling out a solution in all practicality, but teeeechnically (i.e., predicting 4-5 swings of a double pendulum). Here are the concrete examples that come to mind:

I wrote a CPU N-body simulator, and then ported it to CUDA. I can't test that the port is correct by comparing long trajectories of the CPU simulator to the CUDA simulator, and because I know this is a chaos problem, I won't try to fix this by adding epsilons to the test. Instead, I will fix this by running the test simulation for less than roughly one Lyapunov time.
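A minimal sketch of that kind of test, with simulate_cpu and simulate_gpu as hypothetical stand-ins for the two implementations rather than anything from the actual code:

```python
import numpy as np

# simulate_cpu(state0, steps) and simulate_gpu(state0, steps) are hypothetical
# stand-ins for the two ports; each is assumed to return an array of shape
# (steps, dim) containing the trajectory.

def ports_agree(simulate_cpu, simulate_gpu, state0, lyapunov_time_steps, tol=1e-6):
    """Compare the two implementations only over ~one Lyapunov time.

    Past that horizon, chaos amplifies harmless floating-point differences
    into O(1) disagreement, so a longer comparison would give false failures."""
    steps = int(lyapunov_time_steps)
    a = simulate_cpu(state0, steps)
    b = simulate_gpu(state0, steps)
    return np.max(np.abs(a - b)) < tol
```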

I wrote a genetic algorithm for designing trebuchets. The final machine, while very efficient, is visibly using precise timing after several cycles of a double pendulum. Therefore, I know it can't be built in real life. 

I see a viral gif of black and white balls being dumped in a pile, so that when they come to rest they form a smiley face. I know it's fake because that's not something that can be achieved even with careful orchestration.

I prove a differential equation is chaotic, so I don't need to try to find a closed form solution.

One thing that jumps out, writing this out explicitly, is that chaos theory conceivably could be replaced with intuition of "well obviously that won't work," and so I don't know to what extent chaos theory just formulated wisdom that existed pre-1950s, vs generated wisdom that got incorporated into modern common sense. Either way, formal is nice- in particular, the "can't test end to end simulations by direct comparison" and "can't find a closed form solution" cases saved me a lot of time.

One thing that jumps out, writing this out explicitly, is that chaos theory conceivably could be replaced with intuition of "well obviously that won't work," and so I don't know to what extent chaos theory just formulated wisdom that existed pre-1950s, vs generated wisdom that got incorporated into modern common sense. 

Yeah, this is top of my mind as well. To get a sense of the cultural shift I'm trying to interview people who were around for the change, or at least knew people who were. If anyone knows any boomer scientists or mathematicians with thoughts on this, I'd love to talk to them.

I haven't nailed this down yet, but there's an interesting nugget along the lines of chaos being used to get scientists comfortable with existential uncertainty, which (some) mathematicians already were. My claim of the opposite triggered a great discussion on twitter[1], and @Jeffrey Heninger gave me a great concept: the chaos paradigm shift was moving from "what is the solution?" to "what can I know, even if the solution is unknowable?" That kind of cultural shift seems even more important than the math but harder to study. 

BTW thank you for your time and all your great answers, this is really helpful and I plan on featuring "ruling things out" if I do a follow-up post. 

  1. ^

    I've now had two ex-mathematicians, one ex-astrophysicist, and one of the foremost theoretical ecologists in the world convincingly disagree with me.

Very interesting. Could you give more details on one of these concrete examples?

e.g. calculating a Lyapunov exponent or proving a differential equation is chaotic for a DE that came up in practice for you.

Differential equation example: I wanted a closed form solution of the range of the simplest possible trebuchet- just a seesaw. This is perfectly achievable; see for example http://ffden-2.phys.uaf.edu/211.fall2000.web.projects/J%20Mann%20and%20J%20James/see_saw.html. I wanted a closed form solution of the second simplest trebuchet, a seesaw with a sling. This is impossible, because even though the motion of the trebuchet with sling isn't chaotic during the throw, it can be made chaotic by just varying the initial conditions, which rules out a simple closed form solution for non-chaotic initial conditions.

Lyapunov exponent example: for the bouncing balls, if each ball travels 1 diameter between bounces, then a change in velocity angle of 1 degree pre-bounce becomes a change in angle of 4 degrees post bounce (this number may be 6- I am bad at geometry), so the exponent is 4 if time is measured in bounces.
 

I'm curious about this part:

even though the motion of the trebuchet with sling isn't chaotic during the throw, it can be made chaotic by just varying the initial conditions, which rules out a simple closed form solution for non-chaotic initial conditions

Do you know what theorems/whatever this is from? It seems to me that if you know that "throws" constitute a subset of phase space that isn't chaotic, then you should be able to have a closed-form solution for those trajectories.

So this turns out to be a doozy, but it's really fascinating. I don't have an answer- an answer would look like "normal chaotic differential equations don't have general exact solutions" or "there is no relationship between being chaotic and not having an exact solution" but deciding which is which won't just require proof, it would also require good definitions of "normal differential equation" and "exact solution." (The good definition of "general" is "initial conditions with exact solutions have nonzero measure.") I do have some partial work, though.

A chaotic differential equation has to be nonlinear and at least third order- and almost all nonlinear third order differential equations don't admit general exact solutions. So, the statement "as a heuristic, chaotic differential equations don't have general exact solutions" seems pretty unimpressive. However, I wrongly believed the strong version of this heuristic and that belief was useful: I wanted to model trebuchet arm-sling dynamics, recognized that the true form could not be solved, and switched to a simplified model based on what simplifications would prevent chaos (no gravity, sling is wrapped around a drum instead of fixed to the tip of an arm) and then was able to find an exact solution (note that this solvable system starts as nonlinear 4th order, but can be reduced using conservation of angular momentum hacks)

Now, it is known that a chaotic difference equation can have an exact solution: the equation x(n+1) = 2x(n) mod 1 is formally chaotic and has the exact solution x(n) = 2^n x(0) mod 1. A differential equation exhibiting chaotic behaviour can also have an exact solution if it has discontinuous derivatives, because this differential equation can be constructed: 

The equation is in three variables x, y, z.

dz/dt = 1 always.

if 0 < z < 1:
    if x > 0: dx/dt = 0, dy/dt = 1
    if x < 0: dx/dt = 0, dy/dt = -1

if 1 < z < 2:
    if y > 0: dx/dt = -0.5, dy/dt = 0
    if y < 0: dx/dt = 0.5, dy/dt = 0

if 2 < z < 3:
    dx/dt = x ln(2)
    dy/dt = -y/(3 - t)

and then make it periodic by gluing z=0 to z=3 in phase space. (This is pretty similar to the structure of the Lorenz attractor, except that in the Lorenz system, the sheets of solutions get close together but don't actually meet.) This is an awful, weird ODE: the derivative is discontinuous, and not even bounded near the point where the sheets of solutions merge.

Plenty of prototypical chaotic differential equations have a sprinkling of exact solutions: e.g., three bodies orbiting in an equilateral triangle- hence the requirement for a "general" exact solution.

The three body problem "has" an "exact" "series" "solution" but it appears to be quite effed: for one thing, no one will tell me the coefficient of the first term. I suspect that in fact the first term is calculated by solving the motion for all time, and then finding increasingly good series approximations to that motion.

I strongly suspect that the correct answer to this question can be found in one of these Stack Exchange posts, but I have yet to fully understand them:

https://physics.stackexchange.com/questions/340795/why-are-we-sure-that-integrals-of-motion-dont-exist-in-a-chaotic-system?rq=1


https://physics.stackexchange.com/questions/201547/chaos-and-integrability-in-classical-mechanics

There are certainly billiards with chaotic and exactly solvable components- if nothing else, place a circular billiard next to an oval. So, for the original claim to be true in any meaningful way, this may have to involve excluding all differential equations with case statements- which sounds increasingly unlike a true, fundamental theorem.

If this isn't an open problem, then there is somewhere on the internet a chaotic, normal-looking system of ODEs (it would have aesthetics like x'''' = sin(x''') - x'y''', y' = (1-y / x') etc.) posted next to a general exact solution, perhaps only valid for non-chaotic initial conditions, or a proof that no such system exists. The solvable system is probably out there and related to billiards.


Final edit: the series solution to the three body problem is legit mathematically, see page 64 here


https://ntrs.nasa.gov/citations/19670005590

So “can’t find a general exact solution to a chaotic differential equation” is just uncomplicatedly false.

I am suddenly unsure whether it is true! It certainly would have to be more specific than how I phrased it, as it is trivially false if the differential equation is allowed to be discontinuous between closed form regions and chaotic regions

I am afraid I don't quite understand the bouncing balls example; could you give a little more detail? Thank you in advance!

Not a substantive response, just wanted to say that I really really like your comment for having so many detailed real-world examples.

Jay:

The seminal result for chaos theory came from weather modeling.  Lorenz restarted an atmospheric simulation partway through by typing in values from an earlier printout, but the printout had rounded the numbers to three decimal places while the machine carried more precision internally.  The tiny rounding errors compounded into larger errors, and over the course of an in-model month the predictions completely diverged.  An error that small is roughly comparable to the flap of a butterfly's wing, which led to the cliche about butterflies and hurricanes.

If you're trying to model a system, and the results of your model are extremely sensitive to miniscule data errors (i.e. the system is chaotic), and there is no practical way to obtain extremely accurate data, then chaos theory limits the usefulness of the model.  It may still have some value; using standard models and available data it's possible to predict the weather rather accurately for a few days and semi-accurately for a few days more, but it may not be able to predict what you need.

This is one reason I've always been skeptical of the "uploaded brain" idea.  My intuition is that inevitable minor errors in the model of the brain would cause the model to diverge from the source in a fairly short time.

This is one reason I've always been skeptical of the "uploaded brain" idea. My intuition is that inevitable minor errors in the model of the brain would cause the model to diverge from the source in a fairly short time.

This is true, but also e.g. minor environmental perturbations like seeing something at a slightly different time would also cause one to diverge from what one otherwise would have been in a fairly short time, so it seems like any notion of personal identity just has to be robust to exponential divergence.

Jay:

Consider - A typical human brain has ~100 trillion synapses.  Any attempt to map it would have some error rate.  Is it still "you" if the error rate is .1%?  1%? 10%?  Do positive vs. negative errors make a difference (i.e. missing connections vs. spurious connections)?  

Is this a way to get new and exciting psychiatric disorders?

I don't know the answers, or even how we'd try to figure out the answers, but I don't want to spend eternity as this guy.  

or even how we'd try to figure out the answers

Trial and error? E.g. first you upload animal subjects and see what fidelity seems to preserve all the animal traits you can find. At some point you then start with human volunteers (perhaps preferentially dying people?), and see whether the rates that seem to work for nonhuman animals also work for humans.

Also I guess once you have a mostly-working human upload, you can test perturbations to this upload to see what factors they are most sensitive to.

Ben:

My position is that either (1) my brain is computationally stable, in the sense that what I think, how I think it and what I decide to do after thinking is fundamentally about my algorithm (personality/mind), and that tiny changes in the conditions (a random thermal fluctuation), are usually not important. Alternatively (2) my brain is not a reliable/robust machine, and my behaviour is very sensitive to the random thermal fluctuations of atoms in my brain.

In the first case, we wouldn't expect small errors (for some value of small) in the uploaded brain to result in significant divergence from the real person (stability). In the second case I am left wondering why I would particularly care. Are the random thermal fluctuations pushing me around somehow better than the equally random measurement errors pushing my soft-copy around?

So, I don't think uploaded brains can be ruled out a priori on precision grounds. There exists a non-infinite amount of precision that suffices, the necessary precision is upper bounded by the thermal randomness in a body temperature brain.

Jay:

Surely both (1) and (2) are true, each to a certain extent.

Are the random thermal fluctuations pushing me around somehow better than the equally random measurement errors pushing my soft-copy around?

It depends.  We know from experience how meat brains change over time.  We have no idea how software brains change over time; it surely depends on the details of the technology used.  The changes might be comparable, but they might be bizarre.  The longer you run the program, the more extreme the changes are likely to be.

I can't rule it out either.  Nor can I rule it in.  It's conceivable, but there are enough issues that I'm highly skeptical.  

Ben:

I might be misunderstanding your point. My opinion is that software brains are extremely difficult (possibly impossibly difficult) because brains are complicated. Your position, as I understand it, is that they are extremely difficult (possibly impossibly difficult) because brains are chaotic.

If it's the former (complexity) then there exists a sufficiently advanced model of the human brain that can work (where "sufficiently advanced" here means "probably always science fiction"). If brains are assumed to be chaotic then a lot of what people think and do is random, and the simulated brains will necessarily end up with a different random seed due to measurement errors. This would be important in some brain simulating contexts, for example it would make predicting someone's future behaviour based on a simulation of their brain impossible. (Omega from Newcomb's paradox would struggle to predict whether people would two-box or not.) However, from the point of view of chasing immortality for yourself or a loved one the chaos doesn't seem to be an immediate problem. If my decision to one-box was fundamentally random (down to thermal fluctuations) and trivial changes on the day could have changed my mind, then it couldn't have been part of my personality. My point was, from the immortality point of view, we only really care about preserving the signal, and can accept different noise.

Jay:

I certainly agree that brains are complicated.

I think part of the difference is that I'm considering the uploading process; it seems to me that you're skipping past it, which amounts to assuming it works perfectly.

Consider the upload of Bob the volunteer.  The idea that software = Bob is based on the idea that Bob's connectome of roughly 100 trillion synapses is accurately captured by the upload process.  It seems fairly obvious to me that this process will not capture every single synapse with no errors (at least in early versions).  It will miss a percentage and probably also invent some that meat-Bob doesn't have.

This raises the question of how good a copy is good enough.  If brains are chaotic, and I would expect them to be, even small error rates would have large consequences for the output of the simulation.  In short, I would expect that for semi-realistic upload accuracy (whatever that means in this context), simulated Bob wouldn't think or behave much like actual Bob.  

Well, now I'm wondering - is neural network training chaotic?

Sometimes!

https://sohl-dickstein.github.io/2024/02/12/fractal.html

Huh, interesting! So the way I'm thinking about this is, your loss landscape determines the attractor/repellor structure of your phase space (= network parameter space). For a (reasonable) optimization algorithm to have chaotic behavior on that landscape, it seems like the landscape would either have to have 1) a positive-measure flat region, on which the dynamics were ergodic, or 2) a strange attractor, which seems more plausible.

I'm not sure how that relates to the above link; it mentions the parameters "diverging", but it's not clear to me how neural network weights can diverge; aren't they bounded?

If you're trying to model a system, and the results of your model are extremely sensitive to miniscule data errors (i.e. the system is chaotic), and there is no practical way to obtain extremely accurate data, then chaos theory limits the usefulness of the model.

 

This seems like a very underpowered sentence that doesn't actually need chaos theory. How do you know you're in a system that is chaotic, as opposed to having shitty sensors or a terrible model? What do you get from the theory, as opposed to the empirical result that your predictions only stay accurate for so long?

[For everyone else: Hastings is addressing these questions more directly. But I'm still interested in what Jay or anyone else has to say]. 

Jay:

Let's try again.  Chaotic systems usually don't do exactly what you want them to, and they almost never do the right thing 1000 times in a row.  If you model a system using ordinary modeling techniques, chaos theory can tell you whether the system is going to be finicky and unreliable (in a specific way).  This saves you the trouble of actually building a system that won't work reliably.  Basically, it marks off certain areas of solution space as not viable.

Also, there's Lavarand.  It turns out that lava lamps are chaotic.

For what it's worth, I think you're getting downvoted in part because what you write seems to indicate that you didn't read the post.

Ruby:

Mandelbrot’s work on phone line errors is more upstream than downstream of fractals, but produced legible economic value by demonstrating that phone companies couldn’t solve errors via their existing path of more and more powerful phone lines. Instead, they needed redundancy to compensate for the errors that would inevitably occur. Again I feel like it doesn’t take a specific mathematical theory to consider redundancy as a solution, but that may be because I grew up in a post-fractal world where the idea was in the water supply. And then I learned the details of TCP/IP where redundancy is baked in.

Huh, I thought all of this was covered by Shannon information theory already.

Yes, this sounds more like noisy channel coding theorem. But presumably what is meant are these "fractal antennas". 

The paper Gleick was referring to is this one, but it would be a lot of work to discern whether it was causal in getting telephone companies to do anything different. It sounds to me like the paper is saying that the particular telephone error data they were looking at could not be well-modeled as IID, nor could it be well-modeled as a standard Markov chain; instead, it was best modeled as a statistical fractal, which corresponds to a heavy-tailed distribution somehow.

What happened with Approximate Entropy was that chaos could be useful; it just wasn't as useful as a purely information-theory-derived solution. Wouldn't surprise me if that were true here as well.

Gleick gives Mandelbrot credit for this, but it wouldn't be the first major misrepresentation I've found in the Gleick book.

I know someone's gonna ask, so let me share the most powerful misrepresentation I've found so far: Gleick talks about the Chaos Cabal at UC Santa Cruz creating the visualization tools for chaos all on their own. In The Chaos Avant-Garde, Professor Ralph Abraham describes himself as a supporter of the students (could be misleading) and, separately, founding the Visual Math Project. VMP started with tools for undergrads but he claims ownership of chaos work within a few years. I don't know if the Chaos Cabal literally worked under Abraham, but it sure seems likely the presence of VMP affected their own visualization tools. 

Would I be correct to say that chaos as a science lives in the margins of error of existing measuring instruments? For weather, we could have one input, atmospheric pressure, say 760 +/- 0.05 mm Hg (margin of error stated). So the actual pressure is between 759.95 and 760.05 mm Hg, and this range just happens to be the "small difference(s) in initial value" that leads to prediction divergence. That is, the weather forecast could be as opposite as bright, sunny and heavy rain, stormy depending on whether you input 759.95 instead of 760, or 760.05 instead of 760, or 759.95 instead of 760.05 for atmospheric pressure. Doesn't this mean chaos theory says more about the accuracy of our instruments than about actual chaos in so-called chaotic systems? It possesses this subjective element (what we consider to be negligible differences) that seems to undermine its standing as a legitimate mathematical discipline. 

I also call into question the divergence, at least in weather prediction. Bright and sunny, how different/divergent is it from thunderstorm? There could be something lost in translation, going from numerical outputs to natural language descriptions like sunny, rainy.. etc.

That said, I've seen chaotic double pendulums and those do seem to exhibit real divergence in behavior; if the location of the pendulum's bob is our output then there definitely is a large numerical difference, especially if we consider how close to each other  the initial positions were, which is the point I suppose. We could artificially amplify this divergence by making the pendulum's arms longer, which tells its own story.

The margins of error of existing measuring instruments will tell you how long you can expect your simulation to resemble reality, but an exponential decrease in measurement error will only buy you a linear increase in how long that simulation is good for.
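To spell out the arithmetic behind that (a standard back-of-the-envelope argument, not tied to any particular model): if an initial error $\varepsilon$ grows roughly as

$$\delta(t) \approx \varepsilon\, e^{\lambda t},$$

where $\lambda$ is the largest Lyapunov exponent, then the forecast is useful only until $\delta(t)$ reaches some tolerance $\Delta$, i.e. for

$$t_{\text{useful}} \approx \frac{1}{\lambda}\ln\frac{\Delta}{\varepsilon}.$$

Halving $\varepsilon$ buys only an extra $(\ln 2)/\lambda$ of forecast time; a hundredfold improvement in measurement buys about $4.6/\lambda$.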

I also call into question the divergence, at least in weather prediction. Bright and sunny, how different/divergent is it from thunderstorm? There could be something lost in translation, going from numerical outputs to natural language descriptions like sunny, rainy.. etc.

If you don't like descriptive stuff like "sunny" or "thunderstorm" you could use metrics like "watts per square meter of sunlight at surface level" or "atmospheric pressure" or "rainfall rate". You will still observe a divergence in behavior between models with arbitrarily small differences in initial state (and between your model and the behavior of the real world).

Jay:

an exponential decrease in measurement error will only buy you a linear increase in how long that simulation is good for.

True, and in the real world attempts to measure with extreme precision eventually hit limits imposed by quantum mechanics.  Quantum systems are unpredictable in a way that has nothing to do with chaos theory, but that cashes out to injecting tiny amounts of randomness in basically every physical system.  In a chaotic system those tiny perturbations would eventually have macroscopic effects, even in the absence of any other sources of error.

I don't know the exact values Lorenz used in his weather simulation, but Wikipedia says "so a value like 0.506127 printed as 0.506". If this were atmospheric pressure, we're talking about precision down to the millionths place. I don't know what exerts 0.000001 Pa of pressure or why such a teeny pressure matters.

Jay:

That's the point.  Nobody thought such tiny variations would matter.  The fact that they can matter, a lot, was the discovery that led to chaos theory.

Most kind of you to reply. I couldn't catch all that; I'm mathematically semiliterate. I was just wondering if the key idea of "small differences" (in initial states) manifests at the output end (the weather forecast) too. I mean it's quite possible (given what I know, not much) that (say) an atmospheric pressure difference of 0.01 Pa in the output could mean the difference between rain and shine. Given what you wrote, I'm wrong, oui? If I were correct then the chaos resides in the weather, not the output (where the differences are as negligible as in the inputs). 

I know that there's something called the Lyapunov exponent. Could we "diminish the chaos" if we use logarithms, like with the Richter scale for earthquakes? I was told that logarithms, though they rise rapidly in the beginning, ultimately end up plateauing: log 1 million - log 100 = 4 (only)??? log 100 inches rain and log 1 inch rain = 2 (only)?

I hope you'll forgive me if I'm talking out of my hat here. It's an interesting topic and I tried my best to read and understand what I read. 

Gracias, have an awesome day.

I know that there's something called the Lyapunov exponent. Could we "diminish the chaos" if we use logarithms, like with the Richter scale for earthquakes?

This is a neat question. I think the answer is no, and here's my attempt to describe why.

The Lyapunov exponent measures the difference between the trajectories over time. If your system is the double pendulum, you need to be able to take two random states of the double pendulum and say how different they are. So it's not like you're measuring the speed, or the length, or something like that. And if you have this distance metric on the whole space of double-pendulum states, then you can't "take the log" of all the distances at the same time (I think because that would break the triangle inequality).

Hopefully I'm not talking out of my hat, but the difference between the final states of a double pendulum can be split into two cases:  

  1. Somewhere in the middle of the pendulum's journey through space and time. I've seen this visually, and it's true there's divergence. This divergence is based on measurement of the pendulum's position in space at a given time. So with initial state s1, the pendulum at time t was at position p1, while beginning with initial state s2, the pendulum at time t was at position p2. The alleged divergence is the difference between p1 and p2, oui? Taken in absolute terms that difference may be large, but taken logarithmically it looks much smaller.
  2. At the very end when the pendulum comes to rest. There's no divergence there, oui? 
Jay:

Any physical system has a finite amount of mass and energy that limit its possible behaviors.  If you take the log of (one variable of) the system, its full range of behaviors will use fewer numbers, but that's all that will happen.  For example, the wind is usually between 0.001 m/s (quite still) and 100 m/s (unprecedented hurricane).  If you take the base-10 log, it's usually between -3 and 2.  A change of 2 can mean a change from .001 to .1 m/s (quite still to barely noticeable breeze) or a change from 1 m/s to 100 m/s (modest breeze to everything's gone).  For lots of common phenomena, log scales are too imprecise to be useful.

Chaotic systems can't be predicted in detail, but physics and common sense still apply.  Chaotic weather is just ordinary weather.

It possesses this subjective element (what we consider to be negligible differences) that seems to undermine its standing as a legitimate mathematical discipline.

I think I see what you're getting at here, but no, "chaotic" is a mathematical property that systems (of equations) either have or don't have. The idea behind sensitive dependence on initial conditions is that any difference, no matter how small, will eventually lead to diverging trajectories. Since it will happen for arbitrarily small differences, it will definitely happen for whatever difference exists within our ability to make measurements. But the more precisely you measure, the longer it will take for the trajectories to diverge (which is what faul_sname is referring to).
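A quick numerical illustration of that last point, using the logistic map as a generic chaotic system (nothing specific to weather or the double pendulum): shrinking the initial difference by large factors only buys a modest, roughly constant number of extra steps of agreement.

```python
# For the logistic map x -> 4x(1-x), count how many iterations two nearby
# starting points stay within 0.1 of each other. Each hundredfold improvement
# in the initial difference adds only a handful of extra steps.

def steps_until_divergence(eps, threshold=0.1, x0=0.3):
    a, b = x0, x0 + eps
    steps = 0
    while abs(a - b) < threshold:
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        steps += 1
    return steps

for eps in (1e-4, 1e-6, 1e-8, 1e-10, 1e-12):
    print(eps, steps_until_divergence(eps))
```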

This is awesome, I would love more posts like this. Out of curiosity, how many hours have you and your colleague spent on this research?

I've spent about 55 hours on the whole project so far, of which maybe half went to this post. I think @Alex_Altair is in the same ballpark. I'm unhappy with how high that number is, but the investigation was, uh, fractal, in how new details that needed investigation before we could publish kept popping up. 

Definitely on the order of "tens of hours", but it'd be hard to say more specifically. Also, almost all of that time (at least for me) went into learning stuff that didn't go into this post. Partly that's because the project is broader than this post, and partly because I have my own research priority of understanding systems theory pretty well.

What is Chaos Theory? It sounds to me like an arbitrary grouping of results from people playing around with computers, not a coherent theory. If it were about a social group, that would provide more coherence. Indeed, the people who pushed the term "Chaos" do form a social group, but I do not think this group really includes all the people included in, say, Gleick's book.

A lot of the results were things that could have been predicted from theory before computers, but they don't seem to have been predicted. In particular, Lyapunov died in 1918. If the theory is his theory, then it's hard to articulate what the people with computers contributed, but it may still have been important to actually use the computers. Similarly, I think it's wrong to dismiss something as just information theory, not chaos theory. The only concrete result I know is the reconstruction from symbolic dynamics, but this makes it clear how to apply information theory.