I will give a potted history of Pearl's discovery as I understand it.
In the late 70s/early 80s, people wanted to deal with uncertainty in logic-based AI. The obvious thing to use is probability, but naively doing a Bayesian update to compute a posterior is exponentially expensive in the number of variables.
Pearl wanted to come up with a good data structure for doing computations over probability distributions in less-than-exponential time.
He introduced the idea of Bayesian networks in his paper Reverend Bayes On Inference Engines where he represents factorized probability distributions using DAGs. Here, the direction of the arrows is arbitrary and there are many DAGs corresponding to one probability distribution.
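To make the "many DAGs, one distribution" point concrete, here is a minimal sketch with two binary variables. The joint probabilities are made up for illustration. It checks that the factorization for A -> B and the one for B -> A both rebuild the same joint distribution, so the data alone cannot pick the arrow direction.

```python
from itertools import product

# Hypothetical joint distribution P(A, B) over two binary variables (made-up numbers).
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}


def marginal(joint, index):
    """Marginal distribution of the variable at position `index` (0 for A, 1 for B)."""
    out = {0: 0.0, 1: 0.0}
    for values, p in joint.items():
        out[values[index]] += p
    return out


p_a = marginal(joint, 0)
p_b = marginal(joint, 1)

# Conditional tables, keyed by (a, b) in both cases.
p_b_given_a = {(a, b): joint[(a, b)] / p_a[a] for a, b in joint}
p_a_given_b = {(a, b): joint[(a, b)] / p_b[b] for a, b in joint}

# DAG "A -> B": P(A, B) = P(A) * P(B | A)
rebuilt_a_to_b = {(a, b): p_a[a] * p_b_given_a[(a, b)] for a, b in product((0, 1), repeat=2)}

# DAG "B -> A": P(A, B) = P(B) * P(A | B)
rebuilt_b_to_a = {(a, b): p_b[b] * p_a_given_b[(a, b)] for a, b in product((0, 1), repeat=2)}

for key in joint:
    print(key, joint[key], rebuilt_a_to_b[key], rebuilt_b_to_a[key])
```

With only two variables nothing is saved computationally; the payoff of the data structure comes with many variables, where each node only needs a table conditioned on its parents.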
He was not thinking about causality at all; it was just a problem in data structures. The idea was this would be used for the same sort of thing as an "expert system" or other logic-based AI systems, but taking into account uncertainty expressed probabilistically.
Later, people including Pearl noticed that you can and often should interpret the arrows as causal; this amounts to choosing one DAG from many. The fact that there are many possible DAGs is related to the fact that, absent additional assumptions about the world, there are seemingly always multiple incompatible causal stories that explain the same observations. But if you pick one, you can start using it to see whether your causal question can be answered from observational data alone.
Finally, he realized that the assumptions encoded in a DAG aren't sufficient for fully general counterfactuals, and that in full generality you have to specify exactly what functional relationship goes along each edge of the graph.
As someone originally concerned with AI, not with problems in the natural sciences, Pearl is probably unusual. Pearl himself looks back on Sewall Wright as his progenitor for coming up with path diagrams -- he was working in genetics. If you are interested in this, you should also look at Don Rubin's experience -- his causal framework is isomorphic to Pearl's. He was a 100 percent classic statistician, motivated by looking at medical studies.
I think another important part of Pearl's journey was that during his transition from Bayesian networks to causal inference, he was very frustrated with the correlational turn in early 1900s statistics. Because causality is so philosophically fraught and often intractable, statisticians shifted to regressions and other acausal models. Pearl sees that as throwing out the baby (important causal questions and answers) with the bathwater (messy empirics and a lack of mathematical language for causality, which is why he coined the do operator).
Pearl discusses this at length in The Book of Why, particularly the Chapter 2 sections on "Galton and the Abandoned Quest" and "Pearson: The Wrath of the Zealot." My guess is that Pearl's frustration with statisticians' focus on correlation was immediate upon getting to know the field, but I don't think he's publicly said how his frustration began.
Is Rubin's work actually the same as Pearl's??
Please tell more?
That's not the impression I got from reading Pearl's Causality. If so, it seems like a major omission of scholarship.
Rubin's framework says basically, suppose all our observations are in a big data table. Now consider the counterfactual observations that didn't happen (i.e. people in the control group getting the treatment) -- these are called "potential outcomes" -- treat those like missing cells in the data table. Then causal inference is just to fill in potential outcomes using missing data imputation techniques, although to be valid these require some assumptions about conditional independence.
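The "missing cells in a data table" picture can be sketched in a few lines. Everything here is a toy assumption: the four units, their outcomes, and the choice of mean imputation, which is only defensible if treatment was assigned independently of the potential outcomes (as in a randomized experiment). Real analyses use far more careful imputation or weighting.

```python
# Hypothetical toy table: one row per unit; None marks the potential outcome we never observed.
units = [
    {"treated": True, "y_treated": 7.0, "y_control": None},
    {"treated": True, "y_treated": 9.0, "y_control": None},
    {"treated": False, "y_treated": None, "y_control": 4.0},
    {"treated": False, "y_treated": None, "y_control": 6.0},
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Observed column means, used as the (crude) imputed value for the missing cells.
mean_treated = mean(u["y_treated"] for u in units if u["treated"])
mean_control = mean(u["y_control"] for u in units if not u["treated"])

# Fill in every missing potential outcome.
completed = [
    {
        "y_treated": u["y_treated"] if u["treated"] else mean_treated,
        "y_control": u["y_control"] if not u["treated"] else mean_control,
    }
    for u in units
]

# Average treatment effect over the completed table.
ate = mean(row["y_treated"] - row["y_control"] for row in completed)
print(ate)
```

The conditional independence assumptions mentioned above are what would justify any particular imputation rule; the table itself is silent on that.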
Pearl's framework and Rubin's are isomorphic in the sense that any set of causal assumptions in Pearl's framework (a structural causal model, which has a DAG structure), can be translated into a set of causal assumptions in Rubin's framework (a bunch of conditional independence assumptions about potential outcomes), and vice versa. This is touched on somewhat in Ch. 7 of "Causality".
Pearl argues that despite this equivalence, his framework is superior because it's a better tool for thinking. In other words, writing down your assumptions as a DAG/SCM is intuitive and can be explained and argued about, while he claims the Rubin model's independence assumptions are opaque and hard to understand.
Some reading on this:
https://csss.uw.edu/files/working-papers/2013/wp128.pdf
http://proceedings.mlr.press/v89/malinsky19b/malinsky19b.pdf
https://arxiv.org/pdf/2008.06017.pdf
---
From my experience it pays to learn how to think about causal inference like Pearl (graphs, structural equations), and also how to think about causal inference like Rubin (random variables, missing data). Some insights only arise from a synthesis of those two views.
Pearl is a giant in the field, but it is worth remembering that he's unusual in another way (compared to a typical causal inference researcher) -- he generally doesn't worry about actually analyzing data.
---
By the way, Gauss figured out not only the normal distribution trying to track down Ceres' orbit, he actually developed the least squares method, too! So arguably the entire loss minimization framework in machine learning came about from thinking about celestial bodies.
One confusion I wrote down in advance was “I still don’t quite know how to predict that there will not be a simple mathematical apparatus that explains something. Why the motion of the planets, why the game of chance, why not the color of houses in England or the number of hairs on a man’s head?”
I think the main thing I'd look for is an unusual amount of regularity. This comes in two types:
There doesn't seem to be any particularly obvious regularity to house colours or number of hairs, they just look like your standard-issue messy situations that don't tell you much.
Only in the middle of the 15th century did it become standard to use symmetric cubes
Haha! Those poor people. All of my intuitions about probabilities would have been terribly broken in those times.
This is exactly the question that John Wentworth is trying to answer with his abstraction hypothesis framework. Also related to Jaynes's proof that the probability of a fair coin coming up heads is 1/2.
As to being able to discern between different theories: partly you are right that it can be hard during a scientific controversy, and it involves a lot of judgement calls. On the other hand, it can be hard for a layman to appreciate how 'rigid' good mathematical models are. Newton didn't just observe that apples fall to the ground; he posited a series of elegant laws and was able to calculate very nonobvious results. The entire theory is quite large and intricate, and there are many quantitative tests one can do and that have been done.
One of the key things to figure out is why scientists working in the field can make confident pronouncements like "oh, the Jupiter thing is just light moving slower" or "no, we swear there's going to be a Higgs boson, we just need to build a more powerful particle accelerator" and have them actually work out. It's a mathematical impossibility that the models of the world they use have lots of knobs they could turn and they have just hit the right setting of the knobs by accident many times (even with plenty of wrong turns along the way as well). And so clearly there is implicit knowledge, not at all obvious to the outsider who just hears a one-sentence synopsis of this idea without having to attend any symposia on it or read a half dozen research papers about why it makes sense.
I mean, put like that, it seems like it has an obvious answer. And I think the obvious answer is mostly right, though there can be some interesting wrinkles in it.
Internal consistency of the theory, consistency with other known things, retrodiction of known observations, simplicity of the theory (as compared to its explanatory power), revealing and resolving unsatisfactory aspects of alternative theories…
I agree that those kinds of things are probably “not obvious from a one-sentence synopsis” but I don't see why they have to be “implicit” or require reading lots of research papers.
I'm not sure what the right answer is.
I'd be interested to know how many different people around the world came up with explanations and empirically tested them. I don't know whether people "got the answer right first time" or "lots of people threw lots of hypotheses at the walls and these are the ones that stuck".
“no, we swear there’s going to be a Higgs boson, we just need to build a more powerful particle accelerator”
Particle physicists also made other confident predictions about the LHC that are not working out, and they're now asking for a bigger accelerator.
Survivorship bias might be at play, wherein we forget all the confident pronouncements that ended up being just plain wrong.
I mean, the other main things to look for were WIMPs and supersymmetry, but almost everyone was cautious about chances of finding those.
https://www.preposterousuniverse.com/blog/2008/08/04/what-will-the-lhc-find/
My current favorite story of scientific discovery is probably the origin of Bose-Einstein statistics.
Before the discovery of Planck's law, there was the problem of ultraviolet catastrophe in applying statistical mechanics to fields: there are many more high frequency modes than low frequency ones, and the equipartition theorem of statistical mechanics predicts that energy should be spread out evenly across all quadratic degrees of freedom. There's ~ one quadratic degree of freedom for each frequency in a free field, so a naive application led to the Rayleigh-Jeans law for blackbody radiation which predicted an infinite energy flux radiated by a blackbody at nonzero temperature.
People waved this off as statistical mechanics not being applicable to this situation. Then, Planck noticed that if energy comes in discrete packets where the size of each packet scales linearly with frequency, this manages to kill the divergence at high frequencies and give reasonable results for the spectral energy density of blackbody radiation. This is now known as Planck's law.
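For reference, the two laws being contrasted can be written side by side. These are the standard textbook forms for the spectral energy density $u(\nu, T)$; the symbols $h$ (Planck's constant), $k_B$ (Boltzmann's constant) and $c$ (speed of light) are the usual ones and are not named in the text above.

```latex
u_{\mathrm{RJ}}(\nu, T) = \frac{8 \pi \nu^{2}}{c^{3}} \, k_B T,
\qquad
u_{\mathrm{Planck}}(\nu, T) = \frac{8 \pi \nu^{2}}{c^{3}} \, \frac{h \nu}{e^{h \nu / k_B T} - 1}.
```

For $h\nu \ll k_B T$ the two agree, since $e^{x} - 1 \approx x$ for small $x$. For $h\nu \gg k_B T$ the exponential suppresses the Planck expression, so its integral over all frequencies is finite, while the Rayleigh-Jeans integral grows without bound.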
Some years later, when Bose was giving a talk about the ultraviolet catastrophe problem and explaining why Planck's calculation was unjustified under Maxwell-Boltzmann statistics, he made an error in a combinatorial argument and accidentally derived that Planck's argument was justified. He realized later that the "error" he had made was to assume that photons occupying the same energy level were indistinguishable. Since photons are bosons, this is correct, but the discovery was made through a calculation mistake.
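The difference between the two ways of counting can be seen in a toy case. This is only a counting illustration with two particles and two states labeled "H" and "T" (hypothetical labels), not a reconstruction of Bose's derivation. Treating the particles as distinguishable gives four equally weighted arrangements; treating them as indistinguishable gives three.

```python
from itertools import product
from math import comb


def distinguishable_count(n_particles, n_states):
    """Arrangements when each particle carries a label (Maxwell-Boltzmann-style counting)."""
    return n_states ** n_particles


def indistinguishable_count(n_particles, n_states):
    """Arrangements when only occupation numbers matter (Bose-style counting)."""
    return comb(n_particles + n_states - 1, n_particles)


# Enumerate the two-particle, two-state case explicitly.
labeled = list(product("HT", repeat=2))             # HH, HT, TH, TT
unlabeled = {tuple(sorted(arrangement)) for arrangement in labeled}  # HH, HT, TT

# Chance that both particles are in state "H" if every arrangement is equally weighted.
p_both_h_labeled = labeled.count(("H", "H")) / len(labeled)
p_both_h_unlabeled = 1 / len(unlabeled)

print(len(labeled), p_both_h_labeled)
print(len(unlabeled), p_both_h_unlabeled)
```

Under the indistinguishable counting, "both in the same state" gets more weight than under the labeled counting, which is the qualitative feature that matters for photons.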
Bose later submitted this paper to an English journal & got rejected, so he got in touch with Einstein and asked him to translate his article to German so that it could be published in a German journal. Einstein agreed, and that's where the name of "Bose-Einstein statistics" comes from.
One rich dude had a whole island and set it up to have lenses on lots of parts of it, and for like a year he’d go around each day and note down the positions of the stars
You can’t just say that without a name or reference! Not that I don’t believe you - I just want to know more!
As a small note, it would be easier to navigate this post if each section had a brief heading.
Re: Feynman's quote: why in the dickens aren't large outstanding problems summarized this way? This seems like a great way to generate angles of attack, in Hamming's sense of the term. It feels intuitively like being able to describe why a given approach from this list wouldn't work would by itself be substantial progress on a given problem.
Hm. I'm reminded of my college class on Complexity Theory, where the professor explained some common strategies that have been widely successful in proving that two complexity classes either are or aren't the same, and then went on to prove that those strategies could not be used to solve P vs NP.
That gave me a whole new appreciation for the difficulty of the problem, and how hard people have worked on it.
My update is further in the direction that Jacob’s post The Copernican Revolution from the Inside argues for, which is that if two different people had different theories at the time, I do not anticipate the disagreement being able to be “clearly resolvable” at all, and do expect for it to involve a great number of judgment calls, in large part dependent on one’s “philosophy” of how to make those calls in this domain.
Related, from Scott Alexander's review of The Structure of Scientific Revolutions:
... there is rarely a single experiment that one paradigm fails and another passes. Rather, there are dozens of experiments. One paradigm does better on some, the other paradigm does better on others, and everyone argues over which ones should or shouldn’t count.
For example, one might try to test the Copernican vs. Ptolemaic worldviews by observing the parallax of the fixed stars over the course of a year. Copernicus predicts it should be visible; Ptolemy predicts it shouldn’t be. It isn’t, which means either the Earth is fixed and unmoving, or the stars are unutterably unimaginably immensely impossibly far away. Nobody expected the stars to be that far away, so advantage Ptolemy. Meanwhile, the Copernicans posit far-off stars in order to save their paradigm. What looked like a test to select one paradigm or the other has turned into a wedge pushing the two paradigms even further apart.
What looks like a decisive victory to one side may look like random noise to another. Did you know weird technologically advanced artifacts are sometimes found encased in rocks that our current understanding of geology says are millions of years old? Creationists have no trouble explaining those – the rocks are much younger, and the artifacts were probably planted by nephilim. Evolutionists have no idea how to explain those, and default to things like “the artifacts are hoaxes” or “the miners were really careless and a screw slipped from their pocket into the rock vein while they were mining”. I’m an evolutionist and I agree the artifacts are probably hoaxes or mistakes, even when there is no particular evidence that they are. Meanwhile, probably creationists say that some fossil or other incompatible with creationism is a hoax or a mistake. But that means the “find something predicted by one paradigm but not the other, and then the failed theory comes crashing down” oversimplification doesn’t work. Find something predicted by one paradigm but not the other, and often the proponents of the disadvantaged paradigm can – and should – just shrug and say “whatever”.
In 1870, flat-earther Samuel Rowbotham performed a series of experiments to show the Earth could not be a globe. In the most famous, he placed several flags miles apart along a perfectly straight canal. Then he looked through a telescope and was able to see all of them in a row, even though the furthest should have been hidden by the Earth’s curvature. Having done so, he concluded the Earth was flat, and the spherical-earth paradigm debunked. Alfred Wallace (more famous for pre-empting Darwin on evolution) took up the challenge, and showed that the bending of light rays by atmospheric refraction explained Rowbotham’s result. It turns out that light rays curve downward at a rate equal to the curvature of the Earth’s surface! Luckily for Wallace, refraction was already a known phenomenon; if not, it would have been the same kind of wedge-between-paradigms as the Copernicans having to change the distance to the fixed stars.
It is all nice and well to say “Sure, it looks like your paradigm is right, but once we adjust for this new idea about the distance to the stars / the refraction of light, the evidence actually supports my paradigm”. But the supporters of old paradigms can do that too! The Ptolemaics are rightly mocked for adding epicycle after epicycle until their system gave the right result. But to a hostile observer, positing refraction effects that exactly counterbalance the curvature of the Earth sure looks like adding epicycles. At some point a new paradigm will win out, and its “epicycles” will look like perfectly reasonable adjustments for reality’s surprising amount of detail. And the old paradigm will lose, and its “epicycles” will look like obvious kludges to cover up that it never really worked. Before that happens…well, good luck.
One confusion I wrote down in advance was “I still don’t quite know how to predict that there will not be a simple mathematical apparatus that explains something. Why the motion of the planets, why the game of chance, why not the color of houses in England or the number of hairs on a man’s head?”
It seems very hard to say a priori that there won't be any interesting new abstract structure discovered by looking at some new domain, especially when Science is young and you don't know the base rate of 'how often do we discover useful new formalisms?'. E.g., Fibonacci numbers and Lucas numbers show up in the distribution of petals for many flowers; hair could have turned out to reveal something similar.
I think the correct process for zeroing in on relatively promising domains is something like:
Planets are a weird domain — there aren't a bunch of things we knew about in the 17th century that were similar to planets, or that formed a continuum between planets and ordinary objects like lanterns and pigeons. In contrast, hairs are a lot like whiskers, feathers, etc.; and house colors are a lot like cave colors, tent colors, etc. So if there are surprising new generalizations to find, they're more likely to crop up by studying planets than by studying hair or house colors.
Similarly, gambling is weird relative to non-probabilistic inference. If you're really into Aristotle and you're trying to model all human reasoning and decision-making using deductive syllogisms, you should be really curious about the domains where people do weird things like 'bet things based on guesswork, with no certainty they're right'. (You might similarly take an interest in dreams, emotions, divine inspiration, self-deception, bullshit, etc.; they won't all be winners, but an occasional winner is sufficient.)
Promoted to curated: I think understanding how our understanding of nature has historically progressed is quite important for understanding how to structure research fields and research methodologies, and this post covers a bunch of datapoints that seemed pretty informative in that space.
Hello Ben, I'm interested in studying this kind of history too. Can you list the books you're reading to study this? I find that studying the context of discoveries and how they developed over time helps me understand them better.
I primarily watched YouTube a couple of hours a day for 4 days. YouTube has lots of explainers and more, including great little homemade videos with like 500 views.
Plus a very little Wikipedia and this great site.
in lieu of writing nothing instead, informally -
hey, good list! i wonder if you've read much of the recent history of sabermetrics, which to me is the modern equivalent (in that it's a history of a bunch of nerds and some people who wanted to be rich who actualized statistical modeling at the frontier of the applied science)?
some places to look (with hope that others might add theirs):
Moneyball (the book, the movie lacks detail but gets some of the spirit)
fivethirtyeight's methodology articles on their various sports/+ models (https://fivethirtyeight.com/features/how-our-raptor-metric-works/ and https://fivethirtyeight.com/features/how-fivethirtyeight-2020-primary-model-works/)
probably a bunch of articles from grantland (which is archived but available, but i lack titles off the top of my head)
https://en.wikipedia.org/wiki/Sports_analytics
zvi's sports betting articles
You might be interested in BACON:
https://users.cs.cf.ac.uk/Dave.Marshall/AI2/node152.html
It was an AI system from the 80's which was able to infer physical laws from data observations. It correctly inferred the ideal gas law from pressure/volume/temperature data, and some (all?) of Kepler's laws from ground-based planetary observations.
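As a rough illustration of recovering a law from data, here is a sketch that fits a power law T = coeff * a^exponent to approximate textbook values for six planets. To be clear about the swap: this is an ordinary least-squares fit on logarithms, not BACON's own search procedure, and the function name `fit_power_law` is made up for this example.

```python
import math

# Approximate textbook values: (semi-major axis in AU, orbital period in years).
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Least squares on logs: log T = exponent * log a + log coeff."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    exponent = sxy / sxx
    coeff = math.exp(y_mean - exponent * x_mean)
    return exponent, coeff


exponent, coeff = fit_power_law(list(planets.values()))
print(f"T ~ {coeff:.3f} * a^{exponent:.3f}")
```

The fitted exponent comes out close to 3/2, i.e. T^2 is proportional to a^3, which is Kepler's third law. Note that the fit only works because the functional form (a power law) was supplied by hand; choosing the form is the hard part that a system like BACON has to automate.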
Going forward, I think discovery in the natural sciences will entirely be about automated searches in equation-space for models that fit datasets generated by real-world systems.
Why does one model work and not the other? Hopefully we'll know, most likely we won't. At any rate, the era of a human genius working these things out with pen and paper is pretty much over (Just consider the amount of combined intellectual power now needed to make incremental improvements. Major scientific papers these days will usually have a dozen+ names from several institutions).
Ultimately, this process will look like pointing a camera at the world in general and using the resulting raw bit stream to induce the fundamental program that runs the Universe.
Going forward, I think discovery in the natural sciences will entirely be about automated searches in equation-space for models that fit datasets generated by real-world systems.
Wow! Sounds like you should be able to exploit this knowledge for a lot of prestige and scientific discovery :)
You might enjoy reading _The Structure of Scientific Revolutions_. #9 is explicitly discussed there. It is often the case that the old incorrect theory has a lot of work in it and many of the anomalies are explained by additional mechanisms, e.g. the geocentric theory had a lot of bells and whistles in the end and it was quite precise in some cases. When the heliocentric theory was created, it was actually worse at predicting the movement of celestial bodies because it was too simplistic and was not able to handle various edge cases. Related to your remark about gravity, it took more than 50 years to successfully apply the theory of gravity to predict how the Moon will behave.
Math is not physics. I'm not sure what math is. I kind of like Gisin's support of intuitive math. I agree that the next billion digits of pi mean nothing real, also that there should be some constructivist dimension to the infinities in math (e.g. renormalization).
Oh, and statistics is not math, it's physics. You can test the results of statistics against the real world, but math is merely consistent.
Pearl himself says that he has discovered two laws, and once you have them, you can fire him, because the rest is just algebra! And he calls it a calculus of counterfactuals, just like Newton and Bayes and everyone did. Fascinating.
I couldn’t find anything on what problems Pearl was thinking about when he came up with his calculus of counterfactuals. Like, was he personally trying to analyze clinical trials? Was he a mathematician who was friends with people doing large experiments and thought the math was interesting? I want to know what part of the world he was in contact with when developing it.
I don't know much about the history, but the fact that Pearl was a computer scientist must surely have mattered a lot. His causality math essentially treats the laws of physics as being a "symbolic" program, which given some input generates the resulting variables of the world.
I've been thinking about whether I can discover laws of agency and wield them to prevent AI ruin (perhaps by building an AGI myself in a different paradigm than machine learning).
So far I’ve looked into the history of the discovery of physical laws (gravity in particular) and mathematical laws (probability theory in particular). Here are 12 things I’ve learned or been surprised by.
1.
Data-gathering was a crucial step in discovering both gravity and probability theory. One rich dude had a whole island and set it up to have lenses on lots of parts of it, and for like a year he’d go around each day and note down the positions of the stars. Then this data was worked on by others who turned it into equations of motion.
2.
Relatedly, looking at the celestial bodies was a big deal. It was almost the whole game in gravity, but also a little helpful for probability theory (specifically the normal distribution was developed in part by noting that errors in celestial measurements followed a simple distribution).
It hadn’t struck me before, but putting a ton of geometry problems on the ceiling for the entire civilization led a lot of people to try to answer questions about it. (It makes Eliezer’s choice in That Alien Message apt.) I’m tempted in a munchkin way to find other ways to do this, like to write a math problem on the surface of the moon, or petition Google to put a prediction market on its home page, or something more elegant than those two.
3.
Probability theory was substantially developed around real-world problems! I thought math was all magical and ivory tower, but it was much more grounded than I expected.
After a few small things like accounting and insurance and doing permutations of the alphabet, games of chance (gambling) were what really kicked it off, with Fermat and Pascal trying to figure out the expected value of games (they didn’t phrase it like that, they put it more like “if the game has to stop before it’s concluded, how should the winnings be split between the players?”).
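The interrupted-game question has a short modern answer: split the stake in proportion to each player's chance of winning if play had continued. Here is a minimal sketch, assuming each round is a fair 50/50 and the first player to reach a target number of round wins takes everything. The function name `share_for_a` and the example scores are illustrative, not taken from the letters.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def share_for_a(a_needs, b_needs):
    """Fraction of the stake owed to player A, who still needs `a_needs` round wins
    while player B still needs `b_needs`, assuming every round is a fair coin flip."""
    if a_needs == 0:
        return Fraction(1)
    if b_needs == 0:
        return Fraction(0)
    # Either A wins the next round or B does, each with probability 1/2.
    return Fraction(1, 2) * (
        share_for_a(a_needs - 1, b_needs) + share_for_a(a_needs, b_needs - 1)
    )


# A needs one more win, B needs two: A should get three quarters of the stake.
print(share_for_a(1, 2))
# A needs two more wins, B needs three.
print(share_for_a(2, 3))
```

Using `Fraction` keeps the answers exact, which matches the spirit of the original problem better than floating point would.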
Other people who consulted with gamblers also would write down data about things like how often different winning hands would come up in different games, and discovered simple distributions, then tried to put equations to them. Later it was developed further by people trying to reason about gases and temperatures, and then again in understanding clinical trials or large repeated biological experiments.
Often people discovered more in this combination of “looking directly at nature” and “being the sort of person who was interested in developing a formal calculus to model what was going on”.
4.
Thought experiments about the world were a big deal too! Thomas Bayes did most of his math this way. He had a thought experiment that went something like this: his assistant would throw a ball on a table that Thomas wasn’t looking at. Then his assistant would throw more balls on the table, each time saying whether it ended up to the right or the left of the original ball. He had this sense that each time he was told the next left-or-right, he should be able to give a new probability that the ball was in any particular given region. He used this thought experiment a lot when coming up with Bayes’ theorem.
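The table thought experiment translates directly into an update rule. In this sketch, `p` stands for the unknown position of the original ball as a fraction of the table's width measured from the left edge, so each later ball lands to its left with probability `p`. The uniform prior, the grid approximation, and the example reports "L", "L", "R" are all assumptions made for illustration.

```python
def posterior_on_grid(reports, grid_size=1001):
    """Posterior over the original ball's position after a sequence of reports.

    Each report is "L" (the new ball landed left of the original) or "R".
    Returns the grid of candidate positions and their normalized posterior weights.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [1.0] * grid_size  # uniform prior: the ball could be anywhere
    for report in reports:
        # Multiply in the likelihood of this report at every candidate position.
        weights = [
            w * (p if report == "L" else 1.0 - p) for w, p in zip(weights, grid)
        ]
    total = sum(weights)
    return grid, [w / total for w in weights]


def probability_in_region(grid, weights, low, high):
    """Posterior probability that the original ball lies between `low` and `high`."""
    return sum(w for p, w in zip(grid, weights) if low <= p <= high)


grid, weights = posterior_on_grid(["L", "L", "R"])
posterior_mean = sum(p * w for p, w in zip(grid, weights))
print(posterior_mean)
print(probability_in_region(grid, weights, 0.5, 1.0))
```

With a uniform prior the exact posterior after k "left" reports out of n is a Beta(k + 1, n - k + 1) distribution with mean (k + 1) / (n + 2), so the grid result can be checked against that.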
5.
Lots of people involved were full-time inventors, rich people who did serious study into a lot of different areas, including mathematics. This is a weird class to me. (I don’t know people like this today. And most scientific things are very institutionalized, or failing that, embedded within business.)
Here’s a quote I enjoyed from one of Pascal’s letters to Fermat when they founded the theory of probability. (For context: de Mere was the gambler who asked Pascal for help with a confusion he had.)
6.
In Laplace’s seminal work putting probability theory on a formal footing, he has a historical section at the end praising all the people who did work, how great they were and how beautiful their work was. Then he has one line on Bayes where he calls his work “a little perplexing”.
Also, whenever you feel like you’ve missed out on your glorious youth, note that Thomas Bayes got interested in probability theory in his 50s, and died aged 59. He was not formally trained in math in his youth.
7.
I watched a talk by Pearl about his causal models, and I was struck by the extent to which he had a “philosophy” of counterfactual inference. It had seemed pretty possible to me he would have said “here was a problem, and here is my solution”, but instead he had a lot to say about counterfactuals and how he thought about them conceptually that wasn’t in the math.
It reminds me of my impression that Daniel Kahneman (and Amos Tversky) have strong models of how their minds work, of which the heuristics & biases literature is a legibilized component, but which it certainly does not capture in full.
Relatedly, in a lecture by Feynman on seeking new laws, he says that some people say “don’t talk about what you cannot measure”. He says he agrees insofar as your theories need measurable predictions, but he doesn’t agree that people should stop discussing their whole philosophies, as the philosophies seem to help some people come up with good guesses about laws.
I think in the past I could have found myself unable to justify my interest in the philosophy of something as more than a personal interest. Now I have a practical justification, which is that it helps me come up with guesses about how nature works! And my current guess is that many people who were successful at that had unique and well-developed philosophies.
8.
Pearl himself says that he has discovered two laws, and once you have them, you can fire him, because the rest is just algebra! And he calls it a calculus of counterfactuals, just like Newton and Bayes and everyone did. Fascinating.
I couldn’t find anything on what problems Pearl was thinking about when he came up with his calculus of counterfactuals. Like, was he personally trying to analyze clinical trials? Was he a mathematician who was friends with people doing large experiments and thought the math was interesting? I want to know what part of the world he was in contact with when developing it.
9.
I updated against expecting to resolve scientific disagreements at the time when the correct theory is known. Let me explain.
In the discovery of gravity, there were a lot of anomalous observations that didn’t fit the theory. For instance, Jupiter didn’t follow the law: its orbit was a more elongated ellipse when it was further away. Uranus’s orbit would jiggle a bit sometimes. Also there were two stars that didn’t orbit their collective center of gravity, but instead some other point within the ellipse. At this point I would have been like “yeah, nice try, but your theory isn’t fitting the details”.
Want to know what they said at the time? (Spoilers ahead.) For the stars, they said that we were probably just looking at them at a funny angle and that’s why it didn’t work. For Uranus, they said there was an invisible planet that was knocking it off-course. And for Jupiter, they said the light was moving too slowly for the measurements to work out.
To me this seems like an awful lot of complexity cost weighing on a theory. Now it’s no longer just a theory, it’s also a lot of explaining away exceptions with unlikely stories. The star-angle one doesn’t even seem testable; it gives me a Scott-Alexander-like sense that this explanation has so many degrees of freedom that I could probably explain away loads more anomalies with it.
Anyway… they were all right.
From Uranus’s wobbles, they found Neptune. The stars were indeed rotated at an angle. And they did some experiments and found out that light did have a speed and this explained the Jupiter issue, and opened up a whole new area of inquiry about light.
Very impressive in retrospect, but I feel like I couldn’t have gotten this right at the time.
My update is further in the direction that Jacob’s post The Copernican Revolution from the Inside argues for, which is that if two different people had different theories at the time, I do not anticipate the disagreement being able to be “clearly resolvable” at all, and do expect for it to involve a great number of judgment calls, in large part dependent on one’s “philosophy” of how to make those calls in this domain.
10.
Feynman has a wonderful quote on the art of guessing nature's laws that includes at least two paths not discussed above. That said I don’t understand them, in particular the ways that quantum mechanics was discovered. (I’m tempted to dig into that some.)
I’ve put the full quote in this footnote[1], recommended.
11.
One confusion I wrote down in advance was “I still don’t quite know how to predict that there will not be a simple mathematical apparatus that explains something. Why the motion of the planets, why the game of chance, why not the color of houses in England or the number of hairs on a man’s head?”
Looking back on this, I don’t know whether I got a direct answer, but I now feel that my answer is something like “look for the places where Nature will show herself directly”. Obviously that’s not a very well-specified answer, but I feel like it points to a real distinction.
12.
I also made an advance prediction: “I guess I also make the advance prediction that most of the rest of the [probability] math was developed by people who liked symbol manipulation more than people doing real-world problem solving. But I would be interested to be surprised here.”
This prediction was false! It took both! All the probability math was developed by people who liked using math to reason rigorously about the world, and who were interested in understanding the real world! There were exceptions like Bayes, who relied a great deal on thought experiments, though these were still sort of “about” the world, not just about symbols.
When I thought of math previously I thought about my math friends in academia, who just sort of entered the abstract world as a starting point and lived in there. (“My professor does work in flat-spherical-manifold-density-vector-spaces, so I’m trying to prove something there too!”) Now I think of people trying to reason about particular parts of the world I live in, and who are trying to make an externalized symbolic calculus that can do that reasoning for them.
Next Step
The natural next step of my investigation is to learn more about how key discoveries in areas like optimization and information theory and game theory were made. How did nature show herself to these discoverers? I have written down a few advance predictions for if I continue seeking this information...
Feynman, on the art of guessing nature’s laws, in the final lecture of his Messenger Lectures (delivered at Cornell and recorded by the BBC):
“Or look at history, you first start out with Newton: he [was] in a situation where he had incomplete knowledge, and he was able to get the laws by putting together ideas which all were relatively close to experiment—there wasn’t a great distance between the observations and the test.”
“Now, the next guy who did something—another man who did something great—was Maxwell, who obtained the laws of electricity and magnetism. But what he did was this, he put together all the laws of electricity due to Faraday and other people that came before him, and he looked at them and he realized that they were mutually inconsistent; they were mathematically inconsistent. In order to straighten it out he had to add one term to an equation.”
“By the way, he did this by inventing a model for himself of idler wheels, and gears, and so on, in space. Then he found what the new law was, and nobody paid much attention, because they didn’t believe in the idler wheels. We don’t believe in the idler wheels today, but the equations that he obtained were correct. So the logic may be wrong, but the answer is all right.”
“In the case of relativity, the discovery of relativity was completely different: there was an accumulation of paradoxes; the known laws gave inconsistent results, and it was a new kind of thinking, a thinking in terms of discussing the possible symmetries of laws. It was especially difficult because it was for the first time realized how long something like Newton’s laws could be right—and still ultimately be wrong—and, second, that ordinary ideas of time and space that seem so instinctive could be wrong.”
“Quantum mechanics was discovered in two independent ways, which is a lesson. There, again, and even more so, an enormous number of paradoxes were discovered experimentally, things that absolutely couldn’t be explained in any way by what was known—not that the knowledge was incomplete, but the knowledge was too complete!: your prediction was, this should happen; it didn’t.
The two different routes were: one, by Schrödinger, who guessed the equations; another, by Heisenberg, who argued that you must analyze what’s measurable. So two different philosophical methods reduced to the same discovery in the end.”
“More recently, the discovery of the laws of this [weak decay] interaction, which are still only partly known, add quite a somewhat different situation: this time it was a case of incomplete knowledge, and only the equation was guessed. The special difficulty this time was that the experiments were all wrong—all the experiments were wrong.”
“Now, how can you guess the right answer when, when you calculate the results it disagrees with the experiment, and you have the courage to say the experiments must be wrong. I’ll explain where the courage comes from in a minute.”
“Now, I’m sure that history does not repeat itself in physics, as you see from this list, and the reason is this: any scheme—like, ‘Think of symmetry laws,’ or ‘Put the equations in mathematical form,’ or any of these schemes ‘Guess equations,’ and so on—are known to everybody now, and they’re tried all the time. So if the place where you get stuck is not that—and you try that right away: we try looking for symmetries; we try all the things that have been tried before, but we’re stuck, so it must be another way next time.
Each time that we get in this log jam of too many problems, it’s because the methods that we’re using are just like the ones we used before. We try all that right away, but the new discovery is going to be made in a completely different way—so history doesn’t help us very much.”