Claim: the usual explanation of the Scientific Method is missing some key pieces about how to make science work well in a high-dimensional world (e.g. our world). Updating our picture of science to account for the challenges of dimensionality gives a different model for how to do science and how to recognize high-value research. This post will sketch out that model, and explain what problems it solves.

The Dimensionality Problem

Imagine that we are early scientists, investigating the mechanics of a sled sliding down a slope. What determines how fast the sled goes? Any number of factors could conceivably matter: angle of the hill, weight and shape and material of the sled, blessings or curses laid upon the sled or the hill, the weather, wetness, phase of the moon, latitude and/or longitude and/or altitude, etc. For all the early scientists know, there may be some deep mathematical structure to the world which links the sled’s speed to the astrological motions of stars and planets, or the flaps of the wings of butterflies across the ocean, or vibrations from the feet of foxes running through the woods.

Takeaway: there are literally billions of variables which could influence the speed of a sled on a hill, as far as an early scientist knows.

So, the early scientists try to control as much as they can. They use a standardized sled, with standardized weights, on a flat smooth piece of wood treated in a standardized manner, at a standardized angle. Playing around, they find that they need to carefully control a dozen different variables to get reproducible results. With those dozen pieces carefully kept the same every time… the sled consistently reaches the same speed (within reasonable precision).

At first glance, this does not sound very useful. They had to exercise unrealistic levels of standardization and control over a dozen different variables. Presumably their results will not generalize to real sleds on real hills in the wild.

But stop for a moment to consider the implications of the result. A consistent sled-speed can be achieved while controlling only a dozen variables. Out of literally billions. Planetary motions? Irrelevant, after controlling for those dozen variables. Flaps of butterfly wings on the other side of the ocean? Irrelevant, after controlling for those dozen variables. Vibrations from foxes’ feet? Irrelevant, after controlling for those dozen variables.

The amazing power of achieving a consistent sled-speed is not that other sleds on other hills will reach the same predictable speed. Rather, it’s knowing which variables are needed to predict the sled’s speed. Hopefully, those same variables will be sufficient to determine the speeds of other sleds on other hills - even if some experimentation is required to find the speed for any particular variable-combination.

Determinism

How can we know that all other variables in the universe are irrelevant after controlling for a handful? Couldn’t there always be some other variable which is relevant, no matter what empirical results we see?

The key to answering that question is determinism. If the system’s behavior can be predicted perfectly, then there is no mystery left to explain, no information left which some unknown variable could provide. Mathematically, information theorists use the mutual information $I(X;Y)$ to measure the information which $X$ contains about $Y$. If $Y$ is deterministic - i.e. we can predict $Y$ perfectly - then $I(X;Y)$ is zero no matter what variable $X$ we look at. Or, in terms of correlations: a deterministic variable always has zero correlation with everything else. If we can perfectly predict $Y$, then there is no further information to gain about it.

In this case, we’re saying that sled speed is deterministic given some set of variables (sled, weight, surface, angle, etc). So, given those variables, everything else in the universe is irrelevant.
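One way to spell out that step, as a sketch using standard information-theoretic identities for discrete variables: write $C$ for the controlled variables, $Y$ for the sled speed, and $X$ for any other variable in the universe (these labels are chosen here for illustration).

```latex
\begin{align*}
I(X; Y \mid C) &= H(Y \mid C) - H(Y \mid X, C) \\
               &\le H(Y \mid C) \\
               &= 0 \quad \text{when } Y \text{ is a deterministic function of } C.
\end{align*}
```

Since mutual information is never negative, $I(X; Y \mid C)$ must be exactly zero: once $C$ is known, $X$ has nothing left to tell us about $Y$.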

Of course, we can’t always perfectly predict things in the real world. There’s always some noise - certainly at the quantum scale, and usually at larger scales too. So how do we science?

The first thing to note is that “perfect predictability implies zero mutual information” plays well with approximation: approximately perfect predictability implies approximately zero mutual information. If we can predict the sled’s speed to within 1% error, then any other variables in the universe can only influence that remaining 1% error. Similarly, if we can predict the sled’s speed 99% of the time, then any other variables can only matter 1% of the time. And we can combine those: if 99% of the time we can predict the sled’s speed to within 1% error, then any other variables can only influence the 1% error except for the 1% of sled-runs when they might have a larger effect.
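To put a rough number on "approximately zero", one option is Fano's inequality, which bounds the leftover uncertainty about an outcome by the prediction error rate. The sketch below is illustrative only: it assumes the sled speed has been binned into 100 possible values (an arbitrary choice made here) and that our prediction is wrong 1% of the time. The function names are hypothetical.

```python
import math


def binary_entropy_bits(p: float) -> float:
    """Entropy in bits of a biased coin, using the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_bound_bits(p_error: float, n_outcomes: int) -> float:
    """Fano upper bound (in bits) on the uncertainty left in the outcome,
    given a predictor that is wrong with probability p_error."""
    if n_outcomes < 2:
        return 0.0
    return binary_entropy_bits(p_error) + p_error * math.log2(n_outcomes - 1)


# Assumed setup: 100 speed bins, predictions wrong 1% of the time.
bound = fano_bound_bits(0.01, 100)
max_bits = math.log2(100)
print(f"other variables can carry at most {bound:.3f} of {max_bits:.2f} bits")
```

Under these assumptions, everything else in the universe can supply at most about 0.15 bits about the speed bin, out of roughly 6.6 bits that would be needed to pin it down from scratch.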

More generally, if we can perfectly predict any specific variable, then everything else in the universe is irrelevant to that variable - even if we can’t perfectly predict all aspects of the system’s trajectory. For instance, if we can perfectly predict the first two digits of the sled’s speed (but not the less-significant digits), then we know that nothing else in the universe is relevant to those first two digits (although all sorts of things could influence the less-significant digits).

As a special case of this, we can also handle noise using repeated experiments. If I roll a die, I can’t predict the outcome perfectly, so I can’t rule out influences from all the billions of variables in the universe. But if I roll a die a few thousand times, then I can approximately-perfectly predict the distribution of die-rolls (including the mean, variance, etc). So, even though I don’t know what influences any one particular die roll, I do know that nothing else in the universe is relevant to the overall distribution of repeated rolls (at least to within some small error margin).
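A quick simulation illustrates the die example. This is only a sketch: the seeds stand in for "everything else in the universe" that differs between two batches of rolls, and the batch size of 6000 is an arbitrary choice.

```python
import random
from collections import Counter


def roll_frequencies(n_rolls: int, seed: int) -> dict:
    """Roll a fair six-sided die n_rolls times and return the frequency of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Two batches rolled under different "conditions" (different seeds).
freqs_a = roll_frequencies(6000, seed=1)
freqs_b = roll_frequencies(6000, seed=2)

# Individual rolls differ between batches, but the distributions nearly coincide.
max_gap = max(abs(freqs_a[face] - freqs_b[face]) for face in range(1, 7))
print(f"largest difference in face frequency between batches: {max_gap:.4f}")
```

No single roll is predictable, yet the per-face frequencies of the two batches agree to within a few percentage points, which is the sense in which the distribution is approximately deterministic.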

Replication

This does still leave one tricky problem: what if we accidentally control some variable? Maybe air pressure influences sled speed, but it never occurred to us to test the sled in a vacuum or high-pressure chamber, so the air pressure was roughly the same for all of our experiments. We are able to deterministically predict sled speed, but only because we accidentally keep air pressure the same every time.

This is a thing which actually does happen! Sometimes we test something in conditions never before tested, and find that the usual rules no longer apply.

Ideally, replication attempts catch this sort of thing. Someone runs the same experiment in a different place and time, a different environment, and hopefully whatever things were accidentally kept constant will vary. (You’d be amazed what varies by location - I once had quite a surprise double-checking the pH of deionized water in Los Angeles.)

Of course, like air pressure, some things may happen to be the same even across replication attempts.

On the other hand, if a variable is accidentally controlled across multiple replication attempts, then it will likely be accidentally controlled outside the lab too. If every lab tests sled-speed at atmospheric pressure, and nobody ever accidentally tries a different air pressure, then that’s probably because sleds are almost always used at atmospheric pressure. When somebody goes to predict a sled’s speed in space, some useful new scientific knowledge will be gained, but until then the results will generally work in practice.

The Scientific Method In A High-Dimensional World

Scenario 1: a biologist hypothesizes that adding hydroxyhypotheticol to their yeast culture will make the cells live longer, and the cell population will grow faster as a result. To test this hypothesis, they prepare one batch of cultures with the compound and one without, then measure the increase in cell density after 24 hours. They statistically compare the final cell density in the two batches to see whether the compound had a significant effect.

This is the prototypical Scientific Method: formulate a hypothesis, test it experimentally. Control group, p-values, all that jazz.

Scenario 2: a biologist observes that some of their clonal yeast cultures flourish, while others grow slowly or die out altogether, despite seemingly-identical preparation. What causes this different behavior? They search for differences, measuring and controlling for everything they can think of: position of the dishes in the incubator, order in which samples were prepared, mutations, phages, age of the initial cell, signalling chemicals in the cultures, combinations of all those… Eventually, they find that using initial cells of the same replicative age eliminates most of the randomness.

This looks less like the prototypical Scientific Method. There are probably some hypothesis formation and testing steps in the middle, but it’s less about hypothesize-test-iterate, and more about figuring out which variables are relevant.

In a high-dimensional world, effective science looks like scenario 2. This isn’t mutually exclusive with the Scientific-Method-as-taught-in-high-school: there’s still some hypothesizing and testing, but there’s a new piece and a different focus. The main goal is to hunt down sources of randomness, figure out exactly what needs to be controlled in order to get predictable results, and thereby establish which of the billions of variables in the universe are actually relevant.
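As a toy version of that hunt, the sketch below simulates yeast cultures whose growth secretly depends on one variable and checks which candidate variable, once held fixed, makes the unexplained variance collapse. The data are synthetic, and the variable names (`shelf`, `cell_age`) and the growth rule are hypothetical, invented purely for illustration.

```python
import random
import statistics
from collections import defaultdict

rng = random.Random(0)

# Synthetic cultures: growth depends on cell_age (hidden driver), not on shelf.
cultures = []
for _ in range(300):
    shelf = rng.choice(["top", "middle", "bottom"])
    cell_age = rng.choice([1, 5, 10])
    growth = 10.0 - 0.8 * cell_age + rng.gauss(0.0, 0.2)
    cultures.append({"shelf": shelf, "cell_age": cell_age, "growth": growth})


def within_group_variance(rows, key):
    """Average variance of growth left over after holding `key` fixed."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["growth"])
    total = sum(len(values) for values in groups.values())
    return sum(len(v) * statistics.pvariance(v) for v in groups.values()) / total


baseline = statistics.pvariance([row["growth"] for row in cultures])
by_shelf = within_group_variance(cultures, "shelf")
by_age = within_group_variance(cultures, "cell_age")
print(f"uncontrolled: {baseline:.2f}, shelf fixed: {by_shelf:.2f}, age fixed: {by_age:.2f}")
```

Controlling for shelf position leaves the variance essentially unchanged, while controlling for cell age removes almost all of it. In a real experiment the candidate list would be far longer and the drop less clean, but the signal being hunted is the same.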

Based on personal experience and reading lots of papers, this matches my impression of which scientific research offers lots of long-term value in practice. The one-shot black-box hypothesis tests usually aren’t that valuable in the long run, compared to research which hunts down the variables relevant to some previously confusing (a.k.a. unpredictable) phenomenon.

Everything Is Connected To Everything Else (But Not Directly)

What if there is no small set of variables which determines the outcome of our experiment? What if there really are billions of variables, all of which matter?

We sometimes see a claim like this made about biological systems. As the story goes, you can perform all sorts of interventions on a biological system - knock out a gene, add a drug, adjust diet or stimulus, etc - and any such intervention will change the level of most of the tens-of-thousands of proteins or metabolites or signalling molecules in the organism. It won’t necessarily be a large change, but it will be measurable. Everything is connected to everything else; any change impacts everything.

Note that this is not at all incompatible with a small set of variables determining the outcome! The problem of science-in-a-high-dimensional-world is not to enumerate all variables which have any influence. The problem is to find a set of variables which determine the outcome, so that no other variables have any influence after controlling for those.

Suppose sled speed is determined by the sled, slope material, and angle. There may still be billions of other variables in the world which impact the sled, the slope material, and the angle! But none of those billions of variables are relevant after controlling for the sled, slope material, and angle; other variables influence the speed only through those three. Those three variables mediate the influence of all the billions of other variables.
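Here is a small simulation of that mediation claim. It is a sketch with made-up numbers: rain (an upstream variable) changes how likely the slope is to be wet, and a hypothetical toy rule computes speed from the surface and angle alone.

```python
import random
from collections import defaultdict

rng = random.Random(0)


def speed(surface: str, angle_deg: int) -> float:
    """Hypothetical toy rule: speed depends only on the surface and the angle."""
    friction = 0.3 if surface == "dry" else 0.1
    return round(angle_deg * (1.0 - friction), 2)


runs = []
for _ in range(2000):
    rain = rng.random() < 0.5
    # Rain influences the sled only by changing how likely the surface is wet.
    surface = "wet" if rng.random() < (0.9 if rain else 0.1) else "dry"
    angle = rng.choice([10, 20, 30])
    runs.append((rain, surface, angle, speed(surface, angle)))


def mean(values):
    return sum(values) / len(values)


# Without controls, rain clearly "matters" for speed.
marginal_gap = mean([s for r, _, _, s in runs if r]) - mean([s for r, _, _, s in runs if not r])

# Holding surface and angle fixed, compare rainy and dry-weather runs again.
cells = defaultdict(lambda: {True: [], False: []})
for rain, surface, angle, s in runs:
    cells[(surface, angle)][rain].append(s)
conditional_gaps = [
    abs(mean(by_rain[True]) - mean(by_rain[False]))
    for by_rain in cells.values()
    if by_rain[True] and by_rain[False]
]
max_conditional_gap = max(conditional_gaps)
print(f"gap without controls: {marginal_gap:.2f}, with controls: {max_conditional_gap:.2f}")
```

Rain shifts the average speed by a few units when nothing is controlled, but the gap vanishes within every (surface, angle) cell: the mediators screen off the upstream variable.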

In general, the goal of science in a high-dimensional world is to find sets of variables which mediate the influence of all other variables on some outcome.

In some sense, the central empirical finding of All Of Science is that, in practice, we can generally find small sets of variables which mediate the influence of all other variables. Our universe is “local” - things only interact directly with nearby things, and only so many things can be nearby at once. Furthermore, our universe abstracts well: even indirect interactions over long distances can usually be summarized by a small set of variables. Interactions between stars across galactic distances mostly just depend on the total mass of each star, not on all the details of the plasma roiling inside.

Even in biology, every protein interacts with every other protein in the network, but the vast majority of proteins do not interact directly - the graph of biochemical interactions is connected, but extremely sparse. The interesting problem is to figure out the structure of that graph - i.e. which variables interact directly with which other variables. If we pick one particular “outcome” variable, then the question is which variables are its neighbors in the graph - i.e. which variables mediate the influence of all the other variables.

Summary

Let’s put it all together.

In a high-dimensional world like ours, there are billions of variables which could influence an outcome. The great challenge is to figure out which variables are directly relevant - i.e. which variables mediate the influence of everything else. In practice, this looks like finding mediators and hunting down sources of randomness. Once we have a set of control variables which is sufficient to (approximately) determine the outcome, we can (approximately) rule out the relevance of any other variables in the rest of the universe, given the control variables.

A remarkable empirical finding across many scientific fields, at many different scales and levels of abstraction, is that a small set of control variables usually suffices. Most of the universe is not directly relevant to most outcomes most of the time.

Ultimately, this is a picture of “gears-level science”: look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe. This sort of research requires a lot of work compared to one-shot hypothesis tests, but it provides a lot more long-run value: because all the other variables in the universe are irrelevant, we only need to measure/control the control variables each time we want to reuse the model.

53 comments

Our universe is “local” - things only interact directly with nearby things, and only so many things can be nearby at once. 

After reading this sentence, I had a short moment of illumination that this is actually backwards: perhaps what our brains perceive as locality is the property of "being influenced by/related to". Perhaps a child's brain learns which "pixels" of the retina are near each other by observing that they often have correlated colors, and similarly which places in space are nearby because you can move things or yourself between them, etc. So, whatever high-dimensional structure the real universe had, we would still evolve to notice which nodes in the graph are connected and declare them "local". This doesn't mean that the observation from the quoted sentence is a tautology: it wouldn't be true in a universe with much higher connectivity. We're lucky to live in a universe with a low [treewidth](https://en.wikipedia.org/wiki/Treewidth), and thus can hope to grasp it.

I believe this is exactly correct. Good explanation, too.

I don't know enough about neurology to make a statement on whether this is something human children learn, or whether it comes evolutionarily preprogrammed, so to speak. But in a universe where physics wasn't at least approximately local, I would expect there'd indeed be little point in holding the notion that points in space and time have given "distances" from one another.

The ~300MB of genetic code we have is a very small amount of space to work with if you have to start specifying the function of individual cells.  Whatever unpacking a functioning human from genes involves, it has to include a substantial amount of "figuring it out at runtime".

TLW:

And indeed, we find a lot of techniques that look an awful lot like something you'd expect to find in the Demoscene.

Zebra stripes aren't directly encoded in the genome. Instead it's more like "make stripes every 400um at X point into development, then allow them to grow with everything else". (With X varying across species.)

(Although I am not a biochemist, so take this with a grain of salt.)

This is an awful lot like the sorts of fake-a-complex-world-by-using-an-rng-and-procedural-generation approaches often found in size-constrained demos.
 

I'm not sure whether it's the standard view in physics, but Sean Carroll has suggested that we should think of locality in space as deriving from entanglement. (With space itself as basically an emergent phenomenon.) And I believe he considers this a driving principle in his quantum gravity work.

https://www.preposterousuniverse.com/blog/2016/07/18/space-emerging-from-quantum-mechanics/

When we zoom out, does the graph take on the geometry of a smooth, flat space with a fixed number of dimensions? (Answer: yes, when we put in the right kind of state to start with.)

I don't understand the article enough to decode what "the right kind of state" means, but this feels like a circular explanation. The three-dimensional space can "emerge" from a graph, but only assuming it is the right kind of graph. Okay, so what caused the graph to be exactly the kind of graph that generates a three-dimensional space?

Well, possibly exactly the right kind of graph to be a mostly three-dimensional space that curves in complicated ways based on the contents of that space, as specified by General Relativity. The GR view of space is considerably less compact and simple than just R^3, and making GR fall out of a graph like that with any kind of rigor would be impressive and maybe useful.

I was expecting the central idea of this post to be more similar to/an extension of Everyday Lessons from High-Dimensional Optimization. That in a high-dimensional world, a good scientist can't afford to waste time testing implausible hypotheses. Doing so will get you the right answer eventually, but it is far too slow. In a high-dimensional world, there are just too many variables to tweak. Relevant excerpt from My Wild and Reckless Youth:

The way Traditional Rationality is designed, it would have been acceptable for me to spend thirty years on my silly idea, so long as I succeeded in falsifying it eventually, and was honest with myself about what my theory predicted, and accepted the disproof when it arrived, et cetera. This is enough to let the Ratchet of Science click forward, but it’s a little harsh on the people who waste thirty years of their lives. Traditional Rationality is a walk, not a dance. It’s designed to get you to the truth eventually, and gives you all too much time to smell the flowers along the way.

To what extent is this post making these points?

Great question. This post is completely ignoring those points, and it's really not something which should be ignored.

In the context of this post, the question is: ok, we're trying to hunt down sources of randomness, trying to figure out which of the billions of variables actually matter, but how do we do that? We can't just guess and check all those variables.

[anonymous]:

Your description of the second type of science where you repeatedly control variables to isolate one reminds me a lot of debugging a complex program.

Great point! It's a very similar problem, with a very similar solution. We have some complicated system with a large number of lines/variables which could influence the outcome (i.e. the bug), and the main problem is to figure out which lines/variables mediate the influence of everything else. The first step is to reproduce the bug - i.e. hunt down all the sources of "randomness", until we can make the bug happen consistently. After that, the next step is to look for mediation - i.e. find lines/variables which are "in between" our original reproduction-inputs and the bug itself, and which are themselves sufficient to reproduce the problem.
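A minimal sketch of the "find what is sufficient to reproduce the problem" step, assuming the bug can be checked by a yes/no predicate. This is a simple greedy reduction rather than a full delta-debugging algorithm, and the flag names and the toy bug are hypothetical.

```python
def minimize_inputs(inputs, reproduces):
    """Greedily drop inputs one at a time, keeping a removal only if the bug
    still reproduces without that input. Assumes the inputs are unique."""
    kept = list(inputs)
    for item in list(inputs):
        trial = [x for x in kept if x != item]
        if reproduces(trial):
            kept = trial
    return kept


def toy_bug(flags):
    """Hypothetical bug that only appears when both 'cache' and 'unicode' are on."""
    return "cache" in flags and "unicode" in flags


all_flags = ["verbose", "cache", "retry", "unicode", "gzip"]
minimal = minimize_inputs(all_flags, toy_bug)
print(minimal)  # ['cache', 'unicode']
```

The flags that survive play the role of the control variables in the post: with them fixed, the remaining settings no longer matter for whether the bug appears.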

An issue I find with debugging a complex program is that when you write tests (which put inputs into part of the program and then check whether the expected output is produced), your tests can themselves contain bugs, and often do (if they're not trivially simple). That is, your experiments isolating a small set of variables can produce confusing results due to the experimental design, not just unpredictability in what they're trying to test. Eg maybe your way of measuring the slope angle or sled weight is flawed. (Cf assumptions about the speed/straightness of light or a steady-state universe messing up your astronomical observations). As philosophers of science say, all observation is theory-laden.

Ruby:

Curated. Although written a while ago, I love posts discussing how to approach figuring out the world. I'd be interested to see more examples of the process advocated here though. Where are the cases of this working?

Many of the neuroscience discoveries happened via a mix of luck and detective work - which is what I took from the post as a way for progressing science of complex systems.

I detect the ghost of Jaynes in this!

I am not sure exactly why, but this and the optimization post both call to mind the current of thought suggesting we segregate the hypothesis and experimental steps explicitly. I have encountered this in three places:

  • An unfinished textbook on Arxiv (which I cannot now locate, to my frustration) that described treating machine learning as a science: it proposed gathering data first, and then measuring the goodness of a machine learning algorithm by compression.
  • The Report likelihoods, not p-values article on Arbital.
  • This is basically how astronomy works by default: no one has a hypothesis for how pulsars interact and then gets a grant from their university department to launch a satellite network to look for pulsars; instead they identify phenomena on which they have little data, and pool resources to build a telescope or satellite or underground neutrino detector to gather the data, and then the publications test their hypotheses against the data gathered from one or more such projects.

I have a vague intuition that dividing up scientific practice in this way chunks the dimensions more tractably, or at least allows for it. Allowing optimization of data gathering and hypothesis formulation independently seems like a clear win for similar reasons.

Maybe the appeal is that it allows hypotheses to come from multiple directions in dimension space. The dimensionality of a body of data is fixed, but if it is generated as a tuple with a single hypothesis then it can only be approached from the perspective of that single hypothesis; if it is independent, then any hypothesis concerned with any of the dimensions of the data can be applied. By analogy, consider convergent evolution: two different paths in phase space arrive at essentially the same thing. Segregating the data step radically compresses this by allowing hypotheses from any other chain of development to be tested against it directly.

I detect the ghost of Jaynes in this!

In particular, the view in this post is extremely similar to the view in Macroscopic Prediction. As there, reproducible phenomena are the key puzzle piece.

This reminded me of the story of how we got unleaded gasoline as described in a recent RadioLab episode titled, “Heavy Metal.” https://www.wnycstudios.org/podcasts/radiolab/articles/heavy-metal

It starts with this guy Clair Patterson in a lab trying to figure out the age of the earth by dating a meteorite. The first thing he has to do is calculate the amount of lead in the meteorite — except his readings are way off. He realizes he has to control for that variable by determining the amount of lead in the lab.

———————

from the episode transcript:

LYDIA DENWORTH: So first he started with glass beakers he was using.

AVIR: The vials are the first thing you're gonna look at. So he tests the vials. And he goes, "Shit."

[ARCHIVE CLIP, Clair Patterson: Lead.]

AVIR: These glass vials are made with lead, so let's get some new vials.

LATIF: Right.

AVIR: So gets new glass vials. Special order, never made with lead. Runs a sample again. It's still off.

LATIF: Huh.

AVIR: And so then he's like, "You know what? In the sample where I put the granite, I also put some water in it." And he realizes, actually, the water is coming from lead pipes.

LATIF: Ah.

AVIR: And so he's like, "Oh, crap. That's the problem." So he has to triple distill the water, boil it off, make sure he catches it in a vial that has no lead in it, to make sure that his water doesn't have any contamination from the pipes it came through. So he runs the sample and it's a little better, but there's still lead there. So now he's, like, obsessed. And Patterson, he's working in this lab.

LYDIA DENWORTH: And it was pretty grubby.

AVIR: He looks at the walls and he's like ...

LYDIA DENWORTH: There is peeling paint.

AVIR: So he tests the paint.

LYDIA DENWORTH: It was in the paint.

AVIR: So they repaint the walls. But still ...

LYDIA DENWORTH: There was way too much lead.

AVIR: Then he looks at his desk where the mass spectrometer is sitting on, and he figures out every joint in the desk is soldered together with lead.

LATIF: Oh, man.

AVIR: So he needs a new desk, new chairs with no lead. And then he uses Saran Wrap to cover every desk and every chair and every object in the room.

LATIF: [laughs].

AVIR: And still, too much lead.

LATIF: Wow!

AVIR: And so he thinks maybe there's some lead in the dust on the floors. So he starts mopping the floors. He gets the lead numbers to come down a little bit. And then one day, he notices a co-worker's lipstick is messing up his samples.

LATIF: Hmm.

AVIR: So he tests the makeup and he's like, "Okay, there's lead in there too.

LATIF: Wow.

AVIR: We can't wear makeup in this lab. And he eventually—he starts to get the lead number lower and lower. But then one day, he's working in the lab, and a little piece of his hair falls onto the desk, and the lead numbers shoot up.

LYDIA DENWORTH: He said, you know, "Holy shit!"

[ARCHIVE CLIP, Clair Patterson: Your hair!]

LYDIA DENWORTH: It's on him.

LATIF: [gasps] Wow! He's the contamination himself!

[ARCHIVE CLIP, Clair Patterson: The lead from your hair will contaminate the whole damn laboratory. Just from your hair. [laughs]]

AVIR: And so he shaves his head.

LATIF: [laughs]

AVIR: But then one day he decides, "Okay, well I'm just gonna test my skin." And he ends up seeing that there's a bunch of lead in his skin.

LATIF: Oh, no!

AVIR: It's everywhere.

LYDIA DENWORTH: There was lead in absolutely everything. And in the end, he made people—they had a little anteroom and you had to—you literally had to strip down to your underwear and put on this Tyvek suit.

AVIR: Which gets washed in acid.

LYDIA DENWORTH: And have little booties on and put plastic over their hair.

AVIR: He builds positive pressure air vents so the air is constantly blowing and pushing anything inside the lab outside of the lab. So even if you walk in with a little microgram of lead, the air may push it out.

LATIF: Hmm.

AVIR: He basically invents what we now call a "clean lab."

LATIF: Hmm.

AVIR: But he ultimately gets his samples down, his blank samples down to 0.1 micrograms. So that's one tenth of one millionth of a gram.

LATIF: Oof!

AVIR: And that took years.

——————

So this is something like: repeated experiments to isolate the variables that matter in one context only to make a scientific discovery in another context.

This post gave me an idea about how you might approach magic in fiction while keeping it grounded in reality: something like magic users are people who learn to pick out relevant variables from the noise to consistently nudge reality in ways that otherwise seem impossible.

Basically placebomancy from Unsong.

I've wanted for a while to see a game along these lines. It would have some sort of 1-v-1 fighting, but dominated by "random" behavior from environmental features and/or unaligned combatants. The centerpiece of the game would be experimenting with the "random" components to figure out how they work, in order to later leverage them in a fight.

There are a lot of video games where some of the rules are never taught to you, although I think the overwhelming majority of examples are due to laziness and/or attempts to disguise the game's shortcomings, rather than attempts to gamify scientific inquiry.

I've typically just found it annoying when the rules of a game aren't communicated.  For instance, how in Civilization 6 the construction costs of certain things change over time according to undocumented rules, or how in As Far as the Eye there's a hidden "harmony" variable that worsens random events if you do things like fully exhaust a resource deposit (but the game never tells you this).

I think this is partly "I was looking for a strategy game, not a research project" 

and partly "they didn't do a very good job of designing a fun research project, probably because they weren't even thinking of it in those terms and so they never actually tried"

and partly "science is inherently effortful and luck-driven, which makes it harder to gamify than things that are less so"

I think you could definitely make a "do your own experiments" game that's better than the examples that have annoyed me in the past, although I'm uncertain whether it's possible to do them well enough to be competitive with other styles of game.  I'm dubious that a 1-v-1 fighting game would be a good fit for this.

 

This also reminds me of an article I read a long time ago (and can't find now) about deliberately-secret rules in video games.  It talked about one game where enemies turn into different colored gems depending on their horizontal position on the screen when you defeat them, and this is important for getting the right gems, but the game never explains it and the player needs to notice it.  Also something about a secret ending in Bubble Bobble that's only available if one of the players has never died, I think?

I noticed that all the examples seemed to be old arcade games, which is perhaps a sign such mechanics weren't popular.

The best compliment I can give this post is that the core idea seems so obviously true that it seems impossible that I haven't thought of or read it before. And yet, I don't think I have.

Aside from the core idea that it's scientifically useful to determine the short list of variables that fully determine or mediate an effect, the secondary claim is that this is the main type of science that is useful and the "hypothesis rejection" paradigm is a distraction. This is repeated a few times but not really proven, and it's not hard to think of counterexamples: most medical research tries to find out whether a single molecule or intervention will ameliorate a single condition. While we know that most medical conditions depend on a whole slew of biological, environmental, and behavioral factors, that's not the most relevant thing for treatment. I don't think this is a huge weakness of the post, but it is certainly a direction to follow up on.

Finally, the post is clear and well written. I'm not entirely sure what the purpose of the digression about mutual information was, but everything else was concise and very readable.

There's a parallel need to review the actual purpose for which you are doing all of that. It can be mutable.

For example, suppose you culture some unicellular algae, and you notice the cells can be more or less rounded in the same dish. You shrug and discard the dishes with too elongated cells to keep the line pure and strong. You learn what parameters to keep constant to make it easier.

And then someone shows that in point of fact, cell shape for this group of species can vary somewhat even in culture so we have been wrong about the diversity in the wild this whole time. And you read it and hope in your heart that some very motivated people might one day deviate from the beaten path and finally find out what's going on there, despite this looking entirely unfundable.

Ultimately, this is a picture of “gears-level science”: look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe. 

I'm very doubtful that hunting down sources of randomness is a good way to go about doing science where there's a big solution space. 

There's a lot of human pattern matching involved in coming up with good hypotheses to test.

I think you're pointing to the same issue which Adam Zerner was pointing to. Hunting down sources of randomness is a good goal when doing science, but that doesn't tell us much about how to go about the hunt when the solution space is very large.

It sort of feels like switching the perspectives back and forth between searching for what works at all and searching for things to rule out is analogous to research and development. Iterating between them feels like how knowledge would be refined.

Also: imagining science as "optimizing from zero" is aesthetically pleasing to me.

Fleshing this out a bit more, within the framework of this comment: when we can consistently predict some outcomes using only a handful of variables, we've learned a (low-dimensional) constraint on the behavior of the world. For instance, the gas law PV = nRT is a constraint on the relationship between variables in a low-dimensional summary of a high-dimensional gas. (More precisely, it's a template for generating low-dimensional constraints on the summary variables of many different high-dimensional gases.)
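To make the "low-dimensional summary of a high-dimensional gas" idea concrete, here is a minimal sketch. It assumes an ideal monatomic gas with velocity components drawn from a Gaussian (Maxwell-Boltzmann) distribution, and uses the per-particle form of the gas law, PV = N k_B T. The function names, particle count, and the approximate argon mass are illustrative choices, not anything from the post.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def kinetic_pressure(n_particles, temperature, volume, mass, seed=0):
    """Estimate pressure from sampled particle velocities (kinetic theory)."""
    rng = random.Random(seed)
    sigma = math.sqrt(K_B * temperature / mass)  # std dev of each velocity component
    total_v2 = 0.0
    for _ in range(n_particles):
        vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
        total_v2 += vx * vx + vy * vy + vz * vz
    mean_v2 = total_v2 / n_particles
    # Kinetic theory: P = N * m * <v^2> / (3 V)
    return n_particles * mass * mean_v2 / (3.0 * volume)

def ideal_gas_pressure(n_particles, temperature, volume):
    """Pressure from the gas law in per-particle form, P = N k_B T / V."""
    return n_particles * K_B * temperature / volume

N, T, V = 50_000, 300.0, 1e-3
M_ARGON = 6.63e-26  # approximate mass of one argon atom in kg (assumed example gas)

p_micro = kinetic_pressure(N, T, V, M_ARGON)
p_law = ideal_gas_pressure(N, T, V)
print(f"from 3N velocity components: {p_micro:.4e} Pa")
print(f"from N, T, V alone:          {p_law:.4e} Pa")
```

The first number depends on 150,000 sampled velocity components, the second on three summary variables, and the two should agree to within sampling noise.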

When we flip perspective to problems of design (e.g. engineering), those constraints provide the structure of our problem - analogous to the walls in a maze. We look for "paths in the maze" - i.e. designs - which satisfy the constraints. Duality says that those designs act as constraints when searching for new constraints (i.e. doing science). If engineers build some gadget that works, then that lets us rule out some constraints: any constraints which would prevent the gadget from working must be wrong.

Data serves a similar role (echoing your comment here). If we observe some behavior, then that provides a constraint when searching for new constraints. Data and working gadgets live "in the same space" - the space of "paths": things which definitely do work in the world and therefore cannot be ruled out by constraints.

You know, I had never explicitly considered that data and devices would be in the same abstract space, but as soon as I read the words it was obvious. Thank you for that!

Building devices is actually like setting up physical experiments. If they do what they're supposed to do, you can increase your confidence in the mechanisms that explain how they work.

In the realm of biology, I think hunting for patterns, and especially those you care about, is a better way than hunting for randomness.

Many times randomness is the result of complex interactions that can't easily be reduced. 

As a special case of this, we can also handle noise using repeated experiments. If I roll a die, I can’t predict the outcome perfectly, so I can’t rule out influences from all the billions of variables in the universe. But if I roll a die a few thousand times, then I can approximately-perfectly predict the distribution of die-rolls (including the mean, variance, etc). So, even though I don’t know what influences any one particular die roll, I do know that nothing else in the universe is relevant to the overall distribution of repeated rolls (at least to within some small error margin).
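The die-roll claim is easy to check with a quick simulation. This is a minimal sketch, assuming a fair six-sided die and Python's standard pseudo-random generator; the seeds and the 5000-roll sample size are arbitrary choices.

```python
import random
import statistics

EXACT_MEAN = 3.5          # (1 + 2 + ... + 6) / 6
EXACT_VARIANCE = 35 / 12  # variance of a fair six-sided die

def roll_many(n_rolls, seed):
    """Return n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]

for seed in (1, 2, 3):
    rolls = roll_many(5000, seed)
    # Individual rolls differ from run to run; the summary statistics barely move.
    print(
        f"seed={seed} first rolls={rolls[:5]} "
        f"mean={statistics.fmean(rolls):.3f} "
        f"variance={statistics.pvariance(rolls):.3f}"
    )
```

Each seed gives a different sequence of individual rolls, but the mean and variance stay close to 3.5 and 35/12, which is the sense in which the distribution is predictable even though single rolls are not.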

I'm not sure I fully understand this, so I wanna try to sketch out an example to see.

Suppose you've got a family of unknown variables X1, X2, ... which each influence the observable variables Y1, Y2, .... Given some observations for some of the Ys, you can learn some summary statistics S that you can use to predict other Ys.

I think the counterintuitive thing about this view then, is that Y is not independent of X given S. So what have we really learned? It doesn't immediately tell us anything about the X/Y relationship. So where's the science?

I think my answer to this question is that while S doesn't tell us anything about the Xs, it does tell us things about the Ys. (And S would essentially be a measure of the common causes underlying the Ys, I suppose.) Which is useful if you care about things that are downstream from the Xs. But I don't really see what determinism buys you here.

My model of you says that you'd mention something about the KPD theorems. But I don't know what.

Or should this more be understood in a nested sense?

That is, if you've got , ... then you can form , ..., and if  is then deterministically predictable from , you know you're onto something?

I think this post would be stronger if it covered at least basic metrology and statistics. 

It's incorrect to say that billions of variables aren't affecting a sled sliding down a hill - of course they're affecting the speed, even if most are only by a few planck-lengths per hour. But, crucially, they're mostly not affecting it to a detectable amount. The detectability threshold is the key to the argument. 

For detectability, whether you notice the effects of outside variables is going to come down to the precision of the instrument that you're using to measure your output. If you're using a radar gun that gives readings to the nearest MPH, for example, you won't perceive a difference between 10.1 and 10.2 MPH, and so to you the two are equivalent. Nonetheless, outside variables have absolutely influenced the two readings differently. 

Equally critical is the number of measurements that you're taking. For example, if you're taking repeated measurements after controlling a certain set of variables, you may be able to say with a certain confidence/reliability that no other variables are causing enough variations in speed to register an output that's outside of the parameters that you've set. But that is a very different thing than saying that those other variables simply don't exist! One is a statement of probability, another is a statement of certainty. Maybe there's a confluence of variables that only occur once every thousand times, which you won't pick up when doing an initial evaluation. 
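The "once every thousand times" worry can be quantified. A minimal sketch, assuming each run is independent and the rare confluence shows up with a fixed probability per run (both are simplifying assumptions):

```python
def prob_never_seen(p_event, n_trials):
    """Chance that an event with per-trial probability p_event occurs zero times."""
    return (1.0 - p_event) ** n_trials

for n in (30, 100, 1000, 3000):
    print(f"{n:5d} runs: P(never observed) = {prob_never_seen(1 / 1000, n):.3f}")
```

With 100 runs, a 1-in-1000 effect goes completely unobserved about 90% of the time; it takes roughly 3000 clean runs before the chance of having missed it drops to about 5%.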

If you're using a radar gun that gives readings to the nearest MPH, for example, you won't perceive a difference between 10.1 and 10.2 MPH, and so to you the two are equivalent.

As an aside, this is one of the reasons why some sensing systems deliberately inject random noise.

If it turns out that, for instance, your system's actual states are always X.4 MPH, you have a systematic bias if you use a radar gun that actually gives readings to the nearest MPH. If, however, you inject ±0.5 MPH random noise, you don't have a systematic bias any more. (Of course, this requires repeated sampling to pick up on.)
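A small simulation of that dithering effect, as a sketch: it assumes the injected noise is uniform on [-0.5, 0.5] MPH (one quantization step wide) and a true speed of 10.4 MPH; the function names are made up for illustration.

```python
import random

def radar_reading(true_speed, rng, dither=False):
    """Reading to the nearest MPH, optionally with uniform noise added first."""
    noise = rng.uniform(-0.5, 0.5) if dither else 0.0
    return round(true_speed + noise)

def mean_reading(true_speed, n_samples, dither, seed=0):
    """Average of many repeated readings of the same true speed."""
    rng = random.Random(seed)
    total = sum(radar_reading(true_speed, rng, dither) for _ in range(n_samples))
    return total / n_samples

TRUE_SPEED = 10.4
print("without dither:", mean_reading(TRUE_SPEED, 20_000, dither=False))
print("with dither:   ", mean_reading(TRUE_SPEED, 20_000, dither=True))
```

Without dither every reading is 10, so the average is stuck 0.4 MPH low no matter how many samples are taken. With dither, individual readings are noisier, but their average converges toward 10.4.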

But that is a very different thing than saying that those other variables simply don't exist! One is a statement of probability, another is a statement of certainty. Maybe there's a confluence of variables that only occur once every thousand times, which you won't pick up when doing an initial evaluation. 

As an extreme example of that, consider:



Under blackbox testing, this function is indistinguishable from .
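A hypothetical function in this spirit (not the commenter's original code): it assumes 64-bit integer inputs and an arbitrary made-up trigger value, and compares the function against a constant under random black-box testing.

```python
import random

TRIGGER = 0x5EED_CAFE_F00D_BEEF  # arbitrary made-up 64-bit value (assumption)

def sneaky(x: int) -> int:
    """Returns 0 for every 64-bit input except one."""
    if x == TRIGGER:
        return 1
    return 0

def always_zero(x: int) -> int:
    """The simple function that sneaky() looks like from the outside."""
    return 0

def blackbox_disagreements(n_tests, seed=0):
    """Count random 64-bit inputs on which the two functions differ."""
    rng = random.Random(seed)
    inputs = (rng.getrandbits(64) for _ in range(n_tests))
    return sum(sneaky(x) != always_zero(x) for x in inputs)

print("disagreements in 100,000 random tests:", blackbox_disagreements(100_000))
print("value at the trigger:", sneaky(TRIGGER))
```

Each random test has about a 1 in 2^64 chance of hitting the trigger, so no realistic amount of random sampling will tell the two functions apart, even though they are not the same function.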

It's incorrect to say that billions of variables aren't affecting a sled sliding down a hill - of course they're affecting the speed, even if most are only by a few planck-lengths per hour. But, crucially, they're mostly not affecting it to a detectable amount. The detectability threshold is the key to the argument. 

It is important to differentiate between billions of: uncorrelated variables, correlated variables, and anticorrelated variables. A grain of sand on the hill may not detectably influence the sled. A truckload of sand, on the other hand, will very likely do so.

You are correct in the case of uncorrelated variables with a mean of zero; it is interesting in the real world that almost all variables appear to fall into this category.

A remarkable empirical finding across many scientific fields, at many different scales and levels of abstraction, is that a small set of control variables usually suffices.

I'm skeptical that this is true for most things we care about. It's true in the scientific fields where we have the most accurate models, such as physics, but that's likely because there are so few relevant variables in those fields.

Most new drugs that go into clinical trials fail. Essentially, a pharmaceutical company identifies a variable that appears to be the mediator of a medical outcome, they create a drug that tweaks that variable, and then it turns out not to produce the outcome that they thought it would. There are too many other relevant variables that are poorly understood.

The other thing that makes me skeptical is the effectiveness of machine learning models that use a large number of inputs. It's possible that there's a simple underlying structure to what they're predicting that we just haven't figured out yet, but based on what exists now, it sure looks like there are a large number of relevant variables.

Most new drugs that go into clinical trials fail. Essentially, a pharmaceutical company identifies a variable that appears to be the mediator of a medical outcome, they create a drug that tweaks that variable, and then it turns out not to produce the outcome that they thought it would. There are too many other relevant variables that are poorly understood.

I love this example in particular, because as I understand it, this is exactly what pharma companies do not do. What they actually do is target some variable which is correlated with the medical outcome, but is often not causal and is rarely a mediator.

Case in point: amyloid beta plaques in Alzheimer's.

Decades ago, people noticed that if you look at the brains of old people with dementia, they usually have lots of plaques, and these plaques are made of a particular protein fragment called amyloid beta. Therefore clearly amyloid beta causes dementia. Pretty soon people were using amyloid beta plaques to diagnose dementia, which made it really easy to show that the plaques cause dementia: when the plaques are how we diagnose “dementia”, then by golly removing the plaques makes the “dementia” (as diagnosed by plaques) go away.

As far as I can tell, there has never at any point in time been compelling evidence that amyloid beta plaques cause age-related memory problems. Conversely, I have seen at least a few studies suggesting the plaques are not causal.

Meanwhile, according to Wikipedia, 244 Alzheimer’s drugs were tested in clinical trials from 2002-2012, mostly targeting the amyloid plaques. Of those, only 1 drug made it through.

I think someone familiar with both causality/mediation and the Alzheimer's literature could probably have told you in 2000 that those trials were unlikely to pass. But it turns out correct reasoning about causality/mediation is remarkably rare; remember that Pearl & co's work is still very recent by academic standards, and most people in the sciences still don't know about it. Pharma execs don't have the technical skills for it. Some scientists do this sort of reasoning intuitively, but saying "no" to lots of stupid drug tests is not the sort of thing which makes one a "team player" at a big pharma company. (And besides, if the problem is hard enough, you can probably get more drugs to market by throwing lots of shit at the wall and hoping one passes by random chance; I wouldn't put my money on that one drug which passed out of 244 actually being very effective.)

I don't quite think you've solved the problem of induction.

I think there's a fairly serious issue with your claim that being able to predict something accurately means you necessarily fully understand the variables which cause it, because of determinism.

The first thing to note is that “perfect predictability implies zero mutual information” plays well with approximation: approximately perfect predictability implies approximately zero mutual information. If we can predict the sled’s speed to within 1% error, then any other variables in the universe can only influence that remaining 1% error. Similarly, if we can predict the sled’s speed 99% of the time, then any other variables can only matter 1% of the time. And we can combine those: if 99% of the time we can predict the sled’s speed to within 1% error, then any other variables can only influence the 1% error except for the 1% of sled-runs when they might have a larger effect.

That's not really the case. E.g.: let's say that ice cream melts twice as fast in galaxies without a supermassive black hole at the center. You do experiments to see how fast ice cream melts. After controlling for type of ice cream, temperature, initial temp of the ice cream, airflow and air humidity, you find that you can predict how ice cream melts. You triumphantly claim that you know which things cause ice cream to melt at different rates, having completely missed the black hole's effects.

Essentially, controlling for A & B but not C won't tell you whether C has a causal influence on the thing you're measuring unless

  • you intentionally change C between experiments (not practical given googolplexes of potential causal factors)
  • C happens to naturally vary quite a bit and so makes your experimental results different, cluing you in to the fact that you're missing something.
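The ice cream scenario can be sketched as a toy simulation. Everything here is a made-up illustration: the melt-time formula, the two controlled variables, and the factor of two for the black hole are assumptions chosen only to show the structure of the argument.

```python
import random

def melt_time(temperature_c, airflow, has_black_hole):
    """Toy 'true' law: depends on two lab variables plus one hidden variable."""
    base = 600.0 / (1.0 + 0.05 * temperature_c + 0.5 * airflow)
    return base if has_black_hole else base / 2.0  # twice as fast without one

def fitted_model(temperature_c, airflow):
    """What an experimenter in our galaxy would fit: exact here, silent about C."""
    return 600.0 / (1.0 + 0.05 * temperature_c + 0.5 * airflow)

def max_prediction_error(has_black_hole, n_runs=1000, seed=0):
    """Worst-case error of the fitted model over randomly chosen lab settings."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_runs):
        t, a = rng.uniform(0, 40), rng.uniform(0, 5)
        err = abs(melt_time(t, a, has_black_hole) - fitted_model(t, a))
        worst = max(worst, err)
    return worst

print("error in our galaxy:       ", max_prediction_error(has_black_hole=True))
print("error with no black hole:  ", max_prediction_error(has_black_hole=False))
```

As long as the hidden variable never changes, the fitted model predicts perfectly and nothing in the data hints that it matters. The error only appears once the hidden variable takes a different value, which is exactly the case the experimenter never sampled.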

This seems like the topic of "accidentally controlling a variable" that the post discusses in the section titled "Replication".

Absolutely a problem that happens.

I am kind of surprised you didn't reference causal inference here to just gesture at the task in which we "figure out which variables are directly relevant - i.e. which variables mediate the influence of everything else". Are you pointing to a different sort of idea, or do you not feel causal inference is adequate for describing this task?

Also, scenarios 1 and 2 seem fairly close to the "linear" and "non-linear" models of innovation Jason Crawford described in his talk "The Non-Linear Model of Innovation." To be honest, I preferred his description of the models, though he didn't cover how miraculous it is that somehow the model can work: that, to a good approximation, the universe is simple and local.

Causal inference (or more precisely learning causal structure) is exactly the sort of thing I have in mind here. There's actually a few places in the post where I should distinguish between variables which control an outcome in an information sense (i.e. sufficient to perfectly predict the outcome) vs in a causal sense (i.e. sufficient to cause the outcome under interventions). The main reason I didn't talk about it directly is because I would have had to explain that distinction, and decided that would be too much of a distraction from the main point.

I think the takeaway of Jason's talk, as it relates to this post, is that a large chunk of the "science" of achieving consistent outcomes happens in inventors' workshops rather than scientists' labs. The problem is still largely similar, regardless of the label applied, but scientists aren't the only ones doing science.

In 26 models taken from volumes 21 to 25 of the journal Law and Human Behavior, the highest R-squared (the proportion of VARIANCE, not variation, explained) was 40%, and the second highest was 24%.

The great challenge is to figure out which variables are directly relevant - i.e. which variables mediate the influence of everything else.

Is this equivalent to identifying the Markov blanket of the phenomenon being studied?

On the other hand, if a variable is accidentally controlled across multiple replication attempts, then it will likely be accidentally controlled outside the lab too.

And this is how we missed that nitrocellulose isn't actually explosive in a vacuum.

This does still leave one tricky problem: what if we accidentally control some variable?

Another issue is of anti-correlated variables. (Which, if one were being pedantic, could be treated as introducing another variable and accidentally controlling that.)

If I have an event E that requires both A and B, and in my experimental setup A and B cannot occur at the same time, I am going to conclude that E doesn't happen with high probability. I am not controlling for A accidentally - I do try A - and I am not controlling for B accidentally - I do try B.

This is in some ways worse than the bare "accidentally controlling some variable" because the number of possible combinations grows exponentially with dimension.  is bad enough. ... good luck. Even the bare handshake-pairing  is terrible.
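To put rough numbers on the growth, here is a minimal sketch that counts pairwise combinations and on/off combinations of n variables. The choices of n (a dozen, a thousand, a billion) are illustrative, echoing the "dozen" and "billions" from the post rather than any figures in this comment.

```python
import math

def pair_count(n):
    """Handshake pairings: number of unordered pairs among n variables."""
    return math.comb(n, 2)

def subset_digits(n):
    """Number of decimal digits in 2**n, the count of on/off combinations."""
    return int(n * math.log10(2)) + 1

for n in (12, 1000, 10**9):
    print(
        f"n={n:>10}: pairs={pair_count(n):,}, "
        f"2**n has {subset_digits(n):,} decimal digits"
    )
```

Even restricting attention to pairs, a billion candidate variables give about 5 x 10^17 combinations to try; the full count of combinations has hundreds of millions of digits.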

Another way to frame this is you need something beyond brute force systematic hypothesis generation, something that calls on informed intuition and non-rational ways of knowing. Unfortunately, it seems it’s not a problem of just raising awareness that people should do science better, it’s a more fundamental problem of population-level adult development levels needing to be raised more generally. For example, see Chapman’s https://metarationality.com/ or even just https://metarationality.com/stem-fluidity-bridge .

I do not agree that the prototypical scientific method is not valuable in the long term.

In any experiment, there are lots of naturally varying parameters (current phase of the Moon, air pressure, amount of snow on the slope), and there are lots of naturally constant parameters (strength of gravity, room temperature, amount of hydroxyhypotethicol in the solution). There are base and derived parameters. The distances from the sun and the orbital periods vary between the planets, but (distance)^3/(orbital period)^2 is constant.

In the experiment, you measure X and Y. If X varies but Y is constant, then they probably have no relation. Suppose that we want to find out whether X is related to B or C. We make B vary and hold C constant. If X varies, then it is not connected to C; if X is constant, then it is unrelated to B.

In the second scenario, you try to find the minimal set of base parameters that are related to X (growth rate). After some testing, we found that (growth rate) ~~ (initial age). After we found that connection, we can rule out the uncontrolled varying parameters, but there may be a connection between X and an uncontrolled constant parameter. It is possible that (growth rate) ~~ (initial age) times (1 + (amount of hydroxyhypotethicol)), and the first scenario will test these kinds of connections.

It is not enough to find which parameters won't affect the experiment. It is also important to find out which parameters could affect the experiment.

I think this certainly describes a type of gears level work scientists engage in, but not the only type, nor necessarily the most common one in a given field. There's also model building, for example.

Even once you've figured out which dozen variables you need to control to get a sled to move at the same speed every time, you still can't predict what that speed would be if you set these dozen variables to different values. You've got to figure out Newton's laws of motion and friction before you can do that.

Finding out which variables are relevant to a phenomenon in the first place is usually a required initial step for building a predictive model, but it's not the only step, nor necessarily the hardest one.

Another type of widespread scientific work I can think of is facilitating efficient calculation. Even if you have a deterministic model that you're pretty sure could theoretically predict a class of phenomena perfectly, that doesn't mean you have the computing power necessary to actually use it. 

Lattice Quantum Chromodynamics should theoretically be able to predict all of nuclear physics, but employing it in practice requires coming up with all sorts of ingenious tricks and effective theories to reduce the computing power required for a given calculation. It's enough to have kept a whole scientific field busy for over fifty years, and we're still not close to actually being able to freely simulate every interaction of nucleons at the quark level from scratch.

Even once you've figured out which dozen variables you need to control to get a sled to move at the same speed every time, you still can't predict what that speed would be if you set these dozen variables to different values. You've got to figure out Newton's laws of motion and friction before you can do that.

Finding out which variables are relevant to a phenomenon in the first place is usually a required initial step for building a predictive model...

Exactly correct.

Part of the implicit argument of the post is that the "figure out the dozen or so relevant variables" is the "hard" step in a big-O sense, when the number of variables in the universe is large. This is for largely similar reasons to those in Everyday Lessons From High-Dimensional Optimization: in low dimensions, brute force-ish methods are tractable. Thus we get things like e.g. tables of reaction rate constants. Before we had the law of mass action, there were too many variables potentially relevant to reaction rates to predict via brute force. But once we have mass action, there are few enough degrees of freedom that we can just try them out and make these tables of reaction constants.

Now, that still leaves the step of going from "temperature and concentrations are the relevant variables" to the law of mass action, but again, that's the sort of thing where brute-force-ish exploration works pretty well. There is an insight step involved there, but it can largely be done by guess-and-check. And even before that insight is found, there's few enough variables involved that "make a giant table" is largely tractable.

Another type of widespread scientific work I can think of is facilitating efficient calculation...

Good example.

To clarify, my point was that at least in my experience, this isn't always the hard step. I can easily see that being the case in a "top-down" field, like a lot of engineering, medicine, parts of material science, biology and similar things. There, my impression is that once you've figured out what a phenomenon is all about, it often really is as simple as fitting some polynomial of your dozen variables to the data.

But in some areas, like fundamental physics, which I'm involved in, building your model isn't that easy or straightforward. For example, we've been looking for a theory of quantum gravity for ages. We know roughly what sort of variables it should involve. We know what data we want it to explain. But still, actually formulating that theory has proven hellishly difficult. We've been on it for over fifty years now and we're still not anywhere close to real success.

The key to answering that question is determinism. If the system’s behavior can be predicted perfectly, then there is no mystery left to explain, no information left which some unknown variable could provide

  1. What matters is local determinism. You need to show that behaviour is predictable from factors under your control. If local determinism fails, it is hard to tell whether locality or determinism failed individually.

  2. And showing that a system's behaviour is predictable when N factors are held constant by the experimenter doesn't show that those are the only ones it is conditionally dependent on. Its behaviour might counterfactually depend on factors which the experimenter did not vary and which did not naturally change over the course of the experiment. In general, you can't exclude mysterious extra variables.

Its behaviour might counterfactually depend on factors which the experimenter did not vary and which did not naturally change over the course of the experiment.

Keep reading, the post gets to that.

until then the results will generally work in practice.

Doesn't really contradict what I am saying. In theory, I am saying, you can't exclude mysterious extra variables...but in practice that often doesn't matter, as you are saying.