Epictetus comments on Beyond Statistics 101 - Less Wrong
PCA and other dimensionality reduction techniques are great, but there's another very useful technique that most people (even statisticians) are unaware of: dimensional analysis, and in particular, the Buckingham pi theorem. For some reason, this technique is used primarily by engineers in fluid dynamics and heat transfer despite its broad applicability. This is the technique that makes scale models like wind tunnels work, but it's useful for more than scaling. I find it very useful for reducing the number of variables when developing models and conducting experiments.
Dimensional analysis recognizes a few basic axioms about models with dimensions and sees what they imply. You can use these to construct new variables from the old variables. The model is usually complete in a smaller number of these new variables. The technique does not tell you which variables are "correct", just how many independent ones are needed. Identifying "correct" variables requires data, domain knowledge, or both. (And sometimes, there's no clear "best" variable; multiple work equivalently well.)
Dimensional analysis does not help with categorical variables, or with numbers that are already dimensionless (though sometimes, by luck, combinations of dimensionless variables turn out to be what's "correct"). This is the main restriction. And the reduction in the number of variables equals the rank of the dimensional matrix, which is at most the number of base dimensions involved (typically about 3 for mechanical problems). Dimensional analysis is most useful for physical problems with maybe 3 to 10 variables.
The basic idea is this: dimensions are a kind of metadata that tells you something about the structure of the problem. You can always rewrite a dimensional equation, for example, to be dimensionless on both sides. When you do, some terms collapse into dimensionless constants, which simplifies the equation.
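A minimal illustration of my own (not from any particular reference): take a falling body with linear drag, and scale out the parameters.

```latex
% Falling body with linear drag:
m \frac{dv}{dt} = mg - cv
% Define dimensionless variables using the natural scales of the problem:
% u = v / (mg/c)  (velocity scale: terminal velocity)
% \tau = t / (m/c)  (time scale: relaxation time)
% Substituting and dividing through by mg gives
\frac{du}{d\tau} = 1 - u
```

All three parameters m, g, and c have been absorbed into the velocity and time scales, and the dimensionless equation has no free constants left at all.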
Here's a physical example. Say you want to measure the drag force on a sphere (units: N). You know this depends on the air speed (units: m/s), the kinematic viscosity (units: m^2/s), the air density (units: kg/m^3), and the diameter of the sphere (units: m). So you have 5 variables in total: one response and 4 factors. Say you want a factorial design with 4 levels for each factor, with no replications. You'd have to do 4^4 = 256 experiments. This is clearly too many.
What fluid dynamicists have recognized is that you can rewrite the relationship in terms of different variables with nothing lost. The Buckingham pi theorem mentioned previously says that our 5 dimensional variables can be replaced by just 2 dimensionless ones. So instead of the drag force you use the drag coefficient, and instead of the speed, viscosity, density, and diameter you use the Reynolds number. Now you only need to do 4 experiments (4 levels of the Reynolds number) to get the same coverage.
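The count of dimensionless groups is mechanical to compute: it's the number of variables minus the rank of the matrix of dimensional exponents. A sketch for the sphere-drag example (my own code, using numpy):

```python
import numpy as np

# Dimensional (exponent) matrix for the sphere-drag problem.
# Rows: base dimensions M (mass), L (length), T (time).
# Columns: drag force F, speed U, kinematic viscosity nu, density rho, diameter d.
#              F    U   nu  rho   d
D = np.array([[1,   0,   0,   1,  0],   # M
              [1,   1,   2,  -3,  1],   # L
              [-2, -1,  -1,   0,  0]])  # T

n_vars = D.shape[1]
rank = np.linalg.matrix_rank(D)
n_pi = n_vars - rank  # Buckingham pi: number of independent dimensionless groups
print(n_pi)  # -> 2

# A product of powers of the variables is dimensionless exactly when its
# exponent vector lies in the null space of D. Check the two standard choices:
re_exps = np.array([0, 1, -1, 0, 1])    # Re = U d / nu
cd_exps = np.array([1, -2, 0, -1, -2])  # Cd proportional to F / (rho U^2 d^2)
assert not np.any(D @ re_exps)
assert not np.any(D @ cd_exps)
```

Any other pair of independent null-space vectors would work just as well, which is the sense in which the theorem tells you how many variables you need but not which ones are "correct".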
As it turns out, you can use techniques like PCA on top of dimensional analysis to determine that certain dimensionless parameters are unimportant (there are other ways too). This further simplifies models.
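A sketch of that idea with synthetic data (the groups and numbers here are made up for illustration): run PCA on the dimensionless parameters measured across experiments, taking logs of the wide-ranging ones; near-zero-variance directions flag groups that are redundant or effectively constant and can be dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "experiments": three candidate dimensionless groups per run.
pi1 = rng.uniform(1e3, 1e5, size=50)               # a Reynolds-like number
pi2 = 2.0 * np.log(pi1) + rng.normal(0, 0.01, 50)  # nearly redundant with pi1
pi3 = 1.0 + rng.normal(0, 1e-6, 50)                # essentially constant

# PCA via SVD of the centered data matrix.
X = np.column_stack([np.log(pi1), pi2, pi3])
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
explained = s**2 / np.sum(s**2)
print(explained)  # the first component carries essentially all the variance
```

Here PCA would tell you that one combination of the groups accounts for nearly all the variation, so the model can be built in a single dimensionless variable.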
There's a lot more on this topic than what I have covered and mentioned here. I would recommend reading the book Dimensional analysis and the theory of models for more details and the proof of the pi theorem.
(Another advantage of dimensional analysis: If you discover a useful dimensionless variable, you can get it named after yourself.)
In general, if your problem displays any kind of symmetry* you can exploit that to simplify things. I think most people are capable of doing this intuitively when the symmetry is obvious. The Buckingham pi theorem is a great example of a systematic way to find and exploit a symmetry that isn't so obvious.
* By "symmetry" I really mean "invariance under a group of transformations".
This is a great point. Other than fairly easy geometric and time symmetries, do you have any advice or know of any resources which might be helpful towards finding these symmetries?
Here's what I do know: sometimes you can recognize these symmetries by analyzing a model differential equation. Here's a book on the subject that I haven't read, but might read in the future. My PhD advisor tells me I already know one reliable way to find these symmetries (e.g., how to find the change of variables used here), so reading this would be a poor use of time in his view. This approach also requires knowing a fair bit more about a phenomenon than just which variables it depends on.
The book you linked is the sort of thing I had in mind. The historical motivation for Lie groups was to develop a systematic way to use symmetry to attack differential equations.
Are you familiar with Noether's Theorem? It comes up in some explanations of Buckingham pi, but the point is mostly "if you already know that something is symmetric, then something is conserved."
The most similar thing I can think of, in terms of "resources for finding symmetries," might be related to finding Lyapunov stability functions. It seems there's not too much in the way of automated function-finding for arbitrary systems; I've seen at least one automated approach for systems with polynomial dynamics, though.
Noether's theorem has nothing to do with Buckingham's theorem. Buckingham's theorem is quite general (and vacuous), while Noether's theorem is only about hamiltonian/lagrangian mechanics.
Added: Actually, Buckingham and Noether do have something in common: they both taught at Bryn Mawr.
Both of them are relevant to the project of exploiting symmetry, and deal with solidifying a mostly understood situation. (You can't apply Buckingham's theorem unless you know all the relevant pieces.) The more practical piece that I had in mind is that someone eager to apply Noether's theorem will need to look for symmetries; they may have found techniques for hunting for symmetries that will be useful in general. It might be worth looking into material that teaches it, not because it itself is directly useful, but because the community that knows it may know other useful things.
In what sense do you mean Buckingham's theorem is vacuous?
It's quite a bit more general than Lagrangian mechanics. You can extend it to any functional that takes functions between two manifolds to complex numbers.
Not familiar with Noether's theorem. Seems useful for constructing models, and perhaps determining if something else beyond mass, momentum, and energy is conserved. Is the converse true as well, i.e., does conservation imply that symmetries exist?
I'm also afraid I know nearly nothing about non-linear stability, so I'm not sure what you're referring to, but it sounds interesting. I'll have to read the Wikipedia page. I'd be interested if you know any other good resources for learning this.
I think this is what Lie groups are all about, but that's a bit deeper in group theory than I'm comfortable speaking on.
I learned it the long way by taking classes, and don't recall being particularly impressed by any textbooks. (I can lend you the ones I used.) I remember thinking that reading through Akella's lecture notes was about as good as taking the course, and so if you have the time to devote to it you might be able to get those from him by asking nicely.
Conservation gives a local symmetry but there may not be a global symmetry.
For instance, you can imagine a physical system with no forces at all, so everything is conserved. But there are still some parameters that define the locations of the particles. The system is then locally very symmetric, but it may still have global structure with no corresponding symmetry, e.g. if the particles are constrained to lie on a surface of nontrivial topology.