Beyond Statistics 101
Is statistics beyond introductory statistics important for general reasoning?
Ideas such as regression to the mean, the fact that correlation does not imply causation, and the base rate fallacy are very important for reasoning about the world in general. One gets these from a deep understanding of statistics 101 and the basics of the Bayesian statistical paradigm. Up until one year ago, I was under the impression that more advanced statistics is technical elaboration that doesn't offer major additional insights into thinking about the world in general.
Nothing could be further from the truth: ideas from advanced statistics are essential for reasoning about the world, even on a day-to-day level. In hindsight my prior belief seems very naive – as far as I can tell, my only reason for holding it was that I hadn't heard anyone say otherwise. But I hadn't actually looked at advanced statistics to see whether or not my impression was justified :D.
Since then, I've learned some advanced statistics and machine learning, and the ideas that I've learned have radically altered my worldview. The "official" prerequisites for this material are calculus, multivariable calculus, and linear algebra. But one doesn't actually need detailed knowledge of these to understand ideas from advanced statistics well enough to benefit from them. The problem is pedagogical: I need to figure out how to communicate them in an accessible way.
Advanced statistics enables one to reach nonobvious conclusions
To give a bird's eye view of the perspective that I've arrived at: in practice, the ideas from "basic" statistics are useful primarily for disproving hypotheses. This pushes in the direction of radical agnosticism: the idea that one can't really know anything for sure about lots of important questions. More advanced statistics enables one to become justifiably confident in nonobvious conclusions, often even in the absence of formal evidence from standard scientific practice.
IQ research and PCA as a case study
The work of Spearman and his successors on IQ constitutes one of the pinnacles of achievement in the social sciences. But while Spearman's discovery of IQ was a great discovery, it wasn't his greatest. His greatest discovery was about how to do social science research: he pioneered the use of factor analysis, a close relative of principal component analysis (PCA).
The philosophy of dimensionality reduction
PCA is a dimensionality reduction method. Real-world data often has a surprising property: a small number of latent variables explain a large fraction of the variance in the data.
This is related to the effectiveness of Occam's razor: it turns out to be possible to describe a surprisingly large amount of what we see around us in terms of a small number of variables. Only, the variables that explain a lot usually aren't the variables that are immediately visible – instead they're hidden from us, and in order to model reality, we need to discover them, which is the function that PCA serves. The small number of variables that drive a large fraction of variance in data can be thought of as a sort of "backbone" of the data. That enables one to understand the data at a "macro / big picture / structural" level.
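This "backbone" idea can be made concrete with a small sketch. The toy data below is made up for illustration: ten observed variables that are all noisy mixtures of just two hidden factors. Running PCA (via an eigendecomposition of the covariance matrix) reveals that two components capture nearly all of the variance, even though the data nominally lives in ten dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations of 10 variables, all of which are
# noisy linear combinations of just 2 hidden (latent) factors.
n, p, k = 500, 10, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, p))
data = latent @ mixing + 0.1 * rng.normal(size=(n, p))

# PCA via eigendecomposition of the covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # sorted largest-first
explained = eigvals / eigvals.sum()       # fraction of variance per component

# The first two components account for nearly all of the variance:
# the 10 visible variables have a 2-dimensional "backbone".
print(explained[:3])
print(explained[:2].sum())
```

The point is not the specific numbers but the shape of the result: the visible variables are many, the variables that drive them are few, and PCA is one systematic way to discover the few.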
This is a very long story that will take a long time to flesh out, and doing so is one of my main goals.
Comments (129)
PCA and other dimensionality reduction techniques are great, but there's another very useful technique that most people (even statisticians) are unaware of: dimensional analysis, and in particular, the Buckingham pi theorem. For some reason, this technique is used primarily by engineers in fluid dynamics and heat transfer despite its broad applicability. This is the technique that allows scale models like wind tunnels to work, but it's more useful than just allowing for scaling. I find it very useful to reduce the number of variables when developing models and conducting experiments.
Dimensional analysis recognizes a few basic axioms about models with dimensions and sees what they imply. You can use these to construct new variables from the old variables. The model is usually complete in a smaller number of these new variables. The technique does not tell you which variables are "correct", just how many independent ones are needed. Identifying "correct" variables requires data, domain knowledge, or both. (And sometimes, there's no clear "best" variable; multiple work equivalently well.)
Dimensional analysis does not help with categorical variables, or with numbers that are already dimensionless (though by luck, combinations of dimensionless variables are sometimes actually what's "correct"). This is the main restriction. And you can expect at best a reduction of about 3 in the number of variables. Dimensional analysis is most useful for physical problems with maybe 3 to 10 variables.
The basic idea is this: Dimensions are some sort of metadata which can tell you something about the structure of the problem. You can always rewrite a dimensional equation, for example, to be dimensionless on both sides. You should notice that some terms become constants when this is done, and that simplifies the equation.
Here's a physical example: Let's say you want to measure the drag on a sphere (units: N). You know this depends on the air speed (units: m/s), kinematic viscosity (units: m^2/s), air density (units: kg/m^3), and the diameter of the sphere (units: m). So you have 5 variables in total. Let's say you want to do a factorial design with 4 levels in each of the 4 independent variables, with no replications. You'd have to do 4^4 = 256 experiments. This is clearly too complicated.
What fluid dynamicists have recognized is that you can rewrite the relationship in terms of different variables, and nothing is missing. The Buckingham pi theorem mentioned previously says that we only need 2 dimensionless variables given our 5 dimensional variables. So, instead of the drag force, you use the drag coefficient, and instead of the speed, viscosity, etc., you use the Reynolds number. Now, you only need to do 4 experiments to get the same level of representation.
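The counting here can be sketched concretely. The pi theorem says the number of independent dimensionless groups equals the number of variables minus the rank of the matrix of their dimensional exponents. A minimal Python sketch, assuming mass/length/time (M, L, T) as the base dimensions; the numeric values at the end are made up purely for illustration, and the drag coefficient is written with the conventional frontal-area normalization:

```python
import numpy as np

# Exponents of mass (M), length (L), time (T) for each variable in the
# drag-on-a-sphere problem: drag force F, speed U, kinematic viscosity nu,
# density rho, diameter D.
#                        F   U   nu  rho  D
dim_matrix = np.array([
    [ 1,  0,  0,  1,  0],   # M
    [ 1,  1,  2, -3,  1],   # L
    [-2, -1, -1,  0,  0],   # T
])

n_vars = dim_matrix.shape[1]
rank = np.linalg.matrix_rank(dim_matrix)

# Buckingham pi: number of independent dimensionless groups.
n_pi = n_vars - rank
print(n_pi)  # 2

# The two conventional groups; these particular values are made up.
U, nu, rho, D, F = 10.0, 1.5e-5, 1.2, 0.1, 0.8
Re = U * D / nu                                    # Reynolds number
Cd = F / (0.5 * rho * U**2 * (np.pi * D**2 / 4))   # drag coefficient
print(Re, Cd)
```

The rank of the dimension matrix is 3, so 5 dimensional variables collapse to 5 − 3 = 2 dimensionless ones: the experiment design shrinks from 256 runs to 4.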
As it turns out, you can use techniques like PCA on top of dimensional analysis to determine that certain dimensionless parameters are unimportant (there are other ways too). This further simplifies models.
There's a lot more to this topic than I have covered here. I would recommend reading the book Dimensional Analysis and the Theory of Models for more details and a proof of the pi theorem.
(Another advantage of dimensional analysis: If you discover a useful dimensionless variable, you can get it named after yourself.)
I've always been amazed at the power of dimensional analysis. To me the best example is the problem of calculating the period of an oscillating mass on a spring. The relevant values are the spring constant K (kg/s^2) and the mass M (kg), and the period T is in (s). The only way to combine K and M to obtain a value with dimensions of (s) is sqrt(M/K), and that's the correct form of the actual answer - no calculus required!
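The dimensional-analysis prediction is easy to check numerically. The sketch below (a simple symplectic-Euler integrator; the step size and the parameter values are arbitrary choices for illustration) simulates m·x'' = −k·x, measures the period directly, and confirms both the sqrt(M/K) form and its scaling consequence: quadrupling the mass doubles the period.

```python
import math

def simulated_period(mass, k, dt=1e-4):
    """Integrate m*x'' = -k*x with symplectic Euler and return the time
    between two successive downward zero-crossings of x (one full period)."""
    x, v, t = 1.0, 0.0, 0.0
    crossings = []
    prev_x = x
    while len(crossings) < 2:
        v += (-k / mass) * x * dt
        x += v * dt
        t += dt
        if prev_x > 0.0 >= x:   # downward zero-crossing
            crossings.append(t)
        prev_x = x
    return crossings[1] - crossings[0]

T1 = simulated_period(1.0, 50.0)
print(T1, 2 * math.pi * math.sqrt(1.0 / 50.0))  # simulation vs. formula
print(simulated_period(4.0, 50.0) / T1)         # mass x4 -> period x2
```

Dimensional analysis alone gives sqrt(M/K); the simulation supplies the dimensionless constant 2π that the units argument cannot.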
Actually, there's another parameter, the displacement. It turns out that the spring period does not depend on the displacement, but that's a miracle that is special to springs. Instead, look at the pendulum. The same dimensional analysis gives the square root of the length divided by gravitational acceleration. That's off by a dimensionless constant, 2π. Moreover, even that is only approximately correct. The real answer depends on the displacement in a complicated way.
This is a good point. At best you can figure out that the period is proportional to (not equal to) sqrt(M/K) multiplied by some function of the other parameters, say, one involving displacement and another characterizing the non-linearity (if K is just the initial slope, as I've seen done before). It's a fortunate coincidence if the other parameters are unimportant. You cannot determine based solely on dimensional analysis whether certain parameters are unimportant.
In general, if your problem displays any kind of symmetry* you can exploit that to simplify things. I think most people are capable of doing this intuitively when the symmetry is obvious. The Buckingham pi theorem is a great example of a systematic way to find and exploit a symmetry that isn't so obvious.
* By "symmetry" I really mean "invariance under a group of transformations".
This is a great point. Other than fairly easy geometric and time symmetries, do you have any advice or know of any resources which might be helpful towards finding these symmetries?
Here's what I do know: Sometimes you can recognize these symmetries by analyzing a model differential equation. Here's a book on the subject that I haven't read, but might read in the future. My PhD advisor tells me I already know one reliable way to find these symmetries (e.g., how to find the change of variables used here), so reading this would be a poor use of time in his view. This approach also requires knowing a fair bit more about a phenomenon than just which variables it depends on.
The book you linked is the sort of thing I had in mind. The historical motivation for Lie groups was to develop a systematic way to use symmetry to attack differential equations.
Are you familiar with Noether's Theorem? It comes up in some explanations of Buckingham pi, but the point is mostly "if you already know that something is symmetric, then something is conserved."
The most similar thing I can think of, in terms of "resources for finding symmetries," might be related to finding Lyapunov stability functions. It seems there's not too much in the way of automated function-finding for arbitrary systems; I've seen at least one automated approach for systems with polynomial dynamics, though.
Noether's theorem has nothing to do with Buckingham's theorem. Buckingham's theorem is quite general (and vacuous), while Noether's theorem is only about Hamiltonian/Lagrangian mechanics.
Added: Actually, Buckingham and Noether do have something in common: they both taught at Bryn Mawr.
Both of them are relevant to the project of exploiting symmetry, and deal with solidifying a mostly understood situation. (You can't apply Buckingham's theorem unless you know all the relevant pieces.) The more practical piece that I had in mind is that someone eager to apply Noether's theorem will need to look for symmetries; they may have found techniques for hunting for symmetries that will be useful in general. It might be worth looking into material that teaches it, not because it itself is directly useful, but because the community that knows it may know other useful things.
It's quite a bit more general than Lagrangian mechanics. You can extend it to any functional that takes functions between two manifolds to complex numbers.
In what sense do you mean Buckingham's theorem is vacuous?
Not familiar with Noether's theorem. Seems useful for constructing models, and perhaps determining if something else beyond mass, momentum, and energy is conserved. Is the converse true as well, i.e., does conservation imply that symmetries exist?
I'm also afraid I know nearly nothing about non-linear stability, so I'm not sure what you're referring to, but it sounds interesting. I'll have to read the Wikipedia page. I'd be interested if you know any other good resources for learning this.
I think this is what Lie groups are all about, but that's a bit deeper in group theory than I'm comfortable speaking on.
I learned it the long way by taking classes, and don't recall being particularly impressed by any textbooks. (I can lend you the ones I used.) I remember thinking that reading through Akella's lecture notes was about as good as taking the course, and so if you have the time to devote to it you might be able to get those from him by asking nicely.
Conservation gives a local symmetry but there may not be a global symmetry.
For instance, you can imagine a physical system with no forces at all, so everything is conserved. But there are still some parameters that define the location of the particles. Then the physical system is locally very symmetric, but it may still have a nontrivial global structure, with the particles constrained to lie on a surface of nontrivial topology.
That's because outside of physics (and possibly chemistry) there are enough constants running around that all quantities are effectively dimensionless. I'm having a hard time seeing a situation in, say, biology where I could propose dimensional analysis with a straight face, to say nothing of the softer sciences.
One thing that most scientists in these softer sciences already have a good grasp on, but a lot of laypeople do not, is the idea of appropriately normalizing parameters: for instance, dividing something by the mass of the body, or the population of a nation, to do comparisons between individuals/nations of different sizes.
People will often make bad comparisons where they don't normalize properly. But hopefully most people reading this article are not at risk for that.
As I said, dimensional analysis does not help with categorical variables. And when the number of dimensions is low and/or the number of variables is large, dimensional analysis can be useless. I think it's a necessary component of any model builder's toolbox, but not a tool you will use for every problem. Still, I would argue that it's underutilized. When dimensional analysis is useful, it definitely should be used. (For example, despite its obvious applications in physics, I don't think most physics undergrads learn the Buckingham pi theorem. It's usually only taught to engineers learning fluid dynamics and heat transfer.)
Two very common dimensionless parameters are the ratio and fraction. Both certainly appear in biology. Also, the subject of allometry in biology is basically simple dimensional analysis.
I've seen dimensional analysis applied in other soft sciences as well, e.g., political science, psychology, and sociology are a few examples I am aware of. I can't comment much on the utility of its application in these cases, but it's such a simple technique that I think it's worth trying whenever you have data with units.
Speaking more generally, the idea of simplification coming from applying transformations to data has broad applicability. Dimensional analysis is just one example of this.
I don't believe you can obtain an understanding of the idea that "correlation does not imply causation" from even a very deep appreciation of the material in Statistics 101. These courses usually make no attempt to define confounding, comparability, etc. If they try to define confounding, they tend to use incoherent criteria based on changes in the estimate. Any understanding is almost certainly going to have to originate from outside of Statistics 101; unless you take a course on causal inference based on directed acyclic graphs, it will be very challenging to get beyond memorizing the teacher's password.
Agree completely, and I'll also point out that at least for me, a very shallow understanding of the ideas in Causality did much more to help me understand correlation vs. causation, confounding etc. than any amount of work with Statistics 101. And this was enormously practical–I was able to make significantly better financial decisions at Fundation due to understanding concepts like Simpson's Paradox on a system 1 level.
To chime in as well: my own understanding of 'correlation does not imply causation' does not come from the basic statistics courses and articles and tutorials I read. While I knew the saying and the concepts and a little bit about causal graphs, it took years of failed self-experiments and the intensely frustrating experience of seeing correlate after correlate fail randomized experiments before I truly accepted it.
I don't know how helpful, exactly, this has been on a practical level, but at least it's good for me on an epistemic level in that I have since accepted many fewer new beliefs than I would otherwise have.
Me four.
Although you know, there is no reason in principle you couldn't get all that stuff Anders_H is talking about from intro stats, it's just that stats isn't taught as well as it can be.
"impression that more advanced statistics is technical elaboration that doesn't offer major additional insights"
Why did you have this impression?
Sorry for the off-topic comment, but I see this a lot on LessWrong (as a casual reader). People seem to focus on textual, deep-sounding, wow-inducing expositions, but often dislike the technicalities: getting their hands dirty with actually understanding calculations, equations, formulas, and details of algorithms (things that don't tickle those wow-receptors that we all have). It's as if these were merely minor additions to the really important big-picture view. As I see it, this movement seems to try to build up a new backbone of knowledge from scratch, but in doing so it repeats the mistakes of past philosophers: for example, going for "deep", outlook-transforming texts that often give a delusional feeling of "oh, now I understand the whole world". It's easy to have wow-moments without actually having understood something new.
So yes, PCA is useful and most statistics and maths and computer science is useful for understanding stuff. But then you swing to the other extreme and say "ideas from advanced statistics are essential for reasoning about the world, even on a day-to-day level". Tell me how exactly you're planning to use PCA day-to-day? I think you may mean you want to use some "insight" that you gained from it. But I'm not sure what that would be. It seems to be a cartoonish distortion that makes it fit into an ideology.
Anyway, mainstream machine learning is very useful. And it's usually much more intricate and complicated than to be able to produce a deep everyday insight out of it. I think the sooner you lose the need for everything to resonate deeply or have a concise insightful summary, the better.
Probably because of the human tendency to overestimate the importance of any knowledge one happens to have and underestimate the importance of any knowledge one doesn't. (Is there a name for this bias?)
https://en.wikipedia.org/wiki/Law_of_the_instrument
I think having the concept of PCA prevents some mistakes in reasoning at an intuitive, day-to-day level. It nudges me towards fox thinking instead of hedgehog thinking. Normal folk intuition grasps at the most cognitively available and obvious variable to explain causes, and then our System 1 acts as if that variable explains most if not all of the variance. Looking at PCA results many times (and being surprised by them) makes me less likely to jump to conclusions about the causal structure of clusters of related events. So maybe I could characterize it as giving System 1 an intuition for not making the post hoc ergo propter hoc fallacy.
Maybe part of the problem Jonah is running in to explaining it is that having done many many example problems with System 2 loaded it into his System 1, and the System 1 knowledge is what he really wants to communicate?
What do you mean by getting surprised by PCAs? Say you have some data, you compute the principal components (eigenvectors of the covariance matrix) and the corresponding eigenvalues. Were you surprised that a few principal components were enough to explain a large percentage of the variance of the data? Or were you surprised about what those vectors were?
I think this is not really PCA or even dimensionality reduction specific. It's simply the idea of latent variables. You could gain the same intuition from studying probabilistic graphical models, for example generative models.
Surprised by either. Just finding a structure of causality that was very unexpected. I agree the intuition could be built from other sources.
PCA doesn't tell much about causality though. It just gives you a "natural" coordinate system where the variables are not linearly correlated.
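This decorrelation property is easy to see directly. In the sketch below (the toy data is made up: two variables with an obvious linear relationship), projecting the centered data onto the eigenvectors of its covariance matrix yields coordinates whose covariance is diagonal, i.e., linearly uncorrelated. Nothing in the computation says anything about which variable causes which.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D data with strong linear correlation.
x = rng.normal(size=1000)
data = np.column_stack([x, 2 * x + 0.5 * rng.normal(size=1000)])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
_, eigvecs = np.linalg.eigh(cov)

# In principal-component coordinates the covariance matrix is diagonal:
# the new variables are linearly uncorrelated with each other.
projected = centered @ eigvecs
new_cov = np.cov(projected, rowvar=False)
print(np.round(new_cov, 6))
```

The off-diagonal entries of the new covariance matrix are zero (up to floating-point noise), which is exactly the "natural coordinate system" claim: a rotation of axes, not a causal story.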
Right, one needs to use additional information to determine causality.
Yes, you seem to have a very clear understanding of where I'm coming from. Thanks.
Groupthink, I guess: other people I knew didn't think that it's so important (despite being people who are very well educated by conventional standards, top ~1% of elite colleges).
Disclaimer: I know that I'm not giving enough evidence to convince you. I've thought about this for thousands of hours (including working through many quantitative examples), and it's taking me a long time to figure out how to organize what I've learned.
I already have been using dimensionality reduction (qualitatively) in my day to day life, and I've found that it's greatly improved my interpersonal relationships because it's made it much easier to guess where people are coming from (before people's social behavior had seemed like a complicated blur because I saw so many variables without having started to correctly identify the latent ones).
You seem to be making overly strong assumptions with insufficient evidence: how would you know whether this was the case, never having met me? ;-)
Interesting - what are some examples of the latent ones?
Qualitative day-to-day dimensionality reduction sounds like woo to me. Not a bit more convincing than quantum woo (Deepak Chopra et al.). Whatever you're doing, it's surely not like doing SVD on a data matrix or eigen-decomposition on the covariance matrix of your observations.
Of course, you can often identify motivations behind people's actions. A lot of psychology is basically trying to uncover these motivations. Basically an intentional interpretation and a theory of mind are examples of dimensionality reduction in some sense. Instead of explaining behavior by reasoning about receptors and neurons, you imagine a conscious agent with beliefs, desires and intentions. You could also link it to data compression (dimensionality reduction is a sort of lossy data compression). But I wouldn't say I'm using advanced data compression algorithms when playing with my dog. It just sounds pretentious and shows a desperate need to signal smartness.
So, what is the evidence that you are consciously doing something similar to PCA in social life? Do you write down variables and numbers, or how else should I imagine qualitative dimensionality reduction? How is it different from somebody just forming an opinion intuitively and then justifying it afterwards?
See Rationality is about pattern recognition, not reasoning.
Your tone is condescending, far outside of politeness norms. In the past I would have uncharitably written this off to you being depraved, but I've realized that I should be making a stronger effort to understand other people's perspectives. So can you help me understand where you're coming from on an emotional level?
You asked about emotional stuff so here is my perspective. I have extremely weird feelings about this whole forum that may affect my writing style. My view is constantly popping back and forth between different views, like in the rabbit-duck gestalt image. On one hand I often see interesting and very good arguments, but on the other hand I see tons of red flags popping up. I feel that I need to maintain extreme mental efforts to stay "sane" here. Maybe I should refrain from commenting. It's a pity because I'm generally very interested in the topics discussed here, but the tone and the underlying ideology is pushing me away. On the other hand I feel an urge to check out the posts despite this effect. I'm not sure what aspect of certain forums have this psychological effect on my thinking, but I've felt it on various reddit communities as well.
That sounds like you engage in binary thinking and don't value shades of grey of uncertainty enough. You feel the need to judge arguments as true or false and don't have mental categories for "might be true, or might not be true".
Jonah makes strong claims for which he doesn't provide evidence. He's clear about the fact that he hasn't provided the necessary evidence.
Given that, you pattern-match to "crackpot" instead of putting Jonah in the mental category where you don't know whether what he says is right or wrong. If you put a lot of claims into the "I don't know" pile, you don't constantly pop between belief and non-belief. Popping back and forth means that the sizes of your updates when presented with new evidence are too large.
Being able to say "I don't know" is part of genuine skepticism.
I'm not talking about back and forth between true and false, but between two explanations. You can have a multimodal probability distribution and two distant modes are about equally probable, and when you update, sometimes one is larger and sometimes the other. Of course one doesn't need to choose a point estimate (maximum a posteriori), the distribution itself should ideally be believed in its entirety. But just as you can't see the rabbit-duck as simultaneously 50% rabbit and 50% duck, one sometimes switches between different explanations, similarly to an MCMC sampling procedure.
I don't want to argue this too much because it's largely a preference of style and culture. I think the discussions are very repetitive and it's an illusion that there is much to be learned by spending so much time thinking meta.
Anyway, I evaporate from the site for now.
Seconded, actually, and it's particular to LessWrong. I know I often joke that posting here gets treated as submitting academic material and skewered accordingly, but that is very much what it feels like from the inside. It feels like confronting a hostile crowd of, as Jonah put it, radical agnostics, every single time one posts, and they're waiting for you to say something so they can jump down your throat about it.
Oh, and then you run into the issue of having radically different priors and beliefs, so that you find yourself on a "rationality" site where someone is suddenly using the term "global warming believer" as though the IPCC never issued multiple reports full of statistical evidence. I mean, sure, I can put some probability on, "It's all a conspiracy and the official scientists are lying", but for me that's in the "nonsense zone" -- I actually take offense to being asked to justify my belief in mainstream science.
As much as "good Bayesians" are never supposed to agree to disagree, I would very much like if people would be up-front about their priors and beliefs, so that we can both decide whether it's worth the energy spent on long threads of trying to convince people of things.
Thanks so much for sharing. I'm astonished by how much more fruitful my relationships have become since I've started asking.
I think that a lot of what you're seeing is a cultural clash: different communities have different blindspots and norms for communication, and a lot of times the combination of (i) blindspots of the communities that one is familiar with and (ii) respects in which a new community actually is unsound can give one the impression "these people are beyond the pale!" when the actual situation is that they're no less rational than members of one's own communities.
I had a very similar experience to your own coming from academia, and wrote a post titled The Importance of Self-Doubt in which I raised the concern that Less Wrong was functioning as a cult. But since then I've realized that a lot of the apparently weird beliefs on LWers are in fact also believed by very credible people: for example, Bill Gates recently expressed serious concern about AI risk.
If you're new to the community, you're probably unfamiliar with my own credentials which should reassure you somewhat:
I did a PhD in pure math under the direction of Nathan Dunfield, who coauthored papers with Bill Thurston, who formulated the geometrization conjecture which Perelman proved and in doing so won one of the Clay Millennium Problems.
I've been deeply involved with math education for highly gifted children for many years. I worked with the person who won the American Math Society prize for best undergraduate research when he was 12.
I worked at GiveWell, which partners with Good Ventures, Dustin Moskovitz's foundation.
I've done fullstack web development, making an asynchronous clone of StackOverflow (link).
I've done machine learning, rediscovering logistic regression, collaborative filtering, hierarchical modeling, the use of principal component analysis to deal with multicollinearity, and cross-validation. (I found the expositions so poor that it was faster for me to work things out on my own than to learn from them, though I eventually learned the official versions.) You can read some details of the things that I found here. I did a project implementing Bayesian adjustment of Yelp restaurant star ratings using their public dataset here.
So I imagine that I'm credible by your standards. There are other people involved in the community who you might find even more credible. For example: (a) Paul Christiano who was an international math olympiad medalist, wrote a 50 page paper on quantum computational complexity with Scott Aaronson as an undergraduate at MIT, and is a theoretical CS grad student at Berkeley. (b) Jacob Steinhardt, a Hertz graduate fellow who does machine learning research under Percy Liang at Stanford.
So you're not actually in some sort of twilight zone. I share some of your concerns with the community, but the groupthink here is no stronger than the groupthink present in academia. I'd be happy to share my impressions of the relative soundness of the various LW community practices and beliefs.
Those are indeed impressive things you did. I agree very much with your post from 2010. But the fact that many people have this initial impression shows that something is wrong. What makes it look like a "twilight zone"? Why don't I feel the same symptoms for example on Scott Alexander's Slate Star Codex blog?
Another thing I could pinpoint is that I don't want to identify as a "rationalist", I don't want to be any -ist. It seems like a tactic to make people identify with a group and swallow "the whole package". (I also don't think people should identify as atheist either.)
Nobody forces you to do so. Plenty of people in this community don't self identify that way.
I'm sympathetic to everything you say.
In my experience there's an issue of Less Wrongers being unusually emotionally damaged (e.g. relative to academics), and this gives rise to a lot of problems in the community. But I don't think that the emotional damage primarily comes from the weird stuff that you see on Less Wrong. What one sees is that they have borne the brunt of the phenomenon that I described here disproportionately relative to other smart people, often because they're unusually creative and have been marginalized by conformist norms.
Quite frankly, I find the norms in academia very creepy: I've seen a lot of people develop serious mental health problems in connection with their experiences in academia. It's hard to see it from the inside: I was disturbed by what I saw, but I didn't realize at the time that math academia is actually functioning as a cult. That conclusion is based on retrospective impressions, and in fact on the implicit consensus of the best mathematicians in the world (I can give references if you'd like).
I'm sure you're aware that the word "cult" is a strong claim that requires a lot of evidence, but I'd also issue a friendly warning that to me at least it immediately set off my "crank" alarm bells. I've seen too many Usenet posters who are sure they have a proof that P = NP (or that P ≠ NP), or a proof that set theory is false, etc., who ultimately claim that because "the mathematical elite" are a cult, no one will listen to them. A cult generally engages in active suppression, often defamation, and not simply exclusion. Do you have evidence of legitimate mathematical results or research being hidden, withdrawn from journals, or publicly derided, or is it more of an old boys' club that's hard for outsiders to participate in and that plays petty politics to the damage of the science?
Grothendieck's problems look to be political and interpersonal. Perelman's also. I think it's one thing to claim that mathematical institutions are no more rational than any other politicized body, and quite another to claim that it's a cult. Or maybe most social behavior is too cult-like. If so, perhaps don't single out mathematics.
I question the direction of causation. Historically many great mathematicians have been mentally and socially atypical and ended up not making much sense with their later writings. Either mathematics has always had an institutional problem or mathematicians have always had an incidence of mental difficulties (or a combination of both; but I would expect one to dominate).
Especially in Thurston's On Proof and Progress in Mathematics I can appreciate the problem of trying to grok specialized areas of mathematics. The terminology and symbology are opaque to the uninitiated. It reminds me of section 1 of the Metamath Book, which expresses similar unhappiness with the state of knowledge between specialist fields of mathematics and the general difficulty of learning mathematics. I had hoped that Metamath would become more popular and tie various subfields together through unifying theories and definitions, but as far as I can tell it languishes as a hobbyist project for a few dedicated mathematicians.
Thanks, yeah, people have been telling me that I need to be more careful in how I frame things. :-)
The latter, but note that that's not necessarily less damaging than active suppression would be.
Yes, this is what I believe. The math community is just unusually salient to me, but I should phrase things more carefully.
Most of the people who I have in mind did have preexisting difficulties. I meant something like "relative to a counterfactual where academia was serving its intended function." People of very high intellectual curiosity sometimes approach academia believing that it will be an oasis and find this not to be at all the case, and that the structures in place are in fact hostile to them.
This is not what the government should be supporting with taxpayer dollars.
What are your own interests?
I would like to see some of those references (simply because I have no relation to Academia, and don't like things I read somewhere to gestate into unfounded intuitions about a subject).
I think you're just projecting.
I would probably use different words, but I believe I fit Jonah's description. Before finding LW, I felt strongly isolated. Like, surrounded by human bodies, but intellectually alone. Thinking about topics that people around me considered "weird", so I had no one to debate them with. Having a large range of interests, and while I could find people to debate individual interests with, I had no one to talk with about the interesting combinations I saw there.
I felt "weird", and from people around me I usually got two kinds of feedback. When I didn't try to pretend anything, they more or less confirmed that I am weird (of course, many were gentle, trying not to hurt me). When I tried to play a role of someone "less weird" (that is, I ignored most of the things I considered interesting, and just tried to fit)... well, it took a lot of time and practice to do this correctly, but then people accepted me. So, for a long time it felt like the only way to be accepted would be to suppress a large part of what I consider to be "myself"; and I suspect that it would never work perfectly, that there would still be some kind of intellectual hunger.
Then I found LW and I was like: "whoa... there actually are people like me! too bad they are on the other side of the planet though". Then I found some of them living closer, and... going to meetups feels incredibly refreshing. First time in my life, I don't have to suppress anything, to play any role. I just am... in an environment that feels natural. I finally started understanding how people can enjoy having social contacts.
Now let's imagine that in a parallel universe, those LessWrongers who live in a city near to mine, would instead be my neighbors since my childhood, or that we would be classmates at high school. I believe my life would be very different. (I believe there are people like this in my city, but the problem is finding those few dozen individuals among the hundreds of thousands, especially when there is no word in a public vocabulary to describe "us".)
I can't find the article now, but I believe it was written by Lewis Terman, who observed how successful highly intelligent people are. He found a difference between those who were "intelligent people in an intelligent environment" and those who were "isolated intelligent people". The former were usually very successful in life: they could talk with their parents and friends as equals, share their algorithms for life success, fit into their environment. The latter felt isolated, and often burned out at some moment of their lives. The conclusion was that for a highly intelligent person, having similarly highly intelligent family and friends makes a huge difference in their lives. -- When you observe the difference between "academia" and "LessWrong", it may be related to this.
It is easier to be academically successful when your parents are. You can pick good habits and strategies from them; you can debate your work and problems with them. If you are the only academically inclined person in the family, you lead a double life: the "real life" outside of school, and the "academic life" inside. The more you focus on your work, the more it feels like you are withdrawing from everything else. On the other hand, if you come from the same culture, focusing on the work makes you fit into the culture.
I am going to break a taboo here, but I don't know how to tell it otherwise. I have an IQ about four or five sigma above the average. The difference between me and the average Mensa member is larger than the difference between Mensa and the general population. Many people in Mensa seem kind of dense to me, and average people are sometimes like five-year-old children. (I believe for many people on LessWrong it feels the same.) Sure, intelligence is not everything: other people have skills and traits that I lack, sometimes have more success than me, and I admire that. It's just... so difficult to talk with them like with adult people. But when I go to an LW meetup, it's like "whoa... finally a group of adult people, how amazing!".
But I'm already an old man, relatively speaking. Now I am 39; I found LW when I was 35. Finally I have the company of my peers (still not in my own city), but it can't fix the three decades of my life that already passed in isolation. It can make my life better, but I will always have the emotional scars of chronic loneliness. Oh, how much I envy those lucky kids who can go to LW meetups as teenagers. Makes me wonder how much my own life could have been different; I probably wouldn't recognize myself.
Of course, this is just one data point; I don't know how typical or atypical I am within the LW community.
I'm speaking based on many interactions with many members of the community. I don't think this is true of everybody, but I have seen a difference at the group level.
I've only been in CS academia, and wouldn't call that a cult. I would call it, like most of the rest of academia, a deeply dysfunctional industry in which to work, but that's the fault of the academic career and funding structure. CS is even relatively healthy by comparison to much of the rest.
How much of our impression of mathematics as a creepy, mental-health-harming cult comes from pure stereotyping?
Jonah happens to be a math PhD. How can you engage in pure stereotyping of mathematicians while getting your PhD?
For what it's worth, I have observed a certain reverence in the way great mathematicians are treated by their lesser-accomplished colleagues that can often border on the creepy. This seems particularly intense in math: it exists in other disciplines too, but with lesser intensity.
But I agree, "dysfunctional" seems to be a more apt label than "cult." May I also add "fashion-prone?"
Er, what? Who do you mean by "we"?
The link says of Turing:
This is a staggeringly wrong account of how he died.
I don't have direct exposure to CS academia, which, as you comment, is known to be healthier :-). I was speaking in broad brushstrokes; I'll qualify my claims and impressions more carefully later.
I don't really understand what you mean about math academia. Those references would be appreciated.
The top 3 answers to the MathOverflow question Which mathematicians have influenced you the most? are Alexander Grothendieck, Mikhail Gromov, and Bill Thurston. Each of these has expressed serious concerns about the community.
Grothendieck was actually effectively excommunicated by the mathematical community and then was pathologized as having gone crazy. See pages 37-40 of David Ruelle's book A Mathematician's Brain.
Gromov expresses strong sympathy for Grigory Perelman having left the mathematical community starting on page 110 of Perfect Rigor. (You can search for "Gromov" in the pdf to see all of his remarks on the subject.)
Thurston made very apt criticisms of the mathematical community in his essay On Proof and Progress In Mathematics. See especially the beginning of Section 3: "How is mathematical understanding communicated?" Terry Tao endorses Thurston's essay in his obituary of Thurston. But the community has essentially ignored Thurston's remarks: one almost never hears people talk about the points that Thurston raises.
I've always thought that calling yourself a "rationalist" or "aspiring rationalist" is rather useless. You're either winning or not winning. Calling yourself by some funny term can give you the nice feeling of belonging to a community, but it doesn't actually make you win more, in itself.
Of course, Christiano tends to issue disclaimers with his MIRI-branded AGI safety work, explicitly stating that he does not believe in alarmist UFAI scenarios. Which is fine, in itself, but it does show how people expect someone associated with these communities to sound.
And Jacob Steinhardt hasn't exactly endorsed any "Twilight Zone" community norms or propaganda views. Errr, is there a term for "things everyone in a group thinks everyone else believes, whether or not they actually do"?
I'm not claiming otherwise: I'm merely saying that Paul and Jacob don't dismiss LWers out of hand as obviously crazy, and have in fact found the community to be worthwhile enough to have participated substantially.
I think in this case we have to taboo the term "LWers" ;-). This community has many pieces in it, and two large parts of the original core are "techno-libertarian Overcoming Bias readers with many very non-mainstream beliefs that they claim are much more rational than anyone else's beliefs" and "the SL4 mailing list wearing suits and trying to act professional enough that they might actually accomplish their Shock Level Four dreams."
On the other hand, in the process of the site's growth, it has eventually come to encompass those two demographics plus, to some limited extent, almost everyone who's willing to assent that science, statistical reasoning, and the neuro/cognitive sciences actually really work and should be taken seriously. With special emphasis on statistical reasoning and cognitive sciences.
So the core demographic consists of Very Unusual People, but the periphery demographics, who now make up most of the community, consist of only Mildly Unusual People.
Yes, this seems like a fair assessment of the situation. Thanks for disentangling the issues. I'll be more precise in the future.
I would be very interested in hearing elaboration on this topic, either publicly or privately.
I prefer public discussions. First, I'm a computer science student who took courses in machine learning, AI, wrote theses in these areas (nothing exceptional), I enjoy books like Thinking Fast and Slow, Black Swan, Pinker, Dawkins, Dennett, Ramachandran etc. So the topics discussed here are also interesting to me. But the atmosphere seems quite closed and turning inwards.
I feel similarities to reddit's Red Pill community. Previously "ignorant" people feel the community has opened a new world to them, they lived in darkness before, but now they found the "Way" ("Bayescraft") and all this stuff is becoming an identity for them.
Sorry if it's offensive, but I feel as if many people had no success in the "real world" matters and invented a fiction where they are the heroes by having joined some great organization much higher above the general public, who are just irrational automata still living in the dark.
I dislike the heavy use of insider terminology, which makes communicating these ideas to "outsiders" quite hard: you get used to referring to things by the in-group terms, so you become somewhat isolated from your real-life friends, feeling "they won't understand, they'd have to read so much". In reality, many of the concepts are not all that new and could be phrased in a way that the "uninitiated" can also get.
There are too many cross references in posts and it keeps you busy with the site longer than necessary. It seems that people try to prove they know some concept by using the jargon and including links to them. Instead, I'd prefer authors who actively try to minimize the need for links and jargon.
I also find the posts quite redundant. They seem to be reiterations of the same patterns in very long prose with people's stories intertwined with the ideas, instead of striving for clarity and conciseness. Much of it feels a lot like self-help for people with derailed lives who try to engineer their life (back) to success. I may be wrong but I get a depressed vibe from reading the site too long. It may also be because there is no lighthearted humor or in-jokes or "fun" or self-irony at all. Maybe because the members are just like that in general (perhaps due to mental differences, like being on the autism spectrum, I'm not a psychiatrist).
I can see that people here are really smart and the comments are often very reasonable. And it makes me wonder why they'd regard a single person such as Yudkowsky in such high esteem as compared to established book authors or academics or industry people in these areas. I know there has been much discussion about cultishness, and I think it goes a lot deeper than surface issues. LessWrong seems to be quite isolated and distrusting towards the mainstream. Many people seem to have read stuff first from Yudkowsky, who often does not reference earlier works that basically state the same stuff, so people get the impression that all or most of the ideas in "The Sequences" come from him. I was quite disappointed several times when I found the same ideas in mainstream books. The Sequences often depict the whole outside world as dumber than it is (straw man tactics, etc).
Another thing is that discussion is often too meta (or meta-meta). There is discussion of Bayes' theorem and math principles, but no actual detailed, worked-out stuff. Very little actual programming, for example. I'd expect people to create GitHub projects and IPython notebooks to show some examples of what they are talking about. Much of the meta-meta-discussion is very opinion-based because there is no immediate feedback about whether someone is wrong or right. It's hard to test such hypotheses. For example, in this post I would have expected an example dataset and a demonstration of how PCA can uncover something surprising. Otherwise it's just floating out there, although it matches nicely with the pattern that "some math concept gave me insight that refined my rationality". I'm not sure; maybe these "rationality improvements" are sometimes illusions.
I also don't get why the rationality stuff is intermixed with friendly AI and cryonics and transhumanism. I just don't see why these belong that much together. I find them too speculative and detached from the "real world" to be the central ideas. I realize they are important, but their prevalence could also be explained as "escapism" and it promotes the discussion of untestable meta things that I mentioned above, never having to face reality. There is much talk about what evidence is but not much talk that actually presents evidence.
I needed to develop a sort of immunity against topics like acausal trade: I can't fully specify how they are wrong, but they feel wrong and are hard to translate into practical, testable statements, and they just mess with my head in the wrong way.
And of course there is also that secrecy around and hiding of "certain things".
That's it. This place may just not be for me, which is fine. People can have their communities in the way they want. You just asked for elaboration.
There's also the whole Lesswrong-is-dying thing that might contribute to the vibe you're getting. I've been reading the forum for years and it hasn't felt very healthy for a while now. A lot of the impressive people from earlier have moved on, we don't seem to be getting that many new impressive people coming in, and hanging out a lot on the forum turns out not to make you that much more impressive. What's left is turning increasingly into a weird sort of cargo cult of a forum for impressive people.
Actually, I think that LessWrong used to be worse when the "impressive people" were posting about cryonics, FAI, many-world interpretation of quantum mechanics, and so on.
It has seemed to me that a lot of the commenters who come with their own solid competency are also less likely to get unquestioningly swept away following EY's particular hobbyhorses.
Thanks for the detailed response! I'll respond to a handful of points:
I certainly agree that there are people here who match that description, but it's also worth pointing out that there are actual experts too.
One of the things I find most charming about LW, compared to places like RationalWiki, is how much emphasis there is on self-improvement and on your own mistakes, not on mistakes made by other people because they're dumb.
I'm not sure this is avoidable, and in full irony I'll link to the wiki page that explains why.
In general, there are lots of concepts that seem useful, but the only way we have to refer to concepts is either to refer to a label or to explain the concept. A number of people read through the sequences and say "but the conclusions are just common sense!", to which the response is, "yes, but how easy is it to communicate common sense?" It's one thing to be able to recognize that there's some vague problem, and another thing to be able to say "the problem here is inferential distance; knowledge takes many steps to explain, and attempts to explain it in fewer steps simply won't work, and the justification for this potentially surprising claim is in Appendix A." It is one thing to be able to recognize a concept as worthwhile; it is another thing to be able to recreate that concept when a need arises.
Now, I agree with you that having different labels to refer to the same concept, or conceptual boundaries or definitions that are drawn slightly differently, is a giant pain. When possible, I try to bring the wider community's terminology to LW, but this requires being in both communities, which limits how much any individual person can do.
Part of that is just seeding effects--if you start a rationality site with a bunch of people interested in transhumanism, the site will remain disproportionately linked to transhumanism because people who aren't transhumanists will be more likely to leave and people who are transhumanists will be more likely to find and join the site.
Part of it is that those are the cluster of ideas that seem weird but 'hold up' under investigation--most of the reasons to believe that the economy of fifty years from now will look like the economy of today are just confused, and if a community has good tools for dissolving confusions you should expect them to converge on the un-confused answer.
A final part seems to be availability; people who are convinced by the case for cryonics tend to be louder than the people who are unconvinced. The annual surveys show the perception of LW one gets from just reading posts (or posts and comments) is skewed from the perception of LW one gets from the survey results.
I agree that LW is much better than RationalWiki, but I still think that the norms for discussion are much too far in the direction of focus on how other commenters are wrong as opposed to how one might oneself be wrong.
I know that there's a selection effect (with respect to the more frustrating interactions standing out). But people not infrequently mistakenly believe that I'm wrong about things that I know much more about than they do, with very high confidence, and in such instances I find the connotations that I'm unsound to be exasperating.
I don't think that this is just a problem for me rather than a problem for the community in general: I know a number of very high quality thinkers in real life who are uninterested in participating on LW explicitly because they don't want to engage with commenters who assert, with high confidence, that those thinkers' positions are incorrect. There's another selection effect here: such people aren't salient because they're invisible to the online community.
I agree that those frustrating interactions both happen and are frustrating, and that it leads to a general acidification of the discussion as people who don't want to deal with it leave. Reversing that process in a sustainable way is probably the most valuable way to improve LW in the medium term.
The applicable word is metaphysics. Acausal trade is dabbling in metaphysics to "solve" a question in decision theory, which is itself mere philosophizing, and thus one has to wonder: what does Nature care for philosophies?
By the way, for the rest of your post I was going, "OH MY GOD I KNOW YOUR FEELS, MAN!" So it's not as though nobody ever thinks these things. Those of us who do just tend to, in perfect evaporative cooling fashion, go get on with our lives outside this website, being relatively ordinary science nerds.
Sorry, avoiding metaphysics doesn't work. You just end up either reinventing it (badly) or using a bad fifth-hand version of some old philosopher's metaphysics. Incidentally, Eliezer also tried avoiding metaphysics and wound up doing the former.
It's insufficiently appreciated that physicalism is metaphysics too.
I don't like Eliezer's apparent mathematical/computational Platonism myself, but most working scientists manage to avoid metaphysical buggery by simply dealing only with those things with which they can actually causally interact. I recall an Eliezer post on "Explain/Worship/Ignore", and would add myself that while "Explain" eventually bottoms out in the limits of our current knowledge, the correct response is to hit "Ignore" at that stage, not to drop to one's knees in Worship of a Sacred Mystery that is in fact just a limit to current evidence.
EDIT: This is also one of the reasons I enjoy being in this community: even when I disagree with someone's view (eg: Eliezer's), people here (including him) are often more productive and fun to talk to than someone who hits the limits of their scientific knowledge and just throws their hands up to the tune of "METAPHYSICS, SON!", and then joins the bloody Catholic Church, as if that solved anything.
See my edit. Part of where I'm coming from is realizing how socially undeveloped people in our reference class tend to be, such that apparent malice often comes from misunderstandings.
Don't say the p-word, please ;-).
I do agree that more real-life understanding is gained from just obtaining a broad scientific education than from going wow-hunting. But of course, I would say that, since I'm a fanatical textbook purchaser.
What resources would you recommend for learning advanced statistics?
What would you call "advanced" statistics? But let's start listing classes:
1) Intro to Discrete and Continuous Probability -- you'll need this for every possible path
Now we need to start branching out. Choose your adventure: applied or theoretical? Frequentist, Bayesian, Likelihoodist, or "Machine" Learning?
Your normal university statistics sequence will probably give you Intro to Frequentist Statistics 1 at this point. That's a fine way to go, but it's not the only way. In fact, many departments in the empirical sciences will teach Data Analysis classes, or the like, which introduce applied statistics before teaching you the theory, which would mean you've actually dealt with real data before you learn the theory. I think that might be a Very Good Idea.
Now let's hope you've taken one of the following paths:
From there I would recommend knowing linear algebra decently well before moving on. Then you can start taking courses/reading textbooks in more advanced/theoretical machine learning, computational Bayesian methods, multidimensional frequentist statistics, causal analysis, or just more and more applied data analysis. You should probably check what sort of statistical methods are favored "in the field" that you actually care about.
Why is that surprising? The causal structure of the world is very sparse, by the nature of causality. One cause has several effects, so once you scale up to lots of causative variables, you expect to find that large portions of the variance in your data are explained by only a few causal factors.
Causality is indeed the skeleton of data. And oh boy, wait until you hit hierarchical Bayes models!
Not quite. PCA helps you reduce dimensionality by discovering the directions of variation in your feature-space that explain most of the variation (in fact, a total ordering of the directions of variation in the data by how much variation they explain). Then there's Independent Component Analysis, which separates your feature data into maximally statistically independent (rather than merely orthogonal) components.
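To make the "total ordering" concrete, here's a minimal sketch in plain numpy (the toy dataset is invented for illustration): PCA is just an eigendecomposition of the covariance matrix, and sorting the eigenvalues gives you the ordering of directions by variance explained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two observed features, driven mostly by one direction of variation.
x1 = rng.normal(size=500)
x2 = 3.0 * x1 + rng.normal(scale=0.3, size=500)
X = np.column_stack([x1, x2])

# PCA by hand: eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
eigvals = eigvals[::-1]                # descending: the total ordering
explained = eigvals / eigvals.sum()    # fraction of variance per direction

print(explained)  # the first direction explains nearly all the variance
```

Running it, the first component accounts for nearly all the variance, because by construction x2 is mostly a scaled copy of x1.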
Can you expand your reasoning? We do see around us sparse — that is, understandable — causal systems. And even chaotic ones often give rise to simple properties (e.g. motion of huge numbers of molecules → gas laws). But why (ignoring anthropocentric arguments) would one expect to see this?
There are really just three ways the causal structure of reality could go:
Since the latter will generate more (apparent) random variables, most observables will end up deriving from a relatively sparse causal structure, even if we assume that the causal structures themselves are sampled uniformly from this selection of three.
So, for instance, parameter-space compression (which is its own topic to explain, but oh well), aka: the hierarchical structure of reality, actually does follow that first item: many micro-level causes give rise to a single macro-level observable. But you'll still find that most observables come from non-compressive causal structures.
This is why we actually have to work really hard to find out about micro-scale phenomena (things lower on the hierarchy than us): they have fewer observables whose variance is uniquely explicable by reference to a micro-scale causal structure.
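A quick numerical illustration of the claim (a toy model I'm making up, not real data): drive many observables from only a few latent causes plus noise, and the covariance spectrum concentrates almost all the variance in a handful of directions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_obs, n_causes = 2000, 40, 3

# Sparse causal structure: 3 latent causes, each driving all 40 observables.
causes = rng.normal(size=(n_samples, n_causes))
loadings = rng.normal(size=(n_causes, n_obs))
observed = causes @ loadings + 0.1 * rng.normal(size=(n_samples, n_obs))

# Covariance spectrum: only ~3 directions carry essentially all the variance.
eigvals = np.linalg.eigvalsh(np.cov(observed, rowvar=False))[::-1]
top3_share = eigvals[:3].sum() / eigvals.sum()
print(round(top3_share, 3))  # close to 1.0
```

Forty observables, but three eigenvalues dominate: exactly the "few causal factors explain most of the variance" picture.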
I need that expanded a lot more. Why not many causes -> many effects, for example?
Ah, you mean a densely interconnected "almost all to almost all" causal structure. Well, I'd have to guess: because that would look far more like random behavior than causal order, so we wouldn't even notice it as something to causally analyze!
We do notice turbulence as something that doesn't look random, and is hard-to-impossible to causally analyze.
Here's an anecdote. I can't copy and paste it, but it's in the middle column.
This is a very interesting point. PCA (or as its time and/or space series version is called, the Karhunen-Loève expansion and/or POD) has not been found to be useful for turbulence modeling, as I recall. There's a brief section in Pope's book on turbulence about modeling with this. From what I understand, POD is mostly used for visualization purposes, not to help build models. (It's worth noting that while my background in fluid dynamics is strong, I know little to nothing about PCA and the like aside from what they apparently do.)
Maybe I don't actually understand causality, but I think in terms of modeling, we do have a good model (the Navier-Stokes, or N-S, equations), and so in some sense it's clear what causes what. In principle, if you run a computer simulation with these equations and the correct boundary conditions, the result will be reasonably accurate. This has been demonstrated through direct simulations of some relatively simple cases like flow through a channel. So that's not the issue. The actual issue is that you need a lot of computing power to simulate even basic flows, and attempts to develop lower-order models have been fairly unsuccessful. So as a model, N-S is of limited utility as-is.
In my view, the "turbulence problem" comes down to two facts: 1. the N-S equations are chaotic (sensitive to initial conditions, so small changes can cause big effects) and 2. they exhibit large scale separation (the smallest details you need to resolve, the Kolmogorov scales in most cases, are much smaller than the physical dimensions of a problem, say the length of a wing). To understand these points better, imagine that rigid body dynamics was inaccurate (say, for modeling the trajectory of a baseball), and you had to model all the individual atoms to get it right. And if one atom was off, that might have a big effect. Obviously that's a lot harder, and it's probably computationally intractable outside of a few simple cases. (The chaos part is "avoided" because you probably would simulate an ensemble of initial conditions via Monte Carlo or something else, and get an "ensemble mean" which you would compare against an experiment. This works well from what I understand, even if the details are unclear.)
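The sensitivity-to-initial-conditions point is easy to demonstrate with the Lorenz system, Lorenz's famously chaotic three-variable caricature of convection, standing in here for the full N-S equations (the step size and perturbation below are arbitrary choices for illustration):

```python
import numpy as np

def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz equations.
    x, y, z = s
    return s + dt * np.array([sigma * (y - x),
                              x * (rho - z) - y,
                              x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # perturb one coordinate by 1e-8

max_sep = 0.0
for _ in range(4000):                # integrate ~40 time units
    a, b = lorenz_step(a), lorenz_step(b)
    max_sep = max(max_sep, float(np.linalg.norm(a - b)))

print(max_sep)  # grows by many orders of magnitude from the 1e-8 start
```

Two trajectories that start 1e-8 apart end up macroscopically different, which is why single deterministic forecasts fail and one falls back on ensemble means.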
So in some sense, yes, this looks like an "almost all to almost all" causal structure. Though, I looked up a bit about causal diagrams and it's not even clear to me how you might draw one for turbulence, and not because of turbulence itself. It's not clear what an "event" might be to me. There isn't even a precise definition of "turbulence" to begin with, so maybe this should be expected. I suppose on some level such things are arbitrary and you could define an event to be fluid movement in some direction, for each direction, each point in space, and each time. I'm not sure if anyone has done this sort of analysis.
(For the incompressible N-S equations, you can easily say that everything causes everything because the equations are elliptic, so the speed of sound is infinite (which means changes in some place are felt everywhere instantaneously). In other words, the "domain of dependence" is everywhere. But I don't know if that means these effects are substantial. Obviously in reality, far away from something quiet that's happening, you don't notice it, even if the sound waves had time to reach you. In practice, this means that doing incompressible fluid dynamics requires the solution of an elliptic PDE, which can be a pain for reasons unrelated to turbulence.)
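The "felt everywhere instantaneously" property of elliptic problems can be seen in a toy 1-D Poisson solve (a finite-difference sketch with invented numbers, not a fluids code): perturb the source term at a single interior point and the solution shifts at every grid point.

```python
import numpy as np

n = 50                # interior grid points on (0, 1)
h = 1.0 / (n + 1)

# Standard second-difference Laplacian with zero Dirichlet boundary conditions.
A = (np.diag(-2.0 * np.ones(n)) +
     np.diag(np.ones(n - 1), 1) +
     np.diag(np.ones(n - 1), -1)) / h**2

f = np.zeros(n)
u0 = np.linalg.solve(A, f)     # zero source -> zero solution

f[n // 2] = 1.0                # poke the source at one interior point...
u1 = np.linalg.solve(A, f)

# ...and the solution changes at *every* grid point, however far away.
print(np.min(np.abs(u1 - u0)))  # strictly positive everywhere
```

That global, instantaneous domain of dependence is the discrete analogue of the incompressible pressure equation being elliptic, though, as the comment notes, it says nothing about whether the far-field effect is substantial.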