In academic biomedicine, at least, which is where I work, it’s all about tech dev. Most of the development is based on obvious signals and conceptual clarity. Yes, we do study biological systems, but that comes after years, even decades, of building the right tools to get a crushingly obvious signal out of the system of interest. Until that point all the data is kind of a hint of what we will one day have clarity on rather than a truly useful stepping stone towards it. Have as much statistical rigor as you like, but if your methods aren’t good enough to deliver the data you need, it just doesn’t matter. Which is why people read titles, not figure footnotes: it’s the big ideas that really matter, and the labor going on in the labs themselves. Papers are in a way just evidence of work being done.
That’s why I sometimes worry about LessWrong. Participants who aren’t professionally doing research and spend a lot of time critiquing papers over niche methodological issues may be misallocating their attention, or searching under the spotlight. The interesting thing is growth in our ability to measure and manipulate phenomena, not the exact analysis method in one paper or another. What’s true w...
It's not evidence, it's just an opinion!
But I don't agree with your presumption. Let me put it another way. Science matters most when it delivers information that is accurate and precise enough to be decision-relevant. Typically, we're in one of a few states:
I think what John calls "memetic" research is just areas where the topics or themes a...
I'm not sure the median researcher is particularly important here, relative to, say, the median lab leader.
The median voter theorem works explicitly because everyone's vote is equal, but if you have a lab or research-group leader who disincentivizes bad research practices, then you should theoretically get a lab with good research practices.
In practice, lab leaders are often people who Goodhart incentives, which results in the current situation.
LessWrong has a chance to be better exactly because it is outside the current system of perverse incentives, although it has its own bad incentives.
I had the thought while reading the original post that I recall speaking to at least one researcher who, pre-replication crisis, was like "my work is built on a pretty shaky foundation as is most of the research in this field, but what can you do, this is the way the game is played". So that suggested to me that plenty of median researchers might have recognized the issue but not been incentivized to change it.
Lab leaders aren't necessarily in a much better position. If they feel responsibility toward their staff, they might feel even more pressured to keep gaming the metrics so that the lab can keep getting grants and its researchers good CVs.
I am not sure about the median researcher. Many fields have a few "big names" that everybody knows and whose opinions have disproportionate weight.
My default assumption for any story that ends with "And this is why our ingroup is smarter than everyone else and people outside won't recognize our genius" is that the story is self-serving nonsense, and this article isn't giving me any reason to think otherwise.
A "userbase with probably-high intelligence, community norms about statistics and avoiding predictable stupidity" describes a lot of academia. And academia has a higher barrier to entry than "taking the time to write some blog articles". The average lesswrong user doesn't need to run an RCT before airing their latest pet theory for the community, so why would it be so much better at selectively spreading true memes than academia is?
I would need a much more thorough gears-level model of memetic spread of ideas, one with actual falsifiable claims (you know, like when people do science) before I could accept the idea of LessWrong as some kind of genius incubator.
community norms which require basically everyone to be familiar with statistics and economics,
I think this is too strong. There are quite a few posts that don't require knowledge of either one to write, read, or comment on. I'm certain that one could easily accumulate lots of karma and become a well-respected poster without knowing either.
Yeah, I didn't read this post and come away with "and this is why LessWrong works great", I came away with a crisper model of "here are some reasons LW performs well sometimes", but more importantly "here is an important gear for what LW needs to work great."
Show me a field where replication crises tear through, exposing fraud and rot and an emperor that never had any clothes, a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers- and then I will show you a field that is a little confused but has the spirit and will get there sooner or later.
What you really need to look out for are fields that could never, on a conceptual level, have a devastating replication crisis. Lesswrong sometimes strays a little close to this camp.
Show me a field where replication crises tear through, exposing fraud and rot and an emperor that never had any clothes, a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers- and then I will show you a field that is a little confused but has the spirit and will get there sooner or later.
So... parapsychology? How'd that work out? Did they have the (ahem) spirit and get there sooner or later?
Personally I am quite pleased with the field of parapsychology. For example, they took a human intuition and experience ("Wow, last night when I went to sleep I floated out of my body. That was real!") and operationalized it into a testable hypothesis ("When a subject capable of out-of-body experiences floats out of their body, they will be able to read random numbers written on a card otherwise hidden to them."). They went and actually performed this experiment, with a decent degree of rigor, writing the results down accurately, and got an impossible result: one subject could read the card (Tart, 1968). A great deal of effort quickly went into further exploration (including military attention, with the men who stare at goats, etc.), and it turned out that the experiment didn't replicate, even though everyone involved seemed to genuinely expect it to. In the end, no, you can't use an out-of-body experience to remotely view, but I'm really glad someone did the obvious experiments instead of armchair philosophizing.
https://digital.library.unt.edu/ark:/67531/metadc799368/m2/1/high_res_d/vol17-no2-73.pdf is a great read from someone who obviously believes in the metaphysical, and then does a great job designing and running experiments and accurately reporting their observations, and so it's really only a small ding against them that the author draws the wrong larger conclusions in the end.
a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers-
A field can be absolutely packed with dreadful research and still see virtually no one getting fired. Take, for instance, the moment a prominent psychologist dubbed peers who questioned methodological standards "methodological terrorists." It's the kind of rhetoric that sends a clear message: questioning sloppy methods isn't just unwelcome; it's practically heretical.
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. [...] Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
This assumes the median researchers can't recognize who the competent researchers are, or otherwise don't look to them as thought leaders.
I'm not arguing that this isn't often the case, just that it isn't alw...
Eh, feels wrong to me. Specifically, this argument feels over-complicated.
As best I can tell, the predominant mode of science in replication-crisis affected fields is that they do causal inference by sampling from noisy posteriors.
The predominant mode of science in non-replication-crisis affected fields is that they don't do this or do this less.
Most of the time it seems like science is conducted like that in those fields because they have to. Can you come up with a better way of doing Psychology research? "Science in hard fields is hard" is definitely a less sexy hypothesis, but it seems obviously true?
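To make "sampling from noisy posteriors" concrete, here is a minimal sketch (my own illustration, not something from the thread, with made-up parameters): assume a small true effect, noisy measurements, and small samples, and look at what the "significant" estimates end up looking like.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2   # small true effect, in standard-deviation units (assumed)
n = 30              # participants per group (assumed, roughly old-school psych scale)
n_studies = 10_000  # independent attempts at measuring the same effect

def run_study():
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    return diff, diff / se  # estimated effect and a z-like test statistic

results = np.array([run_study() for _ in range(n_studies)])
effects, zs = results[:, 0], results[:, 1]
significant = zs > 1.96  # crude one-sided significance cut

print("share of studies reaching 'significance':", significant.mean())
print("mean estimated effect, all studies:      ", round(effects.mean(), 3))
print("mean estimated effect, significant only: ", round(effects[significant].mean(), 3))
```

Under these assumptions only a small minority of studies clear the bar, and the ones that do overestimate the effect by a factor of two or more, so a same-sized replication of a "significant" finding will usually fail even though everyone involved ran their study honestly.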
They’re measuring a noisy phenomenon, yes, but that’s only half the problem. The other half of the problem is that society demands answers. New psychology results are a matter of considerable public interest and you can become rich and famous from them. In the gap between the difficulty of supply and the massive demand grows a culture of fakery. The same is true of nutrition: everyone wants to know what the healthy thing to eat is, and the fact that our current methods are incapable of discerning this is no obstacle to people who claim to know.
For a counterexample, look at the field of planetary science. Scanty evidence dribbles in from occasional spacecraft missions and telescopic observations, but the field is intellectually sound because public attention doesn’t rest on the outcome.
Can you come up with a better way of doing Psychology research?
Yes. More emphasis on concrete useful results, less emphasis on trying to find simple correlations in complex situations.
For example, "Do power poses work?". They did studies like this one where they tell people to hold a pose for five minutes while preparing for a fake job interview, and then found that the pretend employers pretended to hire them more often in the "power pose" condition. Even assuming there's a real effect where those students from that university actually impress those judges more when they pose powerfully ahead of time... does that really imply that power posing will help other people get real jobs and keep them past the first day?
That's like studying "Are car brakes really necessary?" by setting up a short track and seeing if the people who run the "red light" progress towards their destination quicker. Contrast that with studying the cars and driving behaviors that win races, coming up with your theories, and testing them by trying to actually win races. You'll find out very quickly if your "brakes aren't needed" hypothesis is a scientific breakthrough or foolishly naive.
Instead of stu...
Curated. I think this model is pretty useful and well-compressed, and I'm glad to be able to concisely link to it.
Insofar as it is accurate, the policy implications are still very much open to debate, both here on LessWrong and in other ecosystems in the world.
People did in fact try to sound the alarm about poor statistical practices well before the replication crisis, and yet practices did not change,
This rings painfully true. As early as the late 1950s, at least one person was already raising a red flag about the risk that psychology[1] might veer into publishing a sea of false claims:
...There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published. Such research being unknown to other investigators m
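To make the quoted worry concrete, here is a minimal sketch (my own illustration, with assumed numbers): a field studying an effect that is actually zero, where only results with p < 0.05 make it into journals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n = 40          # per-group sample size (assumed)
n_labs = 5_000  # independent labs all studying a true-null effect

published_effects = []
for _ in range(n_labs):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(0.0, 1.0, n)   # the true effect is exactly zero
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:                         # file drawer: nonsignificant results never appear
        published_effects.append(treated.mean() - control.mean())

print("published studies:", len(published_effects), "out of", n_labs)
print("mean |published effect|:", round(float(np.mean(np.abs(published_effects))), 3))
```

Roughly 5% of the null studies get published anyway, and every one of them reports a nonzero "effect", so a reader who only sees the journals sees a pile of confident nonzero results about something that does not exist, which is exactly the failure mode that 1950s warning describes.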
Could you please elaborate on what you mean by "highly memetic" and "internal memetic selection pressures"? I'm probably not the right audience for this piece, but that particular word (memetic) is making it difficult for me to get to grips with the post as a whole. I'm confused if you mean there is a high degree of uncritical mimicry, or if you're making some analogy to 'genetic' (and what that analogy is...)
I think you're using "memetic" to mean "of high memetic fitness", and I wish you wouldn't. No one uses "genetic" in that way.
An idea that gets itself copied a lot (either because of "actually good" qualities like internal consistency, doing well at explaining observations, etc., or because of "bad" (or at least irrelevant) ones like memorability, grabbing the emotions, etc.) has high memetic fitness. Similarly, a genetically transmissible trait that tends to lead to its bearers having more surviving offspring with the same trait has high genetic fitness. On the other hand, calling a trait genetic means that it propagates through the genes rather than being taught, formed by the environment, etc., and one could similarly call an idea or practice memetic if it comes about by people learning it from one another rather than (e.g.) being instinctive or a thing that everyone in a particular environment invents out of necessity.
When you say, e.g., "lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc." I am pretty certain you mean "of high memetic fitness" rather than "people aware of it are aware of it because they learned of it from others r...
I think in principle it makes sense in the same sense “highly genetic” makes sense. If a trait is highly genetic, then there’s a strong chance for it to be passed on given a reproductive event. If a meme is highly memetic, then there’s a strong chance for it to be passed on given an act of information transmission.
In genetic evolution it makes sense to distinguish this from fitness, because in genetic evolution the dominant feedback signal is whether you found a mate, not the probability a given trait is passed to the next generation.
In memetic evolution, the dominant feedback signal is the probability a meme gets passed on given a conversation, because there is a strong correlation between the probability someone passes on the information you told them, and getting more people to listen to you. So a highly memetic meme is also incredibly likely to be highly memetically fit.
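A toy model of that last point (my own sketch with invented numbers, not something from the comment): treat spread as a branching process in which each current carrier of a meme mentions it in a few conversations, and each mention is retransmitted with some probability.

```python
import numpy as np

rng = np.random.default_rng(2)

def spread(p_transmit, conversations_per_carrier=3, generations=10, seed_carriers=10):
    """Toy branching process: the expected growth factor per generation
    is conversations_per_carrier * p_transmit."""
    carriers = seed_carriers
    for _ in range(generations):
        # each carrier has a few conversations; each one passes the meme on with prob. p_transmit
        carriers = rng.binomial(carriers * conversations_per_carrier, p_transmit)
        if carriers == 0:
            break
    return carriers

for p in (0.2, 0.4):
    print(f"p_transmit={p}: carriers after 10 generations ~ {spread(p)}")
```

With three conversations per carrier, the meme dies out when 3 * p_transmit < 1 and grows geometrically when 3 * p_transmit > 1, so under this toy model the per-conversation transmission probability ("how memetic" the idea is) and its long-run prevalence ("how memetically fit" it is) really do move together.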
Again using the replication crisis as an example, you may have noticed the very wide (like, 1 sd or more) average IQ gap between students in most fields which turned out to have terrible replication rates and most fields which turned out to have fine replication rates.
This is rather weak evidence for your claim ("memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers"), unless you additionally posit another mechanism like "fields with terrible replication rat...
community norms which require basically everyone to be familiar with statistics and economics
I disagree. At best, community norms require everyone to in principle be able to follow along with some statistical/economic argument.
That is a better fit with my experience of LW discussions. And I am not, in fact, familiar with statistics or economics to the extent I am with e.g. classical mechanics or pre-DL machine learning. (This is funny for many reasons, especially because statistical mechanics is one of my favourite subjects in physics.) But it remain...
I think the central claim is true in some fields and not particularly true in others. But one variation of the claim that interests me is, if we consider memeticity relative to public opinions/awareness, then I agree with this more strongly, where the opinions of median researchers are disproportionately more popular than more correct opinions. In some sense, I think the "most correct opinions" are basically never the most popular to the public, though perhaps I wouldn't invoke the median researcher problem to explain this edge case, but a more specific hy...
I happened to read a Quanta article about equivalence earlier, and one of the threads is the difficulty of a field applying a big new concept without the expository and distillation work of putting stuff into textbooks/lectures/etc.
That problem pattern-matches with the replication example, but well-motivated at the front end instead of badly-motivated at the back end. It still feels like exposition and distillation are key tasks that govern the memes-in-the-field passed among median researchers.
I strongly suspect the crux of the replication crisis example ...
On the other hand, that does not mean that those fields are going to recognize [a smart outsider] as a thought-leader or whatever.
Could this be a more general problem, that people in academia are very unlikely to recognize anything that originates outside of academia?
I mean, first you get your degree by listening to people in academia, and reading textbooks written by people in academia. Then you learn to solve problems, defined by people in academia, under the leadership of people in academia, mostly by reading a lot of papers written and reviewed by peopl...
In entrepreneurship, there is the phrase "ideas are worthless". This is because everyone already has lots of ideas they believe are promising. Hence, a pre-business idea is unlikely to be stolen.
Similarly, every LLM researcher already has a backlog of intriguing hypotheses paired with evidence. So an outside idea would have to seem more promising than the backlog. Likely this will require the proposer to prove something beyond evidence.
For example, Krizhevsky/Sutskever/Hinton had the idea of applying then-antiquated neural nets to recognize images. Only when they validated this in the ImageNet competition did their idea attract more researchers.
This is why ideas/hypotheses -- even with a bit of evidence -- are not considered very useful. What would be useful is to conclusively prove an idea true. This would attract lots of researchers ... but it turns out to be incredibly difficult to do, and in some cases requires sophisticated techniques. (The same applies to entrepreneurship. Few people will join/invest until you validate the idea and lower the risk.)
Sidenote: There are countless stories of academics putting forth original, non-mainstream ideas, only to be initially rejected by their peers (e.g. Cantor's infinity). I believe this not to be an issue just with outsiders, but merely that extraordinary claims require extraordinary proof. ML is an interesting example, because lots of so-called outsiders without a PhD now present at conferences!
I think one thing that's missing here is that you're making a first-order linear approximation of "research" as just continually improving in some direction. I would instead propose a quadratic model where there is some maximal mode of activity in the world, but this mode can face certain obstacles that people can remove. Research progress is what happens when there's an interface for removing obstacles that people are gradually developing familiarity with (for instance because it's a newly developed interface).
Different people have different speeds by whi...
Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
Complicated analysis (like going far beyond p-values) is easy for anyone to see and it is evidence of effort. Comple...
Is your concern simply with the 'median' researchers' unfamiliarity with basic statistics? Or with the other variables that typically accompany a researcher without basic statistics knowledge?
On a different note,
Due to your influence/status in the field, I think it would be awesome for you to put together a clear-cut resource detailing what you would like to see the 'median' researchers do that they are not (other than the obvious frustration regarding statistical incompetence stated above).
A small research community of unusually smart/competent/well-informed people can relatively-easily outperform a whole field, by having better internal memetic selection pressures.
It's not obvious to me that this is true, except insofar as a small research community can be so unusually smart/competent/etc that their median researcher is better than a whole field's median researcher so they get better selection pressure "for free". But if an idea's popularity in a wide field is determined mainly by its appeal to the median researcher, I would naturally...
How do you think competent people can solve this problem within their own fields of expertise?
For example, the EA community is a small & effective community like you've referenced for commonplace charity/altruism practices.
How could we solve the median researcher problem & improve the efficacy & reputation of altruism as a whole?
Personally, I suggest taking a marketing approach. If we endeavor to understand important similarities between "median researchers", so that we can talk to them in the language they want to hear, we may be a...
Well, I hope that the self-importance shown in this post is not a true reflection of the community; although unfortunately I think it might well be.
Spurious correlation here, big time, imho.
Give me the natural content of the field and I bet I can easily predict whether it may or may not have a replication crisis, without knowing the exact type of students it attracts.
I think it's mostly that the fields where bad science can be sexy and is less trivial/unambiguous to check, or those where you can make up and sell sexy results independently of their grounding, may, for whichever reason, also be the ones that attract the non-logical students.
I agree, though, about the mob overwhelming the smart outliers; I just think that how much that mob creates a replication crisis depends at least in large part on the intrinsic nature of the field rather than on the exact IQs.
We argue that memeticity—the survival and spread of ideas—is far more complex than the influence of average researchers or the appeal of articulate theories. Instead, the persistence of ideas in any field depends on a nuanced interplay of feedback mechanisms, boundary constraints, and the conditions within the field itself.
In fields like engineering, where feedback is often immediate and tied directly to empirical results, ideas face constant scrutiny through testing and validation. Failed practices here are swiftly corrected, creating a natural selection ...
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”.
Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
(Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor:
… mostly, though, the reason I believe the claim is from seeing how people in fact interact with research and decide to spread it.)
Two interesting implications of the median researcher problem:
In particular, LessWrong sure seems like such a community. We have a user base with probably-unusually-high intelligence, community norms which require basically everyone to be familiar with statistics and economics, we have fuzzier community norms explicitly intended to avoid various forms of predictable stupidity, and we definitely have our own internal meme population. It’s exactly the sort of community which can potentially outperform whole large fields, because of the median researcher problem. On the other hand, that does not mean that those fields are going to recognize LessWrong as a thought-leader or whatever.
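As a footnote to the quoted prototypical example, here is a minimal sketch (my own illustration, with assumed numbers) of how far "trash statistics, blatant p-hacking" can move the needle even when there is no effect at all: a researcher measures several interchangeable outcomes and reports whichever analysis gives the smallest p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def one_phacked_study(n=40, n_outcomes=8):
    """Null study with several measured outcomes; report only the best-looking p-value."""
    control = rng.normal(size=(n, n_outcomes))
    treated = rng.normal(size=(n, n_outcomes))   # the true effect is zero on every outcome
    pvals = [stats.ttest_ind(treated[:, k], control[:, k]).pvalue for k in range(n_outcomes)]
    return min(pvals)   # "analysis flexibility": keep whichever outcome looks best

best_ps = np.array([one_phacked_study() for _ in range(2_000)])
print("share of null studies that can report p < .05:", (best_ps < 0.05).mean())
```

With eight interchangeable outcomes the false-positive rate climbs from 5% to roughly 1 - 0.95^8 ≈ 34%, which is the kind of arithmetic a field's median researcher has to notice before the field stops rewarding it.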