Your experimental results might be indicative of something other than problems merely within LW...
I decided to test the hypothesis that LessWrongers practice weak scholarship in regards to jargon. In particular, that for many important terms the true source of knowledge has not been transmitted to community members. [bold added]
The problem here is that a better reference group than "LessWrongers" might be "scientists"?
Or perhaps the group of "scholars" (understood as all the scientists, plus all the people "not doing real science" per whatever weird definition someone has for calling something "science"), or perhaps even the still larger category of "humans"?
There is a generalized problem with scholarship-related cognition in the widespread failure of humans to remember the source of the contents of their minds. Photographs of events you weren't even alive for become vague visual memories. Hearsay becomes eyewitness report. Fishy stories from people you know you shouldn't trust become stories you don't remember the source of... and then become things you weakly believe... basically: in general, by default, human minds are terrible at retaining auditable fact profiles.
But suppose that we don't expect that much of generic humans, and only hold scientists to high intellectual standards?
Still a no go!
As per Stigler's Law Of Eponymy there are almost no laws which were actually named after their (carefully searched for) originators! The general pattern is similar to art: "Good scientists borrow, great scientists steal."
In practice, the thing that will be remembered by large groups of people is good popularization, especially when a well received version keeps things simple and vivid and doesn't even bother to mention the original source.
If LW can fix this, it will be doing something over and above what science itself has accomplished in terms of scholarly integrity. (Whether this will actually help with technological advances is perhaps a separate question?)
----
For an example here, I know about "ugh fields" because I invented that term and know the details of its early linguistic history.
1. The coining in this case preceded the existence of the overcomingbias blog by a few years... it was coined in conversations in the 2001-2003 era in and around College of Creative Study (CCS) seminars at UC Santa Barbara (UCSB) between me and friends, some of whom later propagated the term into this community.
My use of the term was aimed at describing the subjective experience of catastrophic procrastination along with some causal speculation. It seemed that mild anxiety over a looming deadline could cause mild diversion into a nominally anxiety-ameliorating behavior like video games... which made the deadline situation worse... and thereby turned into a positive feedback of "ugh". These ugh fields would feel like they have an external source whose apparent locus is "the deadline", with the amount of ugh increasing exponentially as the deadline gets closer and closer.
(I failed a class or two back then more or less because of this dynamic until I restructured my soul into a somewhat more platonically moderate pattern using Allan Bloom's translation of The Republic as my inspiration. Basically: consciously locally optimized hedonism has potentially unrecoverable failure modes and should be used with caution, if at all. Make lists! Perhaps amortize hedonism over times equal to or greater than your personal budgeting cycle? Or maybe better yet try to slowly junk hedonism in favor of duty and virtue? Anyway. This is a WIP for me still...)
2. Two of my friends from UCSB (Anna and Steve) were part of the conversations about me failing classes at UCSB and working out a causal model thereof, and in roughly 2008 brought the term to "Benton House" (which was the first "rationalist house" wherein lived participants in "the visiting fellows program" of the old version of MIRI which was then called "the Singularity Institute for Artificial Intelligence (SIAI)").
3. The term then propagated through the chalk board culture of SIAI (and possibly into diaspora rationalist houses?) and eventually the concept turned into a LW post. The new site link for this post doesn't work at the moment that I write this, but archive.org still remembers the 2010 article when I said of "ugh fields":
It is a head trip to see a pet term for a quirk of behavior reflected back at me on the internet as an official name for a phenomenon.
4. And the term keeps rolling around. It basically has a life of its own now, accreting hypothetical mechanisms and stories and interpretations as it goes.
It would not surprise me if some academic (2 or 10 or 50 years from now) turns it into a law and the law gets named after them, in fulfillment of Stigler's Law :-P
----
The core thing I'm trying to communicate is that humans in general can only think sporadically, and with great effort, and misremember almost everything, and especially misremember sources/credit/trust issues. The world has too many details, and neurons are too expensive. External media is required.
LessWrongers falling prey to attribution failures is to be expected by default, because LessWrong is full of humans. The surprising thing would be generally high performance in this domain.
My working understanding is that many of the original English-language Enlightenment folks were mindful of the problem and worked to deal with it by mostly distrusting words and instead constantly returning to detailed empirical observations (or written accounts thereof), over and over, at every event where it was hoped that true knowledge of the world might be "verbally" transmitted.
+A bunch for running an experiment to test a hypothesis and reporting back!
That having been said, it's not at all clear to me that this is a problem worth solving. For example, mathematical history is littered with examples where important concepts are named, not after the originator, but after the person who first really made good use of the concept and/or really popularized it (I particularly have in mind motte and bailey as I say this). Most mathematicians probably couldn't tell you the originator of most of the concepts they use. This does not seem to have done much damage at all. (Although there's a nearby thing that I think is quite bad, which is not having any story at all for how one might go about inventing / reinventing a concept. But there's a huge difference between the historical story behind how something was discovered and the best way to present it pedagogically / capture the principle that generates it; Eliezer makes this point in the quantum mechanics sequence, for example.)
More concretely, suppose I want to know more about aliefs, after hearing someone talk about them in whatever context. If I just google "alief," I'll get to the Wikipedia article on aliefs, which has citations to papers I can read. So it's completely irrelevant that I started out not knowing where the concept came from; the moment I google it I'll find out.
The problem with not knowing the true origins of your concepts is that you can't be sure whether the concept was distorted, intentionally or unintentionally by the person re-transmitting the concept to you.
For example, much of what is attributed to Clausewitz in the popular mind was actually written by Jomini, who interpreted and summarized Clausewitz's ideas. In fact, Clausewitz actually disagreed with much of Jomini's writing. One example is that when Clausewitz referred to political considerations in the lead up to war, he was primarily referring to internal political considerations -- things like how much political capital the leadership had and whether the nation would be able to sustain the will to fight.
Jomini, however, translated Clausewitz's dictums on politics as referring to diplomacy, that is, he took what Clausewitz was saying about internal considerations and rephrased it as being about relations between states. This was a huge distortion, one which lay undiscovered for nearly a hundred years, since Jomini was widely regarded as the authoritative translator and summarizer of Clausewitz.
History and philosophy are riddled with these kinds of distortions and misinterpretations. Just look at the varying interpretations of Nietzsche, for example, and contrast those with the source material. That's why it's important to know where an idea comes from. You want to be able to verify that the idea you're receiving is the idea you think you're receiving, rather than some related or distorted version of that idea.
Another important consideration is finding related ideas. If I can track an idea back to its original source, I can find other things that person has written, and see if they have other good ideas. I can't do that if the original source is obscured or not cited.
1. If you understand how to regenerate a concept in an inside view way, there's an important sense in which it really doesn't matter who originated it, because you can correct any distortions in the concept yourself. In the same way that if you hear someone state a theorem and reprove it yourself, you can discover that they slightly misstated it and find the correct statement yourself. So it seems to me that this distortionary effect is more important the more your reasoning is outside view-flavored.
2. This all seems fine if you have an active project of seeking out good ideas, but I don't expect everyone in the community to actively be wanting to do this as opposed to the many other things they could be wanting to do. Said another way, I don't think you're engaging with the opportunity cost of paying attention to this as opposed to something else.
- If you understand how to regenerate a concept in an inside view way, there’s an important sense in which it really doesn’t matter who originated it, because you can correct any distortions in the concept yourself. In the same way that if you hear someone state a theorem and reprove it yourself, you can discover that they slightly misstated it and find the correct statement yourself. So it seems to me that this distortionary effect is more important the more your reasoning is outside view-flavored.
This isn't actually how ideas work. For one thing, this presupposes that the version of an idea which has been passed down to you is 'correct'/its most useful version in the first place. For example, Alan Kay invented Object Oriented Programming several decades ago, and most modern computer languages implement 'Object Oriented Programming'. The version they implement is of course significantly degraded from the version that appears in, say, Smalltalk. But it's an improvement over what existed before in C, so nobody really notices that OOP could theoretically be something better. This is a stable situation that doesn't look like it'll be changing anytime soon.
https://www.youtube.com/watch?v=QjJaFG63Hlo
https://www.youtube.com/watch?v=YyIQKBzIuBY
The object level argument against this of course is that Alan Kay is wrong about the utility of his original version of OOP. On such things I have no comment.
The object level argument against this of course is that Alan Kay is wrong about the utility of his original version of OOP. On such things I have no comment.
The counter-argument, of course, is that sure, maybe Alan Kay is wrong about how useful his original version of OOP is, but is everyone who ever proposed an older-but-more-advanced version of an idea always wrong about the usefulness of that idea? Do implementations never degrade from prior visions for anything but the most unimpeachable reasons of effectiveness (as opposed to, say, market pressures, or contingent historical matters, etc.)?
I think it would be absurd to take such a position. The counterexamples are legion. To maintain any such view is to ascribe, to the market (both the actual market where products are offered and sold, and the “marketplace of ideas”), a sort of definitional correctness which it manifestly does not have.
Which is all to say that I agree with this:
For one thing, this presupposes that the version of an idea which has been passed down to you is ‘correct’/its most useful version in the first place.
And I concur that this presupposition is often mistaken.
While concurring entirely, let me add another, related motivation for wanting to know the origin of your concepts (one which applies quite strongly in, for instance, both philosophy and psychology):
Many concepts, conceptual frameworks, positions, etc., are born out of disputes. Concepts do not arise in a vacuum, and writers do not write in a vacuum; they are often responding to things other people—their contemporaries—have said, or are saying.
But if you don’t know what someone was responding to, you have no hope of grasping their motivations for saying what they were saying; and consequently you’ll fail to understand what they meant.
Often this takes the following form: there is some dispute, and one side takes position A, and the other side, position B. Much later, you—reading the latter side’s works out of context—encounter position B. It seems to you to be rather absurd, and obviously wrong, so you dismiss it. But what you’re missing is that “B” should really be read as “not A”—which is to say, that the thrust of the argument is “A is wrong; the truth is really more like B”. If you knew the context, you’d agree that A is absurd; and that B is a correction in the right direction. An overcorrection? Perhaps; but that is a secondary point.
Thus you dismiss the author of B as deluded, when in fact he may have been the one sane man in a dispute filled with madmen!
(Finding examples of this dynamic is left as an exercise for the reader…)
Alief is sort of an easy one. I found, for example, instrumental versus terminal to be a hard one to track down the source for. I think it would be a mistake to underestimate the difficulty of tracking down a hard citation, especially if your prior is that there's nothing to track down past a certain point. For example, let's say you believe that Scott Alexander is the originator of Motte/Bailey. Thankfully, if you go back to his 'original source' for information you'll see that he's clearly referencing someone else's idea and be led back. But if you believed that and he hadn't cited his source, well, these are the top search results for Motte-Bailey:
I could easily see our intrepid researcher going "oh, ratwiki is just a tertiary source no need to look at that, Scott's post is at the top so it's probably the canonical reference...junk underneath yeah I think we're good". Confirmation bias causes us to assume our best known etymology is the etymology.
For example, mathematical history is littered with examples where important concepts are named, not after the originator, but after the person who first really made good use of the concept and/or really popularized it (I particularly have in mind motte and bailey as I say this). Most mathematicians probably couldn’t tell you the originator of most of the concepts they use. This does not seem to have done much damage at all.
This isn't an issue with using the concepts, this is an issue with being able to tap into the original research or sources of received knowledge. Previous performance predicts future performance, and places where one good idea originated probably have others you could use. It's possible for someone to popularize one of a dozen really great ideas, but never get around to the other eleven. If we want to make full use of intellectual work it's a good idea not to set ourselves up for that to silently happen all the time.
Confirmation bias causes us to assume our best known etymology is the etymology.
But what is the actual bad thing that happens if this happens?
Previous performance predicts future performance, and places where one good idea originated probably have others you could use.
This is not as true as it sounds if you're selecting on previous performance as opposed to observing it; said another way, you aren't accounting for regression to the mean.
In a model where people have some hidden "propensity to have good ideas" stat and your probability of generating a good idea at any time is some function of this stat + noise, for many plausible distributions of this stat and the noise, most good ideas will have been had by a person who has one or two good ideas, simply because there are many more people who are okay at having good ideas + get lucky than there are people who are extremely good at having good ideas.
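To make the model above concrete, here is a minimal sketch under stated assumptions. It is not taken from the comment: the two-group population, the per-attempt probabilities, and the number of attempts are all hypothetical numbers chosen for illustration. The hidden "propensity" stat is a per-attempt chance of a good idea, the noise is the binomial draw, and the function computes the expected share of all good ideas that come from people who end up with at most two of them.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def share_from_low_count_people(groups, attempts, max_count=2):
    """Expected fraction of all good ideas that come from people who
    have at most `max_count` good ideas in total.

    groups: list of (number_of_people, chance_of_good_idea_per_attempt).
    """
    total = 0.0
    low = 0.0
    for people, p in groups:
        for k in range(attempts + 1):
            # Expected ideas contributed by people in this group
            # who end up with exactly k good ideas.
            expected_ideas = people * k * binom_pmf(k, attempts, p)
            total += expected_ideas
            if k <= max_count:
                low += expected_ideas
    return low / total


# Hypothetical population: many "okay" people, a few "extremely good" ones.
toy_groups = [(10_000, 0.01), (10, 0.5)]
share = share_from_low_count_people(toy_groups, attempts=10)
print(f"{share:.2f}")
```

With these made-up numbers, roughly 95% of good ideas come from people with one or two good ideas, even though each "extremely good" person is fifty times more productive per attempt. Different assumed distributions can give different answers, which is why the claim is about "many plausible distributions" rather than all of them.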
In any case, searching for people with good ideas is only one of many things that people might want to do and I don't see the hurry in wanting everyone to have gone through the first step or two of this process if they have other things to do.
If you do future surveys of this sort, I'd like you to ask people for their probabilities rather than just their best guesses. If people are uncertain but decently calibrated, I'd argue there's not much of a problem; if people are confidently wrong, I'd argue there's a real problem.
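One way such probability answers could be scored is the Brier score, the mean squared gap between a stated probability and the outcome. This is a sketch of a standard scoring rule, not something the survey actually used; the respondents and numbers below are hypothetical. Note that the Brier score mixes calibration with other properties of the forecasts, so it is only a rough proxy for "confidently wrong".

```python
def brier_score(probabilities, outcomes):
    """Mean squared gap between stated probability and what happened.

    probabilities: each respondent's stated chance that their answer is right.
    outcomes: 1 if that answer was right, 0 if it was wrong.
    """
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Hypothetical respondents, both wrong about a term's origin.
unsure_but_wrong = brier_score([0.5], [0])
confident_but_wrong = brier_score([0.9], [0])
print(unsure_but_wrong < confident_but_wrong)
```

Lower is better, so the respondent who was wrong at 90% confidence is penalized more heavily than the one who was wrong at 50%.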
Introduction
Jargon is underrated in its importance to the framework of science. Luis Reyes-Galindo points out that jargon quite often ends up literally determining the boundaries of a field[0]. Given the disparity between how much attention is paid to jargon as a subject and its scholarly import, I decided to test the hypothesis that LessWrongers practice weak scholarship in regards to jargon. In particular, that for many important terms the true source of knowledge has not been transmitted to community members. Rather than a pedantic issue, this would imply deep issues with the way that LWers handle knowledge. Without a connection back to original sources, literature review on the part of community members could be severely suppressed.
Hypothesis
I started with a weak hypothesis and a strong hypothesis.
Weak Hypothesis: There will be at least one term in this list whose origin respondents overwhelmingly misidentify. Specifically, at least 80% of respondents will choose the wrong response on at least one term or phrase.
Strong Hypothesis: There will be at least one term in this list whose origin respondents overwhelmingly misidentify. In addition, for at least one third (4) of the words or phrases, 50% or more of respondents will incorrectly identify the origin.
These hypotheses were chosen in advance largely based on my gut intuition about the severity of the problem, and what would constitute 'sufficient evidence' to me that a problem existed.
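The two thresholds can be written as a small check. This is a sketch of the hypotheses as stated, not the analysis script used for the survey; the function name and the placeholder shares are hypothetical and are not the survey's results.

```python
def check_hypotheses(wrong_share_by_term):
    """Return (weak, strong) given a dict of term -> fraction of wrong answers."""
    shares = list(wrong_share_by_term.values())
    # Weak: at least one term with 80% or more wrong responses.
    overwhelming = any(s >= 0.80 for s in shares)
    # Strong: the weak condition, plus at least 4 terms with 50% or more wrong.
    majority_wrong = sum(1 for s in shares if s >= 0.50)
    weak = overwhelming
    strong = overwhelming and majority_wrong >= 4
    return weak, strong


# Placeholder numbers for illustration only.
placeholder = {
    "term_a": 0.85,
    "term_b": 0.60,
    "term_c": 0.55,
    "term_d": 0.50,
    "term_e": 0.10,
}
print(check_hypotheses(placeholder))
```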
Methodology
The methodology for this survey was preregistered here.
A list of search terms I used on Google Scholar for lit review before writing this article can be found here.
A survey was administered to 53 LWers on various chatrooms as well as my personal friends list. This survey contained twelve terms or phrases I felt were especially ambiguous as to their origin (i.e., they were lacking obvious 'tells' of LW or academic origin).
Terms Used
Alief
Inside View/Outside View
Epistemic Learned Helplessness
Chinese Robber Fallacy
Anti-Inductive
Motte and Bailey
Map and Territory
Observer Effect
Terminal vs Instrumental Values/Goals
Ugh-Field
Illusion of Transparency
Optimiser's Curse
At least one term was chosen to sound especially academic and one term chosen to sound especially 'LessWrong Diaspora' to provide a baseline. These terms are HtuSvryq and BofreireRssrpg respectively (rot13). All terms or phrases were taken from the Jargon Dictionary hosted by myself. When I chose these terms I myself did not know the origin of several, which was entirely okay because that could be ascertained at analysis time.
Chatrooms surveyed
The chatrooms I pulled participants from are:
For each term users were asked whether it originated from LessWrong, academia, or neither. (While it is theoretically possible for a term to originate from both at the same time, I'm not aware of this actually happening so I did not consider the possibility in my survey.) To determine the results I used a simple plurality of responses against a 'correct' answer assigned for each term. Whichever answer received the most responses is the one users are determined to have 'chosen' in aggregate.
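The plurality rule described above could be sketched as follows. This is an illustration, not the script used for the survey: the function name, the answer labels, and the sample responses are hypothetical.

```python
from collections import Counter


def score_term(responses, correct_answer):
    """Return (plurality_answer, plurality_is_correct, wrong_share) for one term."""
    counts = Counter(responses)
    # The answer with the most responses is what respondents 'chose' in aggregate.
    plurality_answer, _ = counts.most_common(1)[0]
    wrong_share = 1 - counts[correct_answer] / len(responses)
    return plurality_answer, plurality_answer == correct_answer, wrong_share


# Hypothetical responses for a single term.
sample = ["LessWrong", "LessWrong", "LessWrong", "academia"]
print(score_term(sample, correct_answer="academia"))
```

Ties are not addressed in the methodology. In this sketch `Counter.most_common` breaks a tie in favor of whichever answer was seen first, which is an arbitrary choice.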
Results
In the following table, green means an answer is what I marked 'correct'. Red means that the share of wrong answers exceeded the 50% threshold in my strong hypothesis.
Results Table
(Citations on answers can be found at this page)
Analysis
Both my weak hypothesis and strong hypothesis were validated. In the case of map and territory, I will ignore the results because of ambiguity. It is quite possible to classify Alfred Korzybski in either the academic or non-academic camps. However, on the Chinese Robber Fallacy 77% of respondents misidentified the origin. While this is not quite 80%, it is close enough for me to consider the weak hypothesis essentially validated. My strong hypothesis was also validated, given that 4 terms (excluding Map/Territory) were significantly over the 50% threshold to count as decided wrongly.
Conclusion
My original purpose for this research was to see if there would be any value in adding an etymology section to the jargon dictionary. I think that the outcome of this survey implies the answer is yes. One potential goal of the jargon dictionary is to act as a Rosetta stone between LessWrong Diaspora jargon and academic terminology. The purpose of this would be to make literature review easier for people trying to 'dig deep' on rationality concepts for their research.
Beyond that, these results imply a potentially significant alienation of LessWrongers from the originators of concepts. As Samo Burja points out, the underlying principles that generated an idea are of incredible importance[1]. In failing to transmit the sources of knowledge it's quite possible we're retarding progress by making it non-obvious where to go for more. Worse still, the problem is not necessarily easy to fix. Motte & Bailey, for example, which Scott properly cites in his post on the subject[2], got a 64% incorrect response rate. Here I feel it is only appropriate to draw attention to the problem, but I welcome potential solutions in the comments.
References
[0]: Reyes-Galindo, Luis. (2016). Automating the Horae: Boundary-work in the age of computers. arXiv:1603.03824 [physics.soc-ph]
[1]: Burja, Samo. (2018, March 8). On the Loss and Preservation of Knowledge. Retrieved from https://www.lesserwrong.com/posts/nnNdz7XQrd5bWTgoP/on-the-loss-and-preservation-of-knowledge
[2]: Alexander, Scott. (2014, November 3). All in all, another brick in the motte. Retrieved from http://slatestarcodex.com/2014/11/03/all-in-all-another-brick-in-the-motte/