Excuse me for asking an off-topic question, but a forecasting tournament? That sounds like the incredibly nerdy kind of thing I'd enjoy.
Curated.
In my mind, Elizabeth's work on Epistemic Spot Checks (ESCs) generally, and this post in particular, are exactly the kind of effort to innovate in truth-seeking that is core to LessWrong.
It's been written that the proper use of the word "rational" is not simply to mean "true" or "optimal", but to describe algorithms that systematically increase map-territory correspondence. I love the work on ESCs, and this post asking what comes next, because they concern precisely this level of question: what algorithms systematically get us to truth? What processes can we follow?
I'm excited for more work in this direction, and for the LW community developing expertise not just in talking about things, but really, actually getting at what is the case in the territory.
Would love to hear people weigh in more with their thoughts.
On first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.
Re this particular point, I guess one thing you might be able to do is to check arguments, as opposed to statements of fact. Sometimes, one can evaluate whether arguments are valid even when one isn't too knowledgeable about the particular topic. I previously did some work on argument-checking of political debates. (Though the rationale for that wasn't that argument-checking can require less knowledge than fact-checking, but rather that fact-checking of political debates already exists, whereas argument-checking does not.)
I never did any systematic epistemic spot checks, but if a book contains a lot of arguments that appear fallacious or sketchy, I usually stop reading it. I guess that's related.
I guess one thing you might be able to do is to check arguments, as opposed to statements of fact
First, let me say I think that would be interesting to experiment with. But the reasons to be dubious are more interesting, so I'm going to spend more time on those.
This can definitely rule people out. I don't think it can totally rule people in, because there's always a risk that someone made a valid argument from faulty assumptions. In fact this is a large, sticky genre that I'm very worried about.
But assuming that were solved, there's something I find harder to express that might be at the core of why I'm doing this... I don't want to collect a bunch of other people's arguments I can apply as tools, and be confused if two of them conflict. I want a gears-level model of the world such that, if I were left with amnesia on an intellectual deserted island, I could re-derive my beliefs. Argument-checking as I conceive of it now does more of the former. I can't explain why, or exactly what I'm picturing when I say argument-checking, or what kind of amnesia I mean, but there's something there. My primary interest with argument-checking would be to find a way of engaging with arguments that develops that amnesia-proof knowledge.
I agree that the problem of valid arguments built on bad assumptions is a sticky one. I also agree with the gears-level world-model objective.
My view of argument-checking is that if we eschew it, we have no way to detect how much noise poor arguments are generating. It seems to me the clearest way of handling this is to treat the arguments as a separate information channel; otherwise it will be difficult to identify the presence or absence of value with any confidence.
This is a good point. I think the epistemic ability to predict and evaluate arguments independently of the truth of the conclusion is something we want to heavily select for and reward; see, e.g., Eliezer's writing on that here.
If Elizabeth is interested, I'm definitely interested in funding and experimenting with prediction markets on argument validity for the next round of amplifying epistemic spot checks.
I don't have much to add to this discussion, but I want to note that I'm extremely interested in any further insights you have about this, because this problem has always bothered me.
I expect you've already thought of this, but you might get some epistemic mileage out of looking at what primary documents a fact traces back to and reasoning about what those documents can/cannot possibly prove about the past. For example, if the claim about ship sizes had traced back to a set of documents about ships only in Europe, you would know to be suspicious.
This post... isn't exactly the most exciting one, because it's mostly leaving a question rather than an answer, and the question was still fairly confused at the time. But I do think it's a pretty important question. I have hope that in another couple of years this research agenda will have borne fruit, and that this intermediate stage of it will have been a useful stepping stone.
When I read a non-fiction book, I want to know if it’s correct before I commit anything it says to memory. But if I already knew the truth status of all of its claims, I wouldn’t need to read it. Epistemic Spot Checks are an attempt to square that circle by sampling a book’s claims and determining their truth value, with the assumption that the sample is representative of the whole.
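As a minimal sketch of the statistical assumption at work here (my own illustration, not the actual ESC procedure), this is how one might turn a handful of spot-checked claims into a rough estimate of a book's overall claim accuracy, treating the checked claims as a representative sample; the function name and the uniform prior are illustrative assumptions:

# Hypothetical sketch: infer a book's overall claim accuracy from a spot check,
# assuming the checked claims are a representative (random) sample of the book.
def claim_accuracy_posterior(n_checked, n_true, prior_a=1.0, prior_b=1.0):
    """Beta(prior_a, prior_b) prior updated on n_true verified out of n_checked claims."""
    return prior_a + n_true, prior_b + (n_checked - n_true)

# Example: 10 claims spot-checked, 9 verified true.
a, b = claim_accuracy_posterior(n_checked=10, n_true=9)
print(f"posterior mean accuracy: {a / (a + b):.2f}")  # ~0.83

The only point of the sketch is that everything downstream leans on the "representative sample" assumption; if the claims that get checked are systematically easier or harder than the book's average, the estimate is biased no matter how carefully each one is verified.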
Some claims are easier to check than others. On one end are simple facts, e.g., “Emperor Valerian spent his final years as a Persian prisoner”. This was easy and quick to verify: googling “emperor valerian” was quite sufficient. “Roman ship sizes weren’t exceeded until the 15th century” looked similar, but it wasn’t. If you google the claim itself, it will look confirmed (evidence: me and 3 other people in the forecasting tournament did this). At the last second while researching this, I decided to check the size of Chinese ships, which surpassed Roman ships about a century before Europe did.
On first blush this looks like a success story, but it’s not. I was only able to catch the mistake because I had a bunch of background knowledge about the state of the world. If I didn’t already know mid-millennium China was better than Europe at almost everything (and I remember a time when I didn’t), I could easily have drawn the wrong conclusion about that claim. And following a procedure that would catch issues like this every time would take much more time than ESCs currently get.
Then there are terminally vague questions, like “Did early modern Europe have more emphasis on rationality and less superstition than other parts of the world?” (as claimed by The Unbound Prometheus). It would be optimistic to say that question requires several books to answer, but even if that were true, each of those books would need at least an ESC itself to see if it’s trustworthy, which might involve checking other claims requiring several books to verify… pretty soon it’s a master’s thesis.
But I can’t get a master’s degree in everything interesting or relevant to my life. And that brings up another point: credentialism. A lot of ESC revolves around “Have people who have officially been Deemed Credible sanctioned this fact?” rather than “Have I seen evidence that I, personally, judge to be compelling?”
The Fate of Rome (Kyle Harper) and The Fall of Rome (Bryan Ward-Perkins) are both about the collapse of the Western Roman Empire. They both did almost flawlessly on their respective epistemic spot checks. And yet they attribute the fall of Rome to very different causes, and each devotes almost no time to the other’s explanation. If you drew a Venn diagram of the data they discuss, the circles would be almost but not quite entirely distinct. The word “plague” appears 651 times in Fate and 6 times in Fall, which introduces the topic mostly to dismiss the idea that it was causally related to the fall (which is how Fate treats all those border adjustments happening from 300 AD on). Fate is very into discussing climate, but Fall uses that space to talk about pottery.
This is why I called the process epistemic spot checking, not truth-checking. Determining whether a book is true requires not only determining whether each individual claim is true, but also what other explanations exist and what has been left out. Depending on the specifics, ESCs as I do them now are perhaps half the work of reading the subsection of the book I verify. Really thoroughly checking a single paragraph in a published paper took me 10-20 hours. And even if I succeed at the ESC, all I have is a thumbs up/thumbs down on the book.
Around the same time I was doing ESCs on The Fall of Rome and The Fate of Rome (the ESCs were published far apart to get maximum publicity for the Amplification experiment, but I read and performed the ESCs very close together), I was commissioned to do a shallow review on the question of “How to get consistency in predictions or evaluations of questions?” I got excellent feedback from the person who commissioned it, but I felt like it said a lot more about the state of a field of literature than the state of the world, because I had to take authors’ words for their findings. It had all the problems ESCs were designed to prevent.
I’m in the early stages of trying to form something better: something that combines the depth of epistemic spot checks with the breadth of field reviews. It’s designed to bootstrap from knowing nothing in a field to being grounded and informed, with a firm base on which to evaluate new information. This is hard and I expect it to take a while.