Way back in 2020 there was an article, "A Proposed Origin for SARS-CoV-2 and the COVID-19 Pandemic", which I read after George Church tweeted it (!) (without comment or explanation). Their proposal (they call it "Mojiang Miner Passage" theory) in brief was that it WAS a lab leak but NOT gain-of-function. Rather, in April 2012, six workers in a "Mojiang mine fell ill from a mystery illness while removing bat faeces. Three of the six subsequently died." Their symptoms were a perfect match to COVID, and two were very sick for more than four months.
The proposal is that the virus spent those four months adapting to life in human lungs, including (presumably) evolving the furin cleavage site. And then (this is also well-documented) samples from these miners were sent to WIV. The proposed theory is that those samples sat in a freezer at WIV for a few years while WIV was constructing some new lab facilities, and then in 2019 researchers pulled out those samples for study and infected themselves.
I like that theory! I’ve liked it ever since 2020! It seems to explain many of the contradictions brought up by both sides of this debate—it’s compatible with Saar’s claim that the furin cleavage site is very different from what’s in nature and seems specifically adapted to humans, but it’s also compatible with Peter’s claim that the furin cleavage site looks weird and evolved. It’s compatible with Saar’s claim that WIV is suspiciously close to the source of the outbreak, but it’s also compatible with Peter’s claim that WIV might not have been set up to do serious GoF experiments. It’s compatible with the data comparing COVID to other previously-known viruses (supposedly). Etc.
Old as this theory is, the authors are still pushing it and they claim that it’s consistent with all the evidence that’s come out since then (see author’s blog). But I’m sure not remotely an expert, and would be interested if anyone has opinions about this. I’m still confused why it’s never been much discussed.
I agree, I think the most likely version of the lab leak scenario does not involve an engineered virus. Personally I would say 60% chance zoonotic, 40% chance lab leak.
Given that they had engineered viruses in the lab at biosafety level II, why do you think the most likely version of the lab leak scenario does not involve an engineered virus?
I’m interested in Metacelsus’s answer.
My take is: I really haven’t been following the lab leak stuff. The point of my comment was to bring this hypothesis to the attention of people who have, and hopefully get some takes from them. As I understand it:
I think that’s more than enough to at least raise the Mojiang Miner Passage theory to consideration. Figuring out whether the theory is actually true or not would require a lot more beyond that, e.g. arguments about the exact genetic code of the furin cleavage site and all this other stuff which is way outside my area of expertise. :)
The frustrating thing about the discussion about the origins is that people seldom show recognition of the priorities here, and all get lost in the weeds.
You can get n layers deep into the details, and if the bottom is at n+1 you're fucked. To give an example I see people talking about with this debate, "The lab was working on doing gain of function to coronaviruses just like this!" sounds pretty damning but "actually the grant was denied, do you think they'd be working on it in secret after they were denied funding?" completely reverses it. Then after the debate, "Actually, labs frequently write grant proposals for work they've already done, and frequently are years behind in publishing" reverses it again. Even if there's an odd number of remaining counters, the debate doesn't demonstrate it. If you're not really really careful about this stuff, it's very easy to get lost and not realize where you've overextended on shaky ground.
Scott talks about how Saar is much more careful about these "out of model" possibilities and feels ripped off because his opponent wasn't, but at least judging from Scott's summary it doesn't appear he really hammered on what the issue is here and how to address it.
Elsewhere in the comments here Saar is criticized for failing to fact check the dead cat thing, and I think that's a good example of the issue here. It's not that any individual thing is too difficult to fact check, it's that when all the evidence is pointing in one direction (so far as you can tell) then you don't really have a reason to fact check every little thing that makes total sense so of course you're likely to not do it. If someone argues that clay bricks weigh less than an ounce, you're going to weigh the first brick you see to prove them wrong, and you're not going to break it open to confirm that it's not secretly filled with something other than clay. And if it turns out it is, that doesn't actually matter because your belief didn't hinge on this particular brick being clay in the first place.
If it turns out that a lot of your predictions turn out to be based on false presuppositions, this might be an issue. If it turns out the trend you based your perspective on just isn't there, then yeah that's a problem. But if that's not actually the evidence that formed your beliefs, and they're just tentative predictions that aren't required by your belief under question, then it means much less. Doubly so if we're at "there exists a seemingly compelling counterargument" and not "we've gotten to the bottom of this, and there are no more seemingly compelling counter-counterarguments".
So Saar didn't check if the grant was actually approved. And Peter didn't check if labs sometimes do the work before writing grant proposals. Or they did, and it didn't come through in the debate. And Saar missed the cat thing. Peter did better on this game of "whack-a-mole" of arguments than Saar did, and more than I expected, but what is it worth? Truth certainly makes this easier, but so does preparation and debate skill, so I'm not really sure how much to update here.
What I want to see more than "who can paint an excessively detailed story that doesn't really matter and have it stand up to surface level scrutiny better", is people focusing on the actual cruxes underlying their views. Forget the myriad of implications n steps down the road which we don't have the ability to fully map out and verify, what are the first few things we can actually know, and what can we learn from this by itself? If we're talking about a controversial "relationship guru", postpone discussions of whether clips were "taken out of context" and what context might be necessary until we settle whether this person is on their first marriage or fifth. If we're wondering if a suspect is guilty of murder, don't even bother looking into the credibility of the witness until you've settled the question of does the DNA match.
If there appears to be a novel coronavirus outbreak right outside a lab studying novel coronaviruses, is that actually the case? Do we even need to look at anything else, and can looking at anything else even change the answer?
To exaggerate the point to highlight the issue, if there were unambiguously a million wet markets that are all equivalent, and one lab, and the outbreak were to happen right between the lab and the nearest wet market, you're done. It doesn't matter how much you think the virus "doesn't look engineered" because you can't get to a million to one that way. Even if you somehow manage to make what you think is a 1000:1 case, a) even if your analysis is sound it still came from the lab, b) either your analysis there or the million to one starting premise is flawed. And if we're looking for a flaw in our analyses, it's going to be a lot easier to find flaws in something relatively concrete like "there are a million wet markets just like this one" than whatever is going into arguing that it "looks natural".
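The arithmetic behind that exaggerated example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the location evidence is taken to be worth exactly a million to one for the lab, the "looks natural" case exactly 1000:1 against it, the two are treated as independent, and the prior is even. None of these numbers are estimates of the real situation.

```python
from fractions import Fraction

def posterior_odds(prior_odds, *likelihood_ratios):
    """Multiply prior odds (lab : market) by each likelihood ratio in turn."""
    odds = Fraction(prior_odds)
    for ratio in likelihood_ratios:
        odds *= Fraction(ratio)
    return odds

# Assumption: the outbreak location favors the lab a million to one.
location_ratio = Fraction(1_000_000, 1)
# Assumption: "doesn't look engineered" favors the market 1000:1,
# which is 1/1000 when written as lab : market.
looks_natural_ratio = Fraction(1, 1000)

# Assumption: even prior odds before either piece of evidence.
odds = posterior_odds(1, location_ratio, looks_natural_ratio)
prob_lab = odds / (1 + odds)

print(odds)             # 1000, i.e. still 1000:1 for the lab
print(float(prob_lab))  # a little under 1
```

Under these assumptions the 1000:1 argument only shaves three zeros off the million, which is point (a) above. If the conclusion feels wrong, one of the two input ratios has to be wrong, which is point (b).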
So I really wish they'd sit down and hammer out the most significant and easiest to verify bits first. How many equally risky wet markets are there? How many labs? What is the quantitative strength of the 30,000 foot view "It looks like an outbreak of chocolatey goodness in Hershey Pennsylvania"? What does it actually take to have arguments that contain leaks to this degree, and can we realistically demonstrate that here?
I think Michael Weissman's v5.7 research/analysis might be exactly what you are looking for. I've been searching for a long time for analysis that makes a compelling case in either direction, especially for the absolutely most important core components of the debate. In a sea of high-effort research and analysis, Michael's post is the first one that has convinced me. He dives into very similar points to what you're searching for.
Even if you don't read it in full (it's long), I still see value in searching for specific elements to see his analysis on those points, such as his discussion about the wet market. For example, if you search for "animals/year" and "HSM" (Huanan Seafood Market), you'll see he goes into the animal trade numbers specifically at the HSM when compared to numbers for other wet markets in China. There are many other topics he analyzes that you might find similarly interesting.
Like you, I am wary of getting distracted too much with lines of evidence that may ultimately carry little weight. I appreciate that Gwern likely was motivated by the cat evidence to demonstrate to everyone how Peter may misrepresent evidence/arguments; I also think this evidence is so insignificant to the overall debate that it's not important enough to get bogged down in.
This is an oversimplification, but for brevity, I think the case really rests on two components: the wet market as the origin, and the DEFUSE proposal. The wet market is so foundational to a Zoonosis argument that if it were disproved, it really seems like the closest thing we've got right now to a "does the DNA match?" question.
Here's a brief list of some recent information (some as recent as March 2024) that updated me towards lab leak and added crucial evidence for what we actually "know". This is for the sake of explaining my thoughts to others, but is in no way all-encompassing. Michael does a far superior job of explaining these in great depth.
The DEFUSE proposal is especially difficult because it's uncertain and very much in the realm of "how much can we really know", but it seems so incredibly relevant and high-weight to the debate that I really think it still should be considered at the core and should be hammered out as much as possible. When looking at how SARS-CoV-2 ended up, they are unbelievably spot-on with describing specifically what they were working on, how precisely they would do it, the restriction enzymes they would use, the Furin cleavage site, the locations they would do it, the unsafe biosecurity levels the research would be done at, their motivations for the research, and much more. My understanding is that there were only 3 institutions in the world that were doing this exact research, and two of them (WIV and UNC) were involved with this proposal. The proposal describes a research plan that uncannily resembles the precise sequence of events and conditions one would anticipate if a pandemic were to emerge from a laboratory incident at or near the WIV. It really is almost as close a match as you could possibly expect.
I hope this helps. I'm curious what you and others think.
My current initial impression is that this debate format was not fit for purpose: https://www.astralcodexten.com/p/practically-a-book-review-rootclaim/comment/52659890
A debate sequel, with someone other than Peter Miller (but retaining and reevaluating all the evidence he got from various sources) would be nice. I can easily imagine Miller doing better work on other research topics that don't involve any possibility of cover ups or adversarial epistemics related to falsifiability, which seem to be personal issues for him in the case of lab leak at least.
Maybe with 200k on the line to incentivize Saar to return, or to set up a team this time around? With the next round of challengers bearing in mind that Saar might be willing to stomach a net loss of many thousands of dollars in order to promote his show and methodology?
If $100k was not enough to incentivize Saar & his team to factcheck Peter's simplest claims like "Connor said his cat died of COVID-19", where it takes me literally 15 seconds* to find it in Google and verify that Connor said the exact opposite of that (where an elementary school child could have factchecked this as well as I did), I don't think $200k is going to help Saar either. And I don't know how one would expect the debate format to work for any genuinely hard question if it takes approaching a million dollars to get anyone to do sub-newspaper-level factchecking of Peter's claims. (If you can't even check quotes, like 'did this dude say in the Daily Mail what Peter said he said?' how on earth are you going to do well at all of these other things like mahjong parlors in wet markets that no longer exist or novel viral evolution or CCP censorship & propaganda operations or subtle software bugs in genomics software written by non-programmers...?) The problem is not the dollar amount.
* and I do mean "literally" literally. It should take anyone less than half a minute to check the cat claim, and if it takes more, you should analyze what's wrong with you or your setup. If you doubt me, look at my directions, which are the first query anyone should make - and if that's not an obvious query, read my search case-studies until it is - then get a stopwatch, open up google.com in a tab if you have neglected to set up a keyboard shortcut, and see how long it takes you to factcheck it as I describe.
Curated. (In particular recommending people click through and read the full Scott Alexander post)
I've been tracking the Rootclaim debate from the sidelines and finding it quite an interesting example of high-profile rationality.
I have a friend who's been following the debate quite closely and finding that each debater, while flawed, had interesting points that were worth careful thought. My impression is a few people I know shifted from basically assuming Covid was probably a lab-leak, to being much less certain.
In general, I quite like people explicitly making public bets, and following them up with in-depth debate.
[Mod note: I edited out some of the meta commentary from the beginning for this curation. In general, for link posts I have a relatively low bar for editing things unilaterally, though I of course would never want to misrepresent what an author said]
"I've been tracking the Rootclaim debate from the sidelines and finding it quite an interesting example of high-profile rationality."
Would you prefer the term "high-performance rationality" over "high-profile rationality"?
One thing that occurs to me is that each analysis, such as the Putin one, can be thought of as a function hypothesis.
It takes as inputs the variables:
Russian demographics
healthy lifestyle
family history
facial swelling
hair present
And is outputting the probability 86%, where the function is
P = F(demographics, lifestyle, history, swelling, hair) and then each term is being looked up in some source, which has a data quality, and the actual equation seems to be a mix of Bayes and simple probability calculations.
There are other variables not considered, and other valid reasoning tracks. You could take into account the presence of oncologists on Putin's personal staff, intercepted communications possibly discussing it, etc. I'm not here to discuss the true odds of Putin developing cancer, but note that if the above is "function A", and another function that takes into account different information is "function B", you should be aggregating all valid functions, forming a "probability forest".
Perhaps you weight each one by the likelihood of the underlying evidence being true. For example, each of the above facts is effectively 100% true except for the hair present (Putin could have received a hair transplant) and family history (some relatives' causes of death could be unknown, or suspected to have been cancer).
This implies a function "A'n", where we assume and weight in the probability that each combination of the underlying variables has the opposite value. For example, if pHair_Present = 0.9, A' has one permutation where the hair is not present due to a transplant.
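A minimal sketch of that idea in Python, to make the bookkeeping concrete. Everything here is hypothetical: function_a and function_b are toy stand-ins with placeholder numbers (not Rootclaim's actual model), the inputs are assumed independent, and the only figure taken from the text is pHair_Present = 0.9.

```python
from itertools import product

def marginalize(f, evidence_probs):
    """Average f over every true/false combination of the evidence,
    weighting each combination by its probability (inputs assumed independent)."""
    names = list(evidence_probs)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, value in zip(names, values):
            p = evidence_probs[name]
            weight *= p if value else 1.0 - p
        total += weight * f(dict(zip(names, values)))
    return total

def function_a(evidence):
    # Hypothetical toy model: placeholder numbers with no empirical meaning.
    p = 0.5
    if evidence["hair_present"]:
        p += 0.2
    if evidence["clean_family_history"]:
        p += 0.1
    return p

def function_b(evidence):
    # A second hypothetical reasoning track using different information.
    return 0.6 if evidence["oncologists_on_staff"] else 0.8

def aggregate(weighted_estimates):
    """Weighted average of (estimate, weight) pairs: the 'forest'."""
    total_weight = sum(w for _, w in weighted_estimates)
    return sum(e * w for e, w in weighted_estimates) / total_weight

estimate_a = marginalize(function_a, {"hair_present": 0.9,  # from the text
                                      "clean_family_history": 0.8})  # assumed
estimate_b = marginalize(function_b, {"oncologists_on_staff": 0.5})  # assumed
combined = aggregate([(estimate_a, 1.0), (estimate_b, 1.0)])  # equal weights assumed
print(estimate_a, estimate_b, combined)
```

Note that marginalize visits 2^n combinations for n uncertain inputs, so the work doubles with every fact whose truth is in doubt, and that is before adding more functions to the forest.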
This hints at why a panel of superforecasters is presently the best we can do. Many of them do simple reasoning like this, and we see it in the comment section on Manifold. But each individual human doesn't have the time to think of 100 valid hypotheses and to calculate the resulting probability; many Manifold bettors seem to usually consider one and bet their mana.
An AI system (LLM based with plugin access) able to do the legwork here would be very useful...
Given pearls like this one in the description of the method: "There is only one straight line that contains two different points" (https://www.rootclaim.com/how-rootclaim-works), one can't help but wonder whether the claimed method is as sound as its supposed implications are far-reaching...
A problem with the debate format is that mistakes which might be caught if submissions were filed in advance can get missed. For example, the claim that serial passage would produce N501Y mutations, which are not seen in SARS-CoV-2, was incorrect. It would in BALB/c mice, but not in hACE2 mice, which is what WIV had.
In terms of getting to the truth of the matter, since the debate several new papers have undermined the core arguments from Worobey et al. and Pekar et al. for a Huanan Seafood Market origin:
Spatial statistics experts Stoyan and Chiu (2024) find that the statistical argument by Worobey et al. that Huanan Seafood Market was the early epicenter is flawed. https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnad139/7557954
Lv et al. (2024) found new intermediate genomes, so the multiple spillover theory is unlikely (it was anyway, given lineage A and B are only two mutations apart). A single point of emergence is more likely, with lineage A coming first. The market cases were all lineage B, so not the primary cases. Their findings are consistent with Caraballo-Ortiz (2022) and Bloom (2021). t.co/50kFV9zSb6
Jesse Bloom (2023) published a new analysis showing that genetic material from some animal CoVs is fairly abundant in samples collected during the wildlife-stall sampling of the Huanan Market on Jan-12-2020. However, SARS-CoV-2 is not one of these CoVs. t.co/rorquFs1wm
Michael Weissman (2024) shows that a model with ascertainment collider stratification bias fits early Covid case location data much better than the model in which all cases ultimately stemmed from the market. George Gao, Chinese CDC head at the time, acknowledged this to the BBC last year (they focused too much on and around the market and may have missed cases on the other side of the city).
https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnae021/7632556
The anonymous expert who identified coding errors in Pekar et al., leading to an erratum last year, has found another significant error. Single spillover looks more likely. t.co/GAPihZu51P
Ultimately, WIV was performing in vivo experiments with SARS-like CoVs in transgenic (human ACE2 expressing) mice and civets in 2018 and 2019. The results are unknown and they won't share their records.