Having just read through this, one key point that I haven't seen people mention is that the results are for LLMs that need to be jailbroken.
So these results are more relevant to the release of a model over an API rather than open-source, where you'd just fine-tune away the safeguards or download a model without safeguards in the first place.
The original title of this post is "RAND doesn't believe current LLMs are helpful for bioweapons development". I don't think it makes sense to ascribe beliefs this specific to an entity as messy and big as RAND. I changed the title to something that tries to be informative without making as strong a presumption. (For link posts to posts by off-site authors I take more ownership over how a post is titled; I wouldn't have changed the title if the author of the report had created it.)
Some interesting takeaways from the report:
Access to LLMs (in particular, LLM B) slightly reduced the performance of some teams, though not by a statistically significant margin:
Red cells equipped with LLM A scored 0.12 points higher on the 9-point scale than those equipped with the internet alone, with a p-value of 0.87, again indicating that the difference was not statistically significant. Red cells equipped with LLM B scored 0.56 points lower on the 9-point scale than those equipped with the internet alone, with a p-value of 0.25, also indicating a lack of statistical significance.
Planning a successful bioterrorism attack is intrinsically challenging:
the intrinsic complexity associated with designing a successful biological attack may have ensured deficiencies in the plans. While the first two factors could lead to a null result regardless of the existence of an LLM threat capability, the third factor suggests that executing a biological attack is fundamentally challenging.
This latter observation aligns with empirical historical evidence. The Global Terrorism Database records only 36 terrorist attacks that employed a biological weapon—out of 209,706 total attacks (0.0001 percent)—during the past 50 years. These attacks killed 0.25 people, on average, and had a median death toll of zero. As other research has observed,
“the need [for malign actors] to operate below the law enforcement detection threshold and with relatively limited means severely hampers their ability to develop, construct and deliver a successful biological attack on a large scale.”
Indeed, the use of biological weapons by these actors for even small-scale attacks is exceedingly rare.
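As a quick sanity check on the proportion in the quote above, here is a minimal sketch that uses only the two counts given there (the variable names are mine, not the report's). By this arithmetic, 36 out of 209,706 is roughly 0.017 percent, so the quoted "0.0001 percent" looks closer to the raw fraction (about 0.00017) than to a percentage. Either way, the share is tiny.

```python
# Counts as quoted from the report (attributed there to the Global Terrorism Database).
bio_attacks = 36
total_attacks = 209_706

# Share of all recorded attacks that used a biological weapon.
fraction = bio_attacks / total_attacks
percent = fraction * 100

print(f"fraction of attacks: {fraction:.6f}")
print(f"as a percentage:     {percent:.4f}%")
```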
Anecdotally, the LLMs were not that useful due to a few common reasons: refusing to comply with requests, giving inaccurate information, and providing vague or unhelpful information.
We conducted discussions with the LLM A red cells on their experiences. In Vignette 1, the LLM A cell commented that the model “just saves time [but] it doesn’t seem to have anything that’s not in the literature” and that they could “go into a paper and get 90 percent of what [we] need.” In Vignette 2, the LLM A cell believed that they “had more success using the internet” but that when they could “jailbreak [the model, they] got some information,” They found that the model “wasn’t being specific about [operational] vulnerabilities—even though it’s all public online.” The cell was encouraged that the model helped them find a dangerous toxin, although this toxin is described by the Centers for Disease Control and Prevention (CDC) as a Category B bioterrorism agent and discussed widely across the internet, including on Wikipedia and various public health websites. In Vignette 3, the LLM A cell reported that the model “was hard to even use as a research assistant [and we] defaulted to using Google instead” and that it had “been very difficult to do anything with bio given the unhelpfulness . . . even on the operational side, it is hard to get much.” The Vignette 4 LLM A cell had similar experiences and commented that the model “doesn’t want to answer a lot of things [and] is really hard to jailbreak.” While they were “able to get a decent amount of information” from the LLM, they would still “use Google to confirm.”
… We conducted discussions with the LLM B red cells as well. … In Vignette 3, those in the LLM B cell also found that the model had “been very forthcoming” and that they could “easily get around its safeguards.” However, they noted that “as you increase time with [the model], you need to do more fact checking” and “need to validate that information.” Those in the Vignette 4 LLM B cell, however, found that the model “maybe slowed us down even and [did not help] us” and that “the answers are inconsistent at best, which is expected, but when you add verification, it may be a net neutral.”
I noted that the LLMs don't appear to have access to any search tools to improve their accuracy. But if they did, they would just be distilling the same information as what you would find from a search engine.
More speculatively, I wonder if those concerned about AI biorisk should be less worried about run-of-the-mill LLMs and more worried about search engines using LLMs to produce highly relevant and helpful results for bioterrorism questions. Google search results for "how to bypass drone restrictions in a major U.S. city?" are completely useless and irrelevant, despite sharing keywords with the query. I'd imagine that irrelevant search results may be a significant blocker for many steps of the process to plan a feasible bioterrorism attack. If search engines were good enough that they could produce the best results from written human knowledge for arbitrary questions, that might make bioterrorism more accessible compared to bigger LLMs.
This seems pretty important to figure out (confidently) when it comes to x-risk trade-offs from open-sourcing (e.g. I pretty much buy the claims here: Open source AI has been vital for alignment).
I'm interested in whether RAND will be given access to perform the same research on future frontier AI systems before their release. This is useful research, but it would be more useful if applied proactively rather than retroactively.
This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.
Ok, so LLMs don't give an advantage in bioterrorism planning to a team of RAND researchers. Does that mean they don't give an advantage to actual terrorists, who are notoriously incompetent? https://gwern.net/terrorism-is-not-effective#sn17
I think you're misrepresenting Gwern's argument. He's arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.
I agree that it's his main point; however, he's also making an observation that most terrorists are incompetent, impulsive, have poor preparation and planning, and choose difficult forms of attacks when better options are available. The post has several anecdotes illustrating that.
He believes the incompetence is caused by terrorists acting on social incentives instead of optimizing for their stated goals. However, what if some terrorist group has one earnest terrorist, or what if the chatbot provides the social encouragement needed to spur a terrorist to action while simultaneously suggesting a more effective strategy? There are also lone-wolf terrorists who, while more practical, are limited to their own ideas, so they are probably less competent than a whole team of researchers.
Thanks for the link post!
Edit: looks like habryka had the same thought slightly before I did.
Nitpick: I think your title is perhaps slightly misleading. Here's an alternative title that feels slightly more accurate to me:
In particular:
It's really good to see someone as credible and well-resourced as RAND doing a fairly large and well-designed study on this. I'm not hugely surprised by the results for last year's models (and indeed this echoes some smaller and more preliminary red-teaming estimates from Anthropic). As the report clearly notes, once improved models (GPT-(4.5 or 5), Claude 3, Gemini Ultra+) are developed this year, these results could change, possibly dramatically, so I would very much hope the frontier labs are having RAND rerun this study on their new models before they're released, not after.
The most obvious mitigation to attempt first here is to filter the training set so as to give the base model specific, targeted deficits in bioweapons-related biological and operational skills and knowledge. It seems likely that this information is fairly concentrated in specific parts of the Internet and other training material, so I think it could be very valuable to figure out which parts of the pretraining set contribute most to the LLMs' skills on both the biological and operational axes of this study. The set of Internet resources used by the red teams in the study sounds like it would be a very useful input into this process.
Key Findings
The linkpost is to the actual report; see also their press release.