RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena

25th Jan 2024

This is a linkpost for https://www.rand.org/pubs/research_reports/RRA2977-2.html

Key Findings

This research involving multiple LLMs indicates that biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools. The authors found no statistically significant difference in the viability of plans generated with or without LLM assistance.
This research did not measure the distance between the existing LLM capability frontier and the knowledge needed for biological weapon attack planning. Given the rapid evolution of AI, it is prudent to monitor future developments in LLM technology and the potential risks associated with its application to biological weapon attack planning.
Although the authors identified what they term unfortunate outputs from LLMs (in the form of problematic responses to prompts), these outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.
To enhance possible future research, the authors would aim to increase the sensitivity of these tests by expanding the number of LLMs tested, involving more researchers, and removing unhelpful sources of variability in the testing process. Those efforts will help ensure a more accurate assessment of potential risks and offer a proactive way to manage the evolving measure-countermeasure dynamic.

The linkpost is to the actual report; see also their press release.

AI Risk, Biosecurity, Existential risk, Language Models (LLMs), AI


14 comments, sorted by top scoring

[-]habryka2y2718

The original title of this post was "RAND doesn't believe current LLMs are helpful for bioweapons development". I don't think it makes sense to ascribe beliefs this specific to an entity as messy and big as RAND. I changed the title to something that tries to be informative without making as strong a presumption. (For link posts to posts by off-site authors I take more ownership over how a post is titled; I wouldn't have changed the title if the author of the report had created the post.)

[-]StellaAthena2y90

Thanks! I like your title more :)

[-]jmh2y20

I agree with and upvoted the action here. However, I think I might have made an even stronger edit, to "A RAND report...", to better emphasize the report over the organization within which the report authors work.

[-]Chris_Leong2y2521

Having just read through this, one key point that I haven't seen people mention is that the results are for LLMs that need to be jailbroken.

So these results are more relevant to the release of a model over an API rather than open-source, where you'd just fine-tune away the safeguards or download a model without safeguards in the first place.

[-]mic2y170

Some interesting takeaways from the report:

Access to LLMs (in particular, LLM B) slightly reduced the performance of some teams, though not by a statistically significant level:

Red cells equipped with LLM A scored 0.12 points higher on the 9-point scale than those equipped with the internet alone, with a p-value of 0.87, again indicating that the difference was not statistically significant. Red cells equipped with LLM B scored 0.56 points lower on the 9-point scale than those equipped with the internet alone, with a p-value of 0.25, also indicating a lack of statistical significance.

Planning a successful bioterrorism attack is intrinsically challenging:

the intrinsic complexity associated with designing a successful biological attack may have ensured deficiencies in the plans. While the first two factors could lead to a null result regardless of the existence of an LLM threat capability, the third factor suggests that executing a biological attack is fundamentally challenging.
This latter observation aligns with empirical historical evidence. The Global Terrorism Database records only 36 terrorist attacks that employed a biological weapon—out of 209,706 total attacks (0.0001 percent)—during the past 50 years. These attacks killed 0.25 people, on average, and had a median death toll of zero. As other research has observed,
“the need [for malign actors] to operate below the law enforcement detection threshold and with relatively limited means severely hampers their ability to develop, construct and deliver a successful biological attack on a large scale.”
Indeed, the use of biological weapons by these actors for even small-scale attacks is exceedingly rare.

Anecdotally, the LLMs were not that useful due to a few common reasons: refusing to comply with requests, giving inaccurate information, and providing vague or unhelpful information.

We conducted discussions with the LLM A red cells on their experiences. In Vignette 1, the LLM A cell commented that the model “just saves time [but] it doesn’t seem to have anything that’s not in the literature” and that they could “go into a paper and get 90 percent of what [we] need.” In Vignette 2, the LLM A cell believed that they “had more success using the internet” but that when they could “jailbreak [the model, they] got some information.” They found that the model “wasn’t being specific about [operational] vulnerabilities—even though it’s all public online.” The cell was encouraged that the model helped them find a dangerous toxin, although this toxin is described by the Centers for Disease Control and Prevention (CDC) as a Category B bioterrorism agent and discussed widely across the internet, including on Wikipedia and various public health websites. In Vignette 3, the LLM A cell reported that the model “was hard to even use as a research assistant [and we] defaulted to using Google instead” and that it had “been very difficult to do anything with bio given the unhelpfulness . . . even on the operational side, it is hard to get much.” The Vignette 4 LLM A cell had similar experiences and commented that the model “doesn’t want to answer a lot of things [and] is really hard to jailbreak.” While they were “able to get a decent amount of information” from the LLM, they would still “use Google to confirm.”
… We conducted discussions with the LLM B red cells as well. … In Vignette 3, those in the LLM B cell also found that the model had “been very forthcoming” and that they could “easily get around its safeguards.” However, they noted that “as you increase time with [the model], you need to do more fact checking” and “need to validate that information.” Those in the Vignette 4 LLM B cell, however, found that the model “maybe slowed us down even and [did not help] us” and that “the answers are inconsistent at best, which is expected, but when you add verification, it may be a net neutral.”
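
A quick arithmetic check on the Global Terrorism Database figures quoted above. This is only a sketch and is not part of the report or the original comment; the variable names are illustrative. Dividing 36 by 209,706 gives a share of roughly 0.017 percent, so the quoted "0.0001 percent" looks closer to the raw proportion (about 0.0002) than to a percentage. The quoted average of 0.25 deaths per attack likewise implies about 9 deaths in total across the 36 attacks.

```python
# Figures as quoted from the report (Global Terrorism Database, past 50 years).
biological_attacks = 36
total_attacks = 209_706
mean_deaths_per_attack = 0.25

# Share of all recorded attacks that used a biological weapon.
proportion = biological_attacks / total_attacks
percent = 100 * proportion

# Total deaths implied by the quoted average death toll.
implied_total_deaths = biological_attacks * mean_deaths_per_attack

print(f"proportion of attacks: {proportion:.6f}")
print(f"as a percentage:       {percent:.4f}%")
print(f"implied total deaths:  {implied_total_deaths:.0f}")
```

Either way, the qualitative point of the quoted passage stands: biological attacks are a tiny fraction of recorded terrorist attacks.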

[-]mic2y32

I noted that the LLMs don't appear to have access to any search tools to improve their accuracy. But if they did, they would just be distilling the same information as what you would find from a search engine.

More speculatively, I wonder if those concerned about AI biorisk should be less worried about run-of-the-mill LLMs and more worried about search engines using LLMs to produce highly relevant and helpful results for bioterrorism questions. Google search results for "how to bypass drone restrictions in a major U.S. city?" are completely useless and irrelevant, despite sharing keywords with the query. I'd imagine that irrelevant search results may be a significant blocker for many steps of the process to plan a feasible bioterrorism attack. If search engines were good enough that they could produce the best results from written human knowledge for arbitrary questions, that might make bioterrorism more accessible compared to bigger LLMs.

[-]Bogdan Ionut Cirstea2y13-5

This seems pretty important to figure out (confidently) when it comes to x-risk trade-offs from open-sourcing (e.g. I pretty much buy the claims here: Open source AI has been vital for alignment).

[-]Haiku2y54

I'm interested in whether RAND will be given access to perform the same research on future frontier AI systems before their release. This is useful research, but it would be more useful if applied proactively rather than retroactively.

[-]StellaAthena2y54

This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.

[-]seed2y50

Ok, so LLMs don't give an advantage in bioterrorism planning to a team of RAND researchers. Does it mean they don't give an advantage to actual terrorists, who are notoriously incompetent? https://gwern.net/terrorism-is-not-effective#sn17

[-]StellaAthena2y43

I think you're misrepresenting Gwern's argument. He's arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.

[-]seed2y34

I agree that it's his main point; however, he's also making an observation that most terrorists are incompetent, impulsive, have poor preparation and planning, and choose difficult forms of attacks when better options are available. The post has several anecdotes illustrating that.

He believes the incompetence is caused by terrorists acting on social incentives instead of optimizing for their stated goals. However, what if some terrorist group has one earnest terrorist, or what if the chatbot provides the social encouragement needed to spur a terrorist to action while simultaneously suggesting a more effective strategy? There are also lone wolf terrorists who, while more practical, are limited to their own ideas, so probably less competent than a whole team of researchers.

[-]ryan_greenblatt2y20

Thanks for the link post!

Edit: looks like habryka had the same thought slightly before I did.

Nitpick: I think your title is perhaps slightly misleading; here's an alternative title that feels slightly more accurate to me:

RAND experiment found current LLMs aren't helpful for bioweapons attack planning

In particular:

I don't think (based on what I've quickly read) that RAND made a statement about their overall belief.
The experiments are specifically related to "biological weapon attack planning", which might not cover all phases of development (I'm overall unsure exactly what fraction of bioweapons development this corresponds to). Regardless, it certainly is considerable evidence for current LLMs not being non-trivially helpful with any part of bioweapons development, even if it doesn't test this directly.

[-]RogerDearnaley2y1-1

It's really good to see someone as credible and well-resourced as RAND doing a fairly large and well-designed study on this. I'm not hugely surprised by the results for last year's models (and indeed this echoes some smaller and more preliminary red-teaming estimates from Anthropic). As the report clearly notes, once improved models (GPT-(4.5 or 5), Claude 3, Gemini Ultra+) are developed this year, these results could change, possibly dramatically, so I would very much hope the frontier labs are having RAND rerun this study on their new models before they're released, not after.

The most obvious mitigation to attempt first here is to try to filter the training set so as to give the base model LLM specific, targeted skill/knowledge deficits in bioweapons-related biological and operational skills and knowledge, and it seems likely that this information is fairly concentrated in specific parts of the Internet and other training material. So I think it could be very valuable to figure out which parts of the pretraining set were most contributing to the LLMs' skills in both the biological and operational axes of this study: the set of Internet resources used by the red teams in the study sounds like it would be a very useful input into this process.
