Key Findings
- This research involving multiple LLMs indicates that biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools. The authors found no statistically significant difference in the viability of plans generated with or without LLM assistance.
- This research did not measure the distance between the existing LLM capability frontier and the knowledge needed for biological weapon attack planning. Given the rapid evolution of AI, it is prudent to monitor future developments in LLM technology and the potential risks associated with its application to biological weapon attack planning.
- Although the authors identified what they term unfortunate outputs from LLMs (in the form of problematic responses to prompts), these outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.
- To enhance possible future research, the authors would aim to increase the sensitivity of these tests by expanding the number of LLMs tested, involving more researchers, and removing unhelpful sources of variability in the testing process. Those efforts will help ensure a more accurate assessment of potential risks and offer a proactive way to manage the evolving measure-countermeasure dynamic.
The linkpost is to the actual report, see also their press release.
Some interesting takeaways from the report:
Access to LLMs (in particular, LLM B) slightly reduced the performance of some teams, though not by a statistically significant level:
Planning a successful bioterrorism attack is intrinsically challenging:
Anecdotally, the LLMs were not that useful due to a few common reasons: refusing to comply with requests, giving inaccurate information, and providing vague or unhelpful information.
I noted that the LLMs don't appear to have access to any search tools to improve their accuracy. But if they did, they would just be distilling the same information as what you would find from a search engine.
More speculatively, I wonder if those concerned about AI biorisk should be less worried about run-of-the-mill LLMs and more worried about search engines using LLMs to produce highly relevant and helpful results for bioterrorism questions. Google search results for "how to bypass drone restrictions in a major U.S. city?" are completely useless and irre... (read more)