Introduction

 

As artificial intelligence continues to revolutionize scientific research, the world faces an unprecedented challenge in biosecurity. While AI's ability to democratize expert knowledge is a remarkable achievement, it also lowers the barriers for malicious actors to create bioweapons. This threat operates across multiple scales, from individual actors with basic lab skills to organized terrorist groups and authoritarian states. Unlike traditional bioterrorism, which required extensive expertise and resources, AI-assisted bioweapon development could potentially be pursued by anyone with access to standard lab equipment and the right language models. This post explores a three-layered defense strategy, examining how we can implement security measures at the conception, production, and diffusion stages of potential pathogens. Through a combination of traditional biosecurity measures, robust AI system design, and defensive AI applications, we may be able to mitigate these emerging risks without sacrificing the beneficial aspects of AI-assisted scientific research.

 

I) Designing

 

Current LLM capabilities

Today, LLMs are already good and reliable guides that can point you to the sources you need and explain difficult concepts in simple words. While general-purpose models like ChatGPT are only accessible through an API, others like Llama 2 have openly released weights, so you can easily fine-tune them, or find fine-tuned versions, with the safety filters removed. Such models can then synthesize expert knowledge about the deadliest known pathogens, such as influenza and smallpox [1].

 

That is exactly what a group of students from MIT did to prove this point [2]. During a hackathon, participants were asked to discover how to obtain and release the 1918 pandemic influenza virus by feeding both Llama 2 and an uncensored "Spicy" version of it with malicious prompts. While Llama 2 systematically refused to answer, the Spicy model readily provided a step-by-step guide revealing key information on how to obtain the virus.

 

Studies haven’t already been designed to evaluate precisely how much uplift AI systems can provide in such misuses compared to using the internet only. However, given the time that the non-scientific students of the MIT took to complete the entire malicious process we can reasonably think that the uplift is sufficient to raise concerns, and we can be certain that LLMs accelerate a bad actor’s efforts to misuse biology relative to solely having internet access.

 

While the underlying capability to synthesize information has not inherently changed, the wider accessibility of these tools increases the statistical likelihood of malicious use simply because more people have access [12, 14]. We can thus say that LLMs are undeniably lowering the barriers to intentional misuse.

 

Open release of model weights

It is hard to gather the evidence and develop the tools needed to determine when the benefits of openly releasing a model are outweighed by its risks. But considering that any safeguards we implement can be removed as soon as the weights are released, restricting access to model weights to a smaller number of people is arguably a safer approach. The question of who should be given access to a model then comes into play.

 

Even if a model is only given to a restricted audience, pre- and post-release model evaluations are needed. Mandatory pre-release evaluations would in particular incentivise developers to remove harmful model behavior throughout training and deployment. These could combine expert red-teaming with more structured tests of model risks and safeguards, including testing the model's ability to help with planning or executing a biological attack.
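As a rough sketch of what the more structured part of such an evaluation could look like, the snippet below runs a battery of red-team prompts against a candidate model and reports its refusal rate. The `query_model` function, the refusal markers, and the prompt placeholders are illustrative assumptions, not a real evaluation suite.

```python
# Minimal sketch of a structured pre-release evaluation: run a battery of
# red-team prompts against a candidate model and measure its refusal rate.
# `query_model` and the prompt set are placeholders, not a real API or benchmark.

REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i won't provide")

def query_model(prompt: str) -> str:
    """Placeholder for the developer's own inference endpoint."""
    raise NotImplementedError

def looks_like_refusal(answer: str) -> bool:
    return any(marker in answer.lower() for marker in REFUSAL_MARKERS)

def evaluate(prompts: list[str]) -> float:
    """Return the fraction of risky prompts the model refused to answer."""
    refusals = sum(looks_like_refusal(query_model(p)) for p in prompts)
    return refusals / len(prompts)

# In practice the prompt set would come from expert red-teamers and would cover,
# among other risk areas, help with planning or executing a biological attack.
red_team_prompts = ["<placeholder risky prompt 1>", "<placeholder risky prompt 2>"]
# refusal_rate = evaluate(red_team_prompts)
```

In practice such a harness would be only one layer on top of open-ended expert red-teaming, since simple keyword-based refusal detection is easy to game.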

 

A final way to incentivize developers to take safety seriously is to hold them legally liable for misuse of their models. This question is highly controversial: a first attempt to implement such rules was made alongside the European Union's proposed AI Act [9], which aimed to hold developers and researchers liable for foreseeable harms that their models might cause. This proposal was widely rejected.

 

Current BDT vulnerabilities

Apart from LLMs, Biological Design Tools (BDTs), initially developed to discover therapeutic molecules and proteins, are today particularly vulnerable to dual use, and their weaknesses are easily exploitable.

 

We make a clear distinction between the risks of misuse when only the weights of a pre-trained model are open source and when the entire training pipeline of the model is available. Indeed, when the full training code is given, it becomes particularly straightforward to twist a model's goal.

 

In [3], researchers from Collaborations Pharmaceuticals used MegaSyn2, a molecule generator they had previously designed to support drug discovery by finding inhibitors for targets of human diseases. For the experiment, they changed the reward function during training, rewarding toxicity instead of penalizing it. In doing so, they generated forty thousand likely lethal molecules on their in-house servers in only six hours. They completely inverted the model's goal, teaching it to create dangerous molecules.

 

For now, with AlphaFold3 and Evo, one only has access to the models' weights and can hence only use them for inference. But how long before anyone can train deadly biological design tools on their own computers? BDTs could also be used to predict and design enhancements that make pathogens even more harmful, or to identify and manipulate key genetic components affecting their transmissibility and/or disease-causing properties.

 

Interaction between LLMs and BDTs

Models gaining access to tools could advance their capabilities in biology. Combined with enhanced and unmitigated LLMs, we could imagine an AGI that uses AlphaFold and other BDTs like Evo to design complex viruses with known transmissibility rates and virulence.

 

How models like AlphaFold and Evo could be misused

AlphaFold3, a tool for predicting the 3D structure of proteins, was recently released as open source [10]. It remains today a resource primarily accessible to experts. However, with the development of more powerful LLMs, it could more easily be used as an intermediate tool to design pathogens deadlier than any we have ever seen. Predicting the structure of mutated viruses, when paired with other tools, could help model how new mutations interact with the environment, and thus help construct a virus with a programmed R0 value. This could allow iterative virus design, where successive mutations and their consequences are predicted and optimized to maximize transmissibility and virulence.

 

Besides AlphaFold, Evo, a novel genomic foundation model, was released in early 2024 [4]. It was pre-trained to predict the next base in a nucleotide sequence, using tens of thousands of genomes from microorganisms. Evo's ability to understand DNA sequences allows it to predict the function of a gene and of its potential mutations. Furthermore, the model can be used to generate sequences of macromolecular complexes such as CRISPR-Cas systems. Such a model could be misused in a similar way to AlphaFold. The following diagram summarizes this potentially malicious interaction between LLMs and BDTs.

Parallel advancements in LLMs and BDTs: Potential for malicious convergence

 

Data access and compute governance

So now, let’s suppose you do have access to a model weights. Large and high-quality datasets are still paramount for efficient AI training. A way to prevent biorisk could thus be to control the data access to large biological dataset through licenses requirement and restrict access to large computation capacities. This is where the idea of compute governance comes in. It consists in limiting the power resources of an individual in order to prevent them from training too large and potentially dangerous models.

 

Limitations of compute governance

However, with the design of new deep learning architectures, models tend to reach the same performance as before with less complexity and lower power consumption during training. For instance, the new Evo genomic foundation model relies on a StripedHyena architecture built from long convolutions [4]. These Hyena layers have shown better performance than attention-based models with the same number of parameters, and can reach similar results with fewer parameters [5]. Being attention-free, they suggest that "attention may not be all we need", even though attention has been the standard for foundation models in recent years. The race for smarter and cheaper architectures could last a long time, raising the question: how do we set a compute threshold above which training a model would be considered dangerous? Such a threshold should not prevent benign models from being trained, yet as we just saw, the gap between them and general AI models could shrink over time. This solution, while efficient in the short or medium term, thus lacks long-term guarantees.

 

Models more robust to misuse

Recent developments in AI safety have highlighted several promising approaches to creating more resilient models. A first draft of a safety strategy would be to unlearn expert biological data and excise biological capabilities from general-purpose systems that do not require them. Then, screening tools could flag the initial fragment of a potential pathogen sequence (in the case of a generative model, for example) and block the generation of the rest of the sequence once such a fragment is detected. Safety could be ensured by human feedback, but we could also consider the constitutional AI approach, which trains systems to self-regulate through a built-in set of "principles" or "constitution" [11, 15]. Provided the weights of such models are not open source, this can be more robust than the RLHF (Reinforcement Learning from Human Feedback) approach. Finally, probabilistic modeling could help better quantify risk scenarios and be used as a tool to regulate models proportionally [6].
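To make the sequence-screening idea concrete, here is a minimal sketch assuming a hypothetical curated database of sequences of concern: a generation-time filter compares the k-mers of the partially generated sequence against that database and halts generation on sufficient overlap. The k-mer length, the overlap threshold, and the database interface are all assumptions for illustration.

```python
# Illustrative sketch of a generation-time screen for a sequence-generating model:
# if a prefix of the generated DNA shares enough k-mers with a curated database of
# sequences of concern, generation is halted. Database and thresholds are placeholders.

K = 21  # k-mer length, an arbitrary illustrative choice

def kmers(sequence: str, k: int = K) -> set[str]:
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def build_screen(sequences_of_concern: list[str]) -> set[str]:
    """Precompute the k-mers of a (hypothetical) curated database."""
    flagged: set[str] = set()
    for seq in sequences_of_concern:
        flagged |= kmers(seq)
    return flagged

def should_block(generated_prefix: str, flagged_kmers: set[str], min_hits: int = 3) -> bool:
    """Block further generation once the prefix overlaps the database enough."""
    return len(kmers(generated_prefix) & flagged_kmers) >= min_hits

# The generator would call should_block() after each emitted chunk and stop
# (or refuse) as soon as it returns True.
```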

 

II) Production

 

Increase biosecurity protocols of laboratories

Let’s suppose we can prompt an AI model so that it outputs a full recipe to create a dangerous pathogen. The next step would be to synthesize the various biological components that are needed. On the contrary to other weapons such as the atomic bomb, biological pathogens are cheap and require relatively few infrastructures to be produced and stored.

 

Indeed, there exist laboratories that synthesize enzymes or proteins on demand. In the MIT study mentioned earlier, the uncensored Spicy LLM provided step-by-step instructions on how a person could evade laboratories' safety screening. Out of the 38 laboratories the participants ordered syntheses from, 36 accepted to produce the deadly pathogen without further scrutiny.

 

One way to prevent misuse at this production step would be to enhance laboratory security screening through a double filtration process. First, specialized (non-open-source) AI models could apply anomaly detection techniques to identify suspicious or incoherent requests. These techniques must be adversarially robust, as attackers may attempt to bypass them. The second step could involve human screening to confirm the nature of the request and its origin. Such a mandatory baseline for screening gene synthesis orders and other synthetic biology services would be a very effective measure to prevent illicit access to biological agents.
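A minimal sketch of this double filtration process, assuming a hypothetical proprietary anomaly-scoring model and a human review step, could look as follows.

```python
# Minimal sketch of a two-stage order-screening pipeline: an automated anomaly
# score first, then mandatory human review for anything that is not auto-rejected.
# `anomaly_score` stands in for a proprietary (non-open-source) screening model.

from dataclasses import dataclass

@dataclass
class SynthesisOrder:
    customer_id: str
    sequence: str
    stated_purpose: str

def anomaly_score(order: SynthesisOrder) -> float:
    """Placeholder for a specialized, adversarially hardened screening model."""
    raise NotImplementedError

def screen(order: SynthesisOrder, auto_reject: float = 0.9, review: float = 0.3) -> str:
    score = anomaly_score(order)
    if score >= auto_reject:
        return "rejected"            # clearly suspicious: refuse and log
    if score >= review:
        return "human_review"        # ambiguous: escalate to a human screener
    return "human_confirmation"      # low risk: still confirmed by a human before production
```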

 

Even without going through such laboratories, new benchtop devices already allow individuals to synthesize biological components on their own, in an on-demand and decentralized way. Devices that synthesize DNA fragments can currently print sequences of up to a few hundred nucleotides. The authors of a 2023 report on DNA synthesis [7] estimate that devices created in the next 2 to 5 years will extend this limit to 5,000-7,000 bases. Knowing that most viral genomes consist of sequences of 10,000 to 200,000 nucleotides, this raises concerns about the availability and regulation of such benchtop devices. The authors give some general guidelines to avoid their misuse. Manufacturers of these devices should screen potential clients before selling to them, much as background checks are required in the US before the purchase of a firearm. A second level of regulation should take place when a client wishes to print a DNA sequence: a report would be sent to the manufacturer, who would then approve or deny the printing of the fragment on a case-by-case basis. Finally, governments have the role of legally overseeing this regulation and of providing guidance and resources regarding the risks of misuse.
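To make this second level of regulation concrete, a benchtop device could refuse to print until the manufacturer returns an explicit approval for the reported order. The sketch below is schematic, and `submit_report_to_manufacturer` is an assumed interface, not an existing API.

```python
# Schematic sketch of device-side gating for a benchtop synthesizer: every print
# request is reported to the manufacturer, and synthesis only proceeds if an
# explicit approval is returned. The reporting interface is an assumed placeholder.

def submit_report_to_manufacturer(customer_id: str, sequence: str) -> bool:
    """Placeholder: sends the order for case-by-case review and returns the decision."""
    raise NotImplementedError

def print_sequence(customer_id: str, sequence: str) -> None:
    if not submit_report_to_manufacturer(customer_id, sequence):
        raise PermissionError("Order denied by manufacturer screening")
    # ... proceed with synthesis only after approval ...
```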

 

The necessity of creating long fragments from short DNA sequences

Once one has created or bought small pieces of a pathogen, there remains the need to assemble the whole thing. At this step, practical limitations make it hard to create complex pathogens. Indeed, biological synthesis requires tacit skills, know-how, and visual and tactile cues that are hard to formalize.

 

Future versions of AI could help lower that barrier by providing a complete laboratory cookbook. With the development of multimodal AI, LLMs could even provide images or videos to guide the troubleshooting of experiments. The next step we could imagine would be LLMs that generate robotic scripts from goal-oriented instructions for laboratory automation. Discussions involving a diverse range of stakeholders are needed on whether increasingly powerful lab assistants like these should require some form of user authentication.
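As a minimal sketch of what such user authentication could look like, the snippet below gates protocol-generation requests behind a hypothetical registry of verified users; both the registry and the generator are placeholders, not any existing system.

```python
# Minimal sketch of user authentication in front of an automated lab assistant:
# protocol-generation requests are only served to verified, credentialed users.
# The verification registry and generator below are hypothetical placeholders.

VERIFIED_USERS: dict[str, str] = {}  # user_id -> affiliation, populated by a vetting process

def generate_protocol(instruction: str) -> str:
    """Placeholder for an LLM-based protocol or robotic-script generator."""
    raise NotImplementedError

def handle_request(user_id: str, instruction: str) -> str:
    if user_id not in VERIFIED_USERS:
        raise PermissionError("User not verified: request refused and logged")
    return generate_protocol(instruction)
```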

 

In a nutshell, AI developers, biosecurity experts, and companies providing synthesis products should collaborate to develop appropriate screening tools to ensure biosecurity.

 

III) Spreading

 

Most viral disease outbreaks stretching back over millennia have been caused by viruses transmitted to humans through direct or indirect contact with other animals [16]. If a terrorist organization decided to eliminate a large part of the population with a bioweapon, a single infected kamikaze would be enough to spread a pandemic like wildfire.

 

In a scenario where a malicious individual or organization spreads a highly contagious bioweapon within the population, there would be no option other than accelerating defensive measures to counteract it [13]. If AI tools are capable of democratizing the synthesis of pathogens, why wouldn't they be able to help synthesize treatments for them? If pA.I.thogen exists, let's create VaxAI! Such a tool would rely on the same technologies as the harmful ones, but would be used to counter them.

 

However, this solution is limited by the time needed to detect the new pathogen, as well as the time needed to design, produce, and distribute the corresponding remedy. In [8], the authors distinguish between two types of pandemics. "Stealth" pandemics (HIV-like) are characterized by a long incubation period and by significant damage inflicted years later. "Wildfire" pandemics, by contrast, are highly transmissible (like Covid-19) and kill quickly.

 

The possibility for a malicious agent to design a pathogen with a desired transmissibility, incubation period, and lethality allows both types of scenario to happen. The first case offers a glimpse of hope, as it gives more time for the counter-attack to get ready, provided a few early cases allow the pathogen to be detected. In the second case, the authors of [8] recommend providing food and water to essential workers while a remedy is produced. In these scenarios, the AI tools we mentioned may not act quickly enough to prevent fatalities, highlighting the need for anticipation and proactive measures.

 

Conclusion

 

The previous discussion made it clear that the acceleration of recent AI advances is reducing the gap between the benevolent and malicious uses of AI, increasing the risk of both voluntary and involuntary misuse. Current LLMs already democratize access to expert knowledge, making it easier to create bioweapons. At the same time, recent breakthroughs have been made in Biological Design Tools (like Evo or AlphaFold). Used in conjunction with more powerful and multimodal LLMs, they could make advanced biological design capabilities more accessible.

 

Solutions to diminish the risk of AI misuse can be inspired by what is already being applied to prevent other threats. Red-teaming, as practiced in cybersecurity, can help anticipate a model's harmful failure modes. Restricting access to resources (datasets and weights) will make the development of bioweapons harder, just as the naturally limited access to uranium makes the nuclear bomb difficult to build. The background checks required in the US before purchasing a weapon can be adapted to AI by screening the intentions of individuals trying to use a model or a potentially dangerous device. In the case of an actual pandemic, lessons from Covid-19 should also be used to prepare a defense strategy and allow an effective counter-offensive. In all cases, governments and international organizations appear to be the key authorities to establish proper AI regulation and enforce it at all scales.

 

References

[1] https://arxiv.org/pdf/2306.12001

[2] https://arxiv.org/pdf/2310.18233

[3] https://pmc.ncbi.nlm.nih.gov/articles/PMC9544280/pdf/nihms-1804590.pdf 

[4] https://www.science.org/doi/epdf/10.1126/science.ado9336

[5] https://arxiv.org/pdf/2302.10866

[6] https://arxiv.org/pdf/2305.15324

[7] https://www.nti.org/wp-content/uploads/2023/05/NTIBIO_Benchtop-DNA-Report_FINAL.pdf

[8] https://docs.google.com/document/d/1aPQ3B6QdKE8Lm1uqZsy9EIkP8lTAjiK09u7U7ntTlE4/edit?tab=t.0#heading=h.ej96oxwbk5bn

[9] https://commission.europa.eu/business-economy-euro/doing-business-eu/contract-rules/digital-contracts/liability-rules-artificial-intelligence_en

[10] https://www.nature.com/articles/s41586-024-07487-w

[11] https://arxiv.org/pdf/2212.08073

[12] https://arxiv.org/pdf/2306.03809

[13] https://arxiv.org/pdf/2108.02678

[14] https://arxiv.org/pdf/2306.13952

[15] https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback

[16] https://pandemichub.who.int/
