If firms are confident that they won’t have important technical information stolen they’ll race less against other firms
Sorry if I'm missing something obvious, but what's the logic here? I would have thought the opposite - firms could afford to spend less on security and more on development.
AI governance and strategy: a list of research agendas and work that could be done
This document was written by Nathan Barnard and Erin Robertson.
We have compiled a list of research agendas in AI governance, and we’ve written some possible questions that people could work on. Each section contains an explanation for why this theme might be relevant for existential risk and longtermist focussed governance, followed by a short description of past work. We propose some questions for each theme, but we prioritise clarity over completeness.
The content is focussed on questions which seem most important to Nathan personally, with a focus on questions which seem most useful on the margin. We have drawn often on other people’s lists, in an attempt to represent a more consensus view. Neither Erin nor Nathan have ever held AI governance research or implementation positions, and Nathan has been an independent researcher for less than a year.
A theme throughout these questions is that Nathan thinks it would be useful to try to get more high quality empirical work. A good example is this paper on policy persistence and policy windows from Freitas-Groff that has credible causal estimates of how persistent policy is, which he thinks is a really important question for prioritisation of different interventions in AI governance.
This document is divided into topics, each includes:
Topics:
AI regulation and other standard tools
Compute governance
Corporate governance
International governance
Misuse
Evals
China
Information security
Strategy and forecasting
Post TAI/ASI/AGI governance
AI regulation and other standard tools
Theory of change
Historically, government regulation has been successful at reducing accident rates in other potentially dangerous industries - for instance air travel, nuclear power plants, finance and pharmaceuticals. It’s plausible that similar regulatory action could reduce risks from powerful AI systems.
Past work
A dominant paradigm right now is applying the standard tools of technology regulation - and other non-regulatory means of reducing harm from novel and established technologies - to AI. This paradigm seems particularly important right now because of the extensive interest - and action - from governments on AI. Specifically, there was a recent Biden Executive order (EO) on AI instructing the executive branch[1] to take various regulatory and quasi-regulatory actions, on AI. The EU AI act has been passed, but there are now many open questions on how the act will be enacted, and the UK is, in various ways, creating its AI regulatory policy. Thus far there has been a lot of work looking at case studies of particular regulatory regimes, and work looking deeply into the mechanics of US government and how this could matter for AI regulation.
Questions
There are of course reasons why this statistical work hasn’t been done - it’s hard to get good data on these questions, and it’ll be difficult to credibly estimate the causal effect of these interventions because it’s hard to do natural experiments. Nevertheless, we think it’s worth some people trying hard to make progress on these questions - without this kind of statistical work we should be extremely uncertain about how useful standards that don’t have the force of law will be in particular.
UK specific questions
US specific questions
There are many, many questions on the specifics of US interventions and lots of work being done on them. In listing these questions I’ll try to avoid duplication of work that’s already been done or is in the process of being done.
EU specific questions
We are not particularly plugged in to what's happening in the EU, so take these questions with a big pinch of salt. This website has very useful summaries of the various parts of the EU AI act.
Compute governance
Theory of change
Much compute governance work hinges on the assumption that access to compute (often GPUs) is crucial for development of frontier AI systems. Policymakers may be able to influence the pace and direction of development by controlling or tracking these resources. A good example of this is US export controls on chips to China.
We expect that the role of chips may shift in the coming years, and they may be less clearly a bottleneck to development. It may be that the bottleneck shifts to algorithmic progress or financial constraints of firms. We expect that work imagining possible failure modes of compute governance, and possible alternative constraints on progress will be helpful, since work of this kind is neglected. It's worth noting that it's only recently that training costs from computation have reached the level to put training models out of reach for semi-large firms, but prior to this there were only a small number of leading labs. This implies that something other than compute is the main constraint on the ability of labs to produce frontier models, which this paper lends some support to. This implies that it’s plausible that theories of change that rely on controlling who has access to leading node AI chips might not be very effective.
Past work
Some examples of past work include;
This recent paper on compute governance has an appendix with broad research directions at the end, and we encourage interested readers to draw from it for research ideas focused on advancing the compute governance agenda.
Questions
These questions are quite idiosyncratic and focused on examining the failure modes of compute governance.
Corporate governance
Corporate governance refers to the policies, procedures, and structures by which an organisation is controlled, aiming to align the organisation's objectives with various stakeholders including shareholders, employees, and society.
Corporate governance allows individuals and leadership within a company to be held accountable for their decisions and the effects they have.
Theory of change
There are a few reasons why corporate governance can be a valuable alongside good regulation:
Past work
This research agenda from Matt Wearden is a great place to look for questions. The questions we mention here are focussed on trying to understand the track record of corporate governance.
Responsible scaling policies (RSPs) are probably the most important corporate governance and risk management tool being used at AI labs. Responsible scaling policies are the practices that firms commit to undertake to ensure that they can appropriately manage the risks associated with larger models.
I (Nathan) personally think that RSPs are useful for three reasons:
Some questions that might be important around RSPs are:
Industry led standards bodies are common across sectors, for instance finance has quite a lot of these self-regulatory bodies (SRB). We aren’t sure if these are effective at actually leading to lower accident rates in the relevant industries, and if so what the effect size, particularly in comparison to regulation. It could be really useful for someone to do a case study on a plausible case in which SRB reduced accidents in an industry and what the mechanism for this was.
International governance
Theory of change
It seems like there are two worlds where international governance is important.
In both of these cases international agreements could improve coordination between jurisdictions and states to prevent competition that’s harmful to AI safety.
Past work
Most of the academic work on international governance has been aimed at the first theory of change.
Trager et al propose an international governance regime for civilian AI with a similar structure to international agreements around the verification of the rules of jurisdictions, similar to international agreements on aeroplane safety. Baker looks at nuclear arms control verification agreements as a model for an AI treaty. There has also been some excitement about CERN for AI, as this EA forum post explains, but little more formal work investigating the idea.
There has also been some work on racing between states, for instance this paper from Stafford et al.
Questions
Nathan is sceptical that international agreements on AI will matter very much and most of the questions are about investigating whether international agreements could solve important problems and could feasibly be strong enough to solve these important problems.
Misuse
Nathan isn’t convinced there should be much focus on misuse risks from communities worried about existential risks from AI for two reasons:
We would be particularly interested in empirical work that tries to clarify how likely x-risk from misuse is. Some work in this vein that is extremely useful is this report from the forecasting research institute on how likely superforecasters think various forms of x-risk are, this EA forum post that looks at the base rates of terrorist attacks, and this report from RAND on how useful LLMs are for bioweapon production.
Some theoretical work that has really influenced Nathan’s thinking here is this paper from Aschenbrenner modelling how x-risk changes with economic growth. The core insight of the paper is that, even if economic growth initially increases x-risk due to new technologies, as societies get richer they get more willing to spend money on safety enhancing technologies, which can be used to force down x-risk.
Questions
Some empirical work that we think would be helpful here:
This might explain why, to my knowledge, they only attempted attacks with sarin and anthrax, neither of which are infectious diseases.
If there are in fact no examples of homicidal terrorist groups, this would be a big update on the risks from AI misuse. David Thorstad has a good post on this already, but we think more work would still be useful.
The model of advances in AI capabilities leading to large harms from misuse predicts lots of terrorist attacks that use malware, but Nathan’s understanding is that this isn’t the case. It would be useful to know why, and what evidence this provides on the misuse question.
Evals
Theory of change
Evals are a tool for assessing whether AI systems pose threats by trying to elicit potentially dangerous capabilities and misalignment from AI systems. This is a new field and there are many technical questions to tackle in it. The interested reader is encouraged to read this post on developing a science of evals from Apollo, a new organisation focused on evals.
The governance questions for evals are how evals fit into a broader governance strategy. See this paper from Apollo and this introduction to METR’s work. Evals also play a central part in the UK government's AI regulation strategy. See box 5 of the UK government's recent white paper for questions the UK government has, many of which relate to evals.
Questions
Some particular questions we are interested in are:
China
Theory of change
China questions are some of the most crucial strategic questions on AI. There seem to be two big ways in which China questions matter:
There are three sub questions to the first question that I’m really interested in:
Crucial context to here is the export controls adopted by the Biden administration in 2022, and updated in 2023, which aim to maximise the distance between leading node production in the US and allies and leading node production in China, combined with a more narrow aim of specifically restricting the technology that the Chinese military has access to.
Past work
There’s lots of great work both on the export controls, the Chinese AI sector, and Chinese semiconductor manufacturing capabilities. Interested readers are encouraged to take part in the forecasting tournament on Chinese semiconductor manufacturing capabilities.
Questions
Information security
Theory of change
It seems like there are three theories of change for why infosec could matter a lot
All of these theories of change seem plausible, but we haven’t seen any work that has really tried to test these theories of change using case studies or historical data and it would be interesting to see this sort of work.
There’s some interesting work to be done on non-infosec ways of deterring cyberattacks. It may also turn out that AI makes cyberattacks very easy to conduct technically, so the way to deter cyberattacks is with very aggressive reprisals against groups found to be conducting cyberattacks, combined with an international extradition treaty for cybercriminals.
Past work
Questions
All of these questions will be social science questions rather than technical questions - this is not all meant to imply that technical infosec questions aren’t important, just that we are completely unqualified to write interesting technical infosec questions.
Strategy and forecasting
Theory of change
Anticipating the speed at which developments will occur, and understanding the levers, is likely very helpful for informing high-level decision making.
Past work
There’s a risk with strategy and forecasting that it’s easy to be vague or use unscientific methodology, which is why recent commentary has suggested it's not a good theme for junior researchers to work on. There’s some merit to this view, and we’d encourage junior researchers to try especially hard to seek out empirical or otherwise solid methodology if they’d like to make progress on this theme.
Epoch is an AI forecasting organisation which focuses on compute. Their work is excellent because they focus on empirical results or on extending standard economic theory. Other strategy work with solid theoretical grounding includes Tom Davidson’s takeoff speed report, Halperin et al’s work on using interest rates to forecast AI, and Cotra’s bio anchors report.
Lots of the strategy work thus far - on AI timelines and AI takeoff speeds - is compute centric. This means that a core assumption of much of this work is that AI progress can be converted into a common currency of compute - the assumption here is that if you throw enough compute at today's data and algorithms you can get TAI.
Recently there’s been quite a lot of focus on work on the economic and scientific impact of LLMs, for instance see this post and this post from Open Philanthropy calling for this kind of work.
Questions
Post TAI/ASI/AGI governance
Theory of change
Lots of people think that transformative AI is coming in the next few decades. Some people have defined this by “AGI”: an AI that can do everything a human can do but better. Some have defined it in terms of “TAI”: AI which significantly changes the economic growth rate such that global GDP grows X% each year, or so that scientific developments occur X% quicker. These changes may be abrupt, and may completely change the world in ways we can’t predict. Some work has been done to anticipate these changes, and to avert the worst outcomes. It's becoming increasingly possible to do useful work under this theme, as some specific avenues for productive work have emerged. The hope is that anticipating the changes and the worst outcomes will help us have the appropriate mechanisms in place when things start getting weird.
Past work
This paper by Schulan and Bostrom on issues with digital minds is excellent, and this paper from O’Keefe et al looks specifically at the question of mechanisms to share the windfall from TAI.
A big challenge when trying to work on these kinds of questions is finding projects that are well-scoped, empirical or based in something with a well established theory like law or economics, while still plausibly being useful.
Questions
In light of this, here are some post TAI governance questions that could fulfil these criteria:
[1] Excluding independent agencies which the President doesn’t have direct control over