I really like this! For me, it also paints something of a vision for what could be, which might inspire action.
Something I've generally thought would be really nice to have over the last couple of years is a vision for what a decentralized AI Safety field could look like, and for which specific levers to pull to get there.
What does the optimal form of a decentralized AI Safety science look like?
How does this incorporate parts of meta science and potentially decentralized science?
What does this look like with literature review from AI systems? How can we use AI systems themselves to create such infrastructure for the field? What would optimal communication pathways look like?
I feel that there is so much low-hanging fruit here. There are so many algorithms we could apply to make things better. Yes, we've got some forums, but holy smokes could the underlying distribution and optimisation systems be improved. Maybe the Lightcone crew could cook something up in this direction?
Thanks for the comment! I do hope that the thoughts expressed here can inspire some action, but I'm not sure I understand your questions. Do you mean 'centralized', or are you thinking about the conditions necessary for many small-scale trading zones?
In this way, I guess the emergence of big science could be seen as a phase transition from decentralization -> centralization.
Epistemic Status: This post is an attempt to condense some ideas I've been thinking about for quite some time. I took some care grounding the main body of the text, but some parts (particularly the appendix) are pretty off the cuff, and should be treated as such.
The magnitude and scope of the problems related to AI safety have led to an increasingly public discussion about how to address them. Risks of sufficiently advanced AI systems involve unknown unknowns that could impact the global economy, national and personal security, and the way we investigate, innovate, and learn. Clearly, the response from the AI safety community should be as multi-faceted and expansive as the problems it aims to address. In a previous post, we framed fruitful collaborations between applied science, basic science, and governance as trading zones mediated by a safety-relevant boundary object (a safety case sketch) without discussing the scale of the collaborations we were imagining.
As a way of analyzing local coordination between different scientific cultures, the trading zone can sometimes have a fractal structure; at every level of granularity – a coarse-graining of what is meant by ‘local’ – another trading zone appears. In addressing a narrow problem or AI safety sub-goal, like a particular safety case assumption, a small collaboration between different academic groups at an institution (or even across institutions) can constitute a trading zone. Zooming all the way out, the entire field of AI safety can be seen as a trading zone between basic science, applied science, and governance, with a grand mission like ‘create safety cases’ or ‘solve safety’ as its boundary object. This large-scale trading zone may result from (or lead to!) the development of big science for AI safety. Moreover, smaller trading zones would likely benefit from an overarching culture of coordination and resource-sharing that big science would provide.
In this post, I discuss the possible pathways to a big science for AI safety in terms of what is possible, what is likely, and what would be most beneficial, in my view. To ground the discussion, I hold that AI safety should have[1]:
What is Big Science?
‘Big science’ is a term originally coined by Alvin M. Weinberg in 1961 to describe large-scale research efforts that require substantial resources, collaboration, and infrastructure. It typically involves joint input and progress from basic science and applied science or engineering, and has downstream effects on industry. It often spans nation-states, though this is not a strict requirement. Government involvement is essential for big science, though it may not be its main driving force.
Throughout this post, we’ll classify big science as either top-down or bottom-up.
Bottom-up: Scientists need to pool resources for the sake of scientific advancement, and join forces to solicit government funding. Examples include large-scale interferometry experiments, telescopes for observational astronomy, or particle accelerators (like CERN) for fundamental physics. In these collaborations, a share of the (often enormous) cost comes with equity in a project that is too large for any individual group to tackle alone. Telescopes and interferometers require particular weather conditions, elevation, and remoteness that many countries lack, while CERN needs a huge amount of underground space, manpower, and physical resources. While bottom-up big science typically aims to avoid allegiance to a particular country’s government,[2] it requires a tremendous amount of coordination, diplomacy, and PR.
The impact of big science tends to be far-reaching, transforming science, industry, and society. The above examples led to the adoption of nuclear energy, better standards for ethics and security in biomedicine, and the World Wide Web, to name just a few. Each example also received substantial pushback. The public was mainly concerned about the overall cost relative to public benefit, ethics, and safety (like concerns of a mini-black hole swallowing Geneva). In addition to ethics and safety, the scientific community was more concerned with the risks of increased bureaucracy in the pursuit of science, and the distribution of funds across disciplines (Kaiser, Utz). The Human Genome Project, for example, invited criticism that the government-ordained focus of the project was too narrow, and would stifle creativity and curiosity among biologists. But scientists also push back on bottom-up big science projects; in the 1990s, prominent condensed matter physicists registered their vehement opposition to the Superconducting Super Collider (SSC), testifying before Congress that the project would underfund their own field, which had much more immediate economic and societal benefits than large-scale exploratory particle physics. In this case, the testimony was so compelling that the US government scrapped the SSC, and condensed matter climbed to the top tier of the physics hierarchy.
Paths toward a Big Science of AI Safety
The above examples show that big science, by mobilizing scientific practice in a particular direction, often produces a shift in the shared set of values and norms that make up a scientific culture. Depending on your place within that culture, these can be desirable (or not). They also demonstrate big science’s impact on society at large, which could be made even greater given the many futures toward which AI’s development can lead. In the rest of this post, I will unpack some possible paths to big science for AI safety, and argue that a bottom-up approach, though by no means inevitable, is most likely to maintain the core criteria for AI safety laid out in the first section.
The Way Things Are
The technical AI safety research ecosystem can be roughly divided between large AI labs, smaller AI safety research organizations, and independent researchers. Coverage of the problem, though reasonable given the field’s nascent status, is somewhat disjoint between these groups. In spite of the immense resources and research effort, AI safety currently lacks the coordination that would label it ‘big science’. Namely, it lacks:
First: infrastructure. There is a certain amount of resource pooling among independent researchers and small research organizations, and many of the models they use and tools they develop are small-scale or open-source. While there is a lot of room for good theoretical or experimental work from this group, they are largely resource-constrained to models around the scale of GPT-2. Independent researchers are particularly decentralized, and while there is nothing to stop them from collaborating across nation-states, most collaboration tends to cluster around virtual communication channels for distinct research programmes (mech interp, dev interp, SLT). In contrast, AI labs hold the largest share of physical and intellectual infrastructure. While their efforts to understand AI systems often go hand-in-hand with driving capabilities, they nevertheless produce key insights. A lot of their work involves large-scale empirics on state-of-the-art models and real-world datasets, behind a mounting wall of secrecy that limits external collaboration. The result is sub-optimal coordination in the field, splitting it into at best two groups (independent/small AI safety and big AI), and at worst many more than that.
Next: ideology. By this, I mean a big science problem specification and research goals that would guide a scientific community. Historically, these have been set top-down by the government or bottom-up by scientific consensus, and have played a pivotal role in coordination, resource allocation, and progress in big science collaborations. In AI safety, however, no such unifying objective currently exists. Instead, the field is shaped by competing agendas across stakeholders, including governing bodies, academic researchers, and AI labs of varying stages and sizes.
Among these, large AI labs are the most prolific in terms of research output. They would play the largest role in guiding a centralized ideology, through their own work and their relationships with governing bodies that rely on lab members’ expertise to understand pathways and timelines to AGI. Big labs often produce quality work, and have taken steps to include external perspectives through academic collaborations, researcher access programs, red teaming networks, or fellowship programs. However, these programs, coupled with the labs’ government connections, have the potential to implicitly and explicitly incentivize independent and academic researchers to work within existing frameworks rather than pursue an approach with a ‘riskier’ problem set-up or alternative methodology. Instead of unifying the research landscape under a flexible ideology, these efforts further separate the AI safety ideology set by large AI labs from those of small AI safety organizations, independent researchers, or academic groups. Moreover, competition between the top AI labs leads to further fragmentation, as each lab operates with its own priorities. This means that AI labs, even if they currently have the loudest voice, are not suited to establishing a robust, central ideology that can unite the field in big science, as any such ideology would depend on short-horizon industry incentives and which lab comes out on top.
Finally: science. The lack of a central ideological focus is indicative of the current absence of a foundational scientific discipline – and corresponding scientific culture – underpinning AI safety. This shared understanding would play an important role in coordinating big science between different research groups with varying standards of scientific experience and rigor. While AI safety draws on a patchwork of theoretical insights, empirical methods, and methodological input from the fields underpinning AI (physics, neuroscience, mathematics, computer science…), the field lacks a cohesive framework for addressing its core challenges, leading, for example, to an inconsistent formulation of the ‘problem’ of AI safety.
The current priorities of the AI industry reflect and support this dearth of fundamental scientific research. In their new hires, labs seem increasingly concerned with the engineering skills needed to build out existing applied research agendas, suggesting less effort to build up scientific foundations. This is likely due to epistemic taste and increasing pressure to compete given shortening timelines, a pressure seemingly reflected in AI safety funding schemes. Whereas a few years ago freakish prosperity in the AI safety space fueled a push for big ideas, now AI is ubiquitous, the economic stakes are higher, and funds are scarcer. Add to that the narrative within the AI safety community that current frontier models are already near-AGI, which leads many funders to focus on the state of the art rather than bet on the uncertain outputs, and longer time horizons, of basic science investigations.
All this to say that within AI safety, coordination with basic science is currently limited, constraining the number of new ideas with a high potential for impact. At the same time, government funding for scientific projects that rely heavily on AI and its development to solve domain-specific problems is increasing, further separating the use-inspiration for AI research in these two groups (solve science v. solve safety). Without explicit incentive to work on projects that address safety concerns, basic scientists will continue to ‘publish or perish’ in their own scientific disciplines just as industry labs continue to ‘produce or perish’ in theirs.
As it stands, big science is more likely to be top-down
In my view, desirable characteristics of big science for AI safety include infrastructural centralization that improves access to and understanding of the problem for basic science and ideological centralization that is broad enough to allow for a pluralism of ideals and projects. In short: it should be bottom-up, and set the conditions for a basic science of AI safety.
However, if big science is inevitable (and I’m not sure it is), it seems more likely to be top-down. Drivers for a big science of AI safety could include:
Why bottom-up could be better
The way nationalization or centralization happens is incredibly important. From the point of view of the scientific community, it would set the standard for ‘good’ research, prioritizing which projects get funded in terms of methodology, focus, stakes, timeliness, and take-off speeds. From a governance standpoint, it is tied to the scale, scope, and solution of the AI safety problem, and dictates who has the power over AI and all its baggage (good and bad).
The AI governance challenge is by no means simple, and its strategy will also depend on the ideology around which big science coordinates. Myopic or overly-constraining rallying cries could yield a big science with too narrow a focus, leaving us unprepared to tackle a threat we didn’t see coming. While top-down big science does lead to gains in basic science and industry, these are not guaranteed to hold up to a threat model predicated on unknown unknowns. By the time we realize we’re headed in the wrong direction, it could be too late.
A top-down approach could succeed given the right goal. The UK AISI is an example of locally centralized big-ish science with clear safety objectives that nevertheless fosters a culture amenable to pure scientific inquiry. That said, I think many concerns about power concentration and race dynamics, or about the strategic advantage (SA) of centralized AGI, assume a top-down model. A bottom-up approach to big science could balance the voice of AI labs in the cooperative development of safe AI while also increasing its feasibility given shortening timelines. Or, we might view bottom-up big science as a ‘well designed SA approach’ that can accelerate progress with an equitable distribution of power and safety checks in place. Moreover, it could naturally foster the ‘good attributes’ of AI safety I laid out at the beginning.
CERN, for example, brings together theorists, experimentalists, and phenomenologists studying not only particle, nuclear, and astrophysics, but also the structure of matter, geophysics, and environmental science. A similar mosaic scientific culture is possible in a CERN for AI[3]. While this post focuses mainly on technical AI research, I would also include fields like economics, psychology, and anthropology for maximal disciplinary flexibility. Doing so would allow basic science to weigh in on different aspects of AI safety, from mathematical theories to ethical frameworks.
This multi-disciplinarity feeds into the next desideratum: active engagement between academia, industry, and governance. Bottom-up big science achieves this quite naturally. Though the research itself is driven by the needs of the scientific community, rather than a government directive, coordination between scientists at scale can only be achieved with government support. As a public good, a CERN for AI would also need access to some data and models of industry labs. While there may be some hurdles to overcome here, it is not unrealistic to expect the mutual benefits of a collaboration of this kind – including a wider research pool for industry and basic science proof-of-concept – to outweigh corporate reservations.
Finally, the type of research would coordinate around a unifying focus that is broad enough to maintain a sufficient richness of ideas. Allowing basic scientists to pursue AI (safety) research as basic scientists, on a scale similar to industry labs, could provide the conditions of exploration and innovation necessary for a breakthrough – or many breakthroughs – to be achieved. Given academia’s track record of spinning up bottom-up big science projects, this is not a stretch. The resource pooling would add an efficiency that basic science currently lacks, without overly constraining the research space. It would also prioritize coordination over competition, making for more equitable access for researchers (even if some amount of nationalization is likely).
How to scale from where we are now
It is currently unlikely that academia would mobilize across disciplines to engage in bottom-up big science. Even if it did, it is unlikely that researchers would immediately coordinate around the problems with the highest potential for safety relevance. Rather than attempting large-scale coordination from the outset, we should focus on facilitating small-scale trading zone collaborations between basic scientists, applied scientists, and governance. These can orient basic research toward an AI safety application while also making this research more legible to others outside of its scientific culture. If we create enough of these trading zones across different disciplines, they can begin to naturally coordinate with each other, creating larger trading zones with cross-disciplinary input from basic science and a ‘local’ language that is legible to a larger population. Continuing in this way facilitates the coarse-graining mentioned above. In this view, bottom-up big science is an emergent phenomenon that arises from a ‘critical mass’ of individual trading zone collaborations. At some scale, mobilization will become possible, yielding an aggregate trading zone between basic science, applied science, and governance unified by the goal of creating a safety-relevant science of AI.
This model for driving science encourages flexible, rather than rigid, centralization, preserving the desiderata laid out for the AI safety field and fostering an epistemically diverse scientific culture. It creates the conditions for a system of checks and balances – as well as mutual benefit – between industry, governance, and academia, and levels the playing field between these domains. If we take the analogy of a fractal seriously, there is no limit to the scale that can be achieved. Big science of this type would continue to drive advancements in safe AI and basic science, pushing the limits of knowledge, innovation, and human impact in lockstep.
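To make the ‘critical mass’ intuition above a bit more concrete, here is a minimal toy sketch (my own illustration, not a model proposed in this post): research groups are nodes that form pairwise trading-zone links at random, and we track what fraction of groups sits in the largest connected cluster. Around a critical link density a giant cluster appears, loosely analogous to many small collaborations aggregating into a field-wide one. The function `largest_cluster_fraction` and all parameters are hypothetical.

```python
import random

# Toy illustration of the 'critical mass' picture (hypothetical, not the
# post's model): research groups form pairwise trading-zone links at random,
# and we measure the fraction of groups in the largest connected cluster.

def largest_cluster_fraction(n_groups: int, n_links: int, seed: int = 0) -> float:
    """Fraction of groups in the largest cluster after n_links random links."""
    rng = random.Random(seed)
    parent = list(range(n_groups))  # union-find forest, one node per group

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n_links):
        a, b = rng.randrange(n_groups), rng.randrange(n_groups)
        parent[find(a)] = find(b)  # merge the two collaborations

    sizes = {}
    for g in range(n_groups):
        root = find(g)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_groups

if __name__ == "__main__":
    n = 1000
    for links in (200, 400, 500, 600, 800, 1500):
        frac = largest_cluster_fraction(n, links)
        print(f"{links:5d} links -> largest cluster covers {frac:.2f} of groups")
```

With 1,000 groups the largest cluster stays marginal below roughly 500 links and grows quickly beyond that, the familiar random-graph threshold. Real collaborations do not form uniformly at random, of course, so this only illustrates the shape of the transition, not where it would sit for AI safety.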
Appendix: Potential Pushback of this Approach
It would be naive to expect that this effort would be without pushback. Like the historical examples we looked at earlier, the public, private, and academic sectors will all have something to say about any new allocation of resources.
From academia, we might expect arguments like ‘AI is not science’. When the 2024 Nobel Prize in Physics was awarded to Hinton and Hopfield, it sparked a heated debate about the relationship between physics and AI which, in my mind, demonstrated two key ideas:
Hopefully, fears from basic science that all science will become AI science will be assuaged if scientists are allowed to work on AI guided by the standards and values of their own scientific culture. The goal of large-scale scientific coordination isn’t to turn every basic scientist into an AI scientist, but to bring together ideas from disparate fields to make progress on a poorly understood technology that – like it or not – is likely to be integrated into every aspect of our lives before long, including scientific inquiry. The sharing of resources – time, expertise, manpower – facilitated by collaborations can allow researchers to take part in AI-focused projects without abandoning the disciplinary commitments of their academic departments. It is also the case that uncoordinated research in AI has led to work from different disciplines expressing similar ideas in different terms; cross-disciplinary collaborations centered on AI-specific problems can foster shared insights that benefit each involved field.
Funders or governance may complain that ‘basic science takes too long’ to be competitive given shortening time scales. However, a centralized research infrastructure, including large-scale datasets, computational resources, safety sandboxes, and shared knowledge, could minimize the duplication of effort that slows progress. Progress could be sped up further by accelerating the most promising research directions that arise from big science with a focused research organization (FRO). In addition to spinning up more start-ups, FROs can produce public goods on accelerated timescales, minimizing the time-sink and public cost of the basic science research phase before producing something of use.
Governing bodies and the public may also question whether such an effort would be secure enough. In AI safety, there are a lot of independent researchers and small research projects or programs, and it would take a lot of infrastructure to vet and keep track of them. However, there may be a way to give tiered levels of access to resources through collaborations. My view is that a CERN for AI would increase accountability and offer a middle-of-the-road solution to transparency, somewhere between locking everything in a black box of secrecy and making it completely open-source. Security concerns feed into doubts about whether this is really AI safety research. Mainly, this comes down to public trust in the incentives of basic scientists, and to how large the scale of collaboration ends up being (if it is not a global endeavor, what institutional safeguards are in place to prevent it from becoming a Manhattan Project in disguise?). Like CERN, researchers could apply to run experiments, and these could be vetted for relevance to safety. From my perspective (as a basic scientist), AI science treated in this way is AI safety science, since understanding a thing allows us to reliably build and control it. Indeed, most AI safety interpretability or technical alignment research has a basic science flavor.
[1] These are a few criteria I think are important, and should not be taken as a minimal or exhaustive list.
[2] In the case of CERN, this aim is explicit, though this does not remove it from political alliances.
[3] For the sake of simplicity, I’ll continue to refer to the bottom-up big science ideal as ‘CERN for AI’, though this should be read as an analogous project, rather than the gold standard.