Hi Anna, this looks like a great initiative! I think 200 papers seems quite small though. As an example, did you include all papers with contributions from MATS (~185 papers), EleutherAI, Apart, PIBBSS, Pivotal, SPAR, ERA, LASR, MARS, etc.? My impression is that a large amount of AI safety research is published in collaboration with nonprofit fellowship programs that borrow mentors from other institutions, which might help explain the apparent centrality of universities.
Yeah, I would try expanding the corpus a lot (being less selective both about what counts as safety and about the quality bar) and see how much the results differ. You could still focus on the smaller corpus but note that a bigger corpus gets different/similar results (whatever you find).
I agree that comparing our results with a larger, less selective corpus would be an important robustness check, and we do plan on doing this eventually. We considered defining quantitative criteria for a corpus in the first place but were worried that keywords would give us a distorted result in a somewhat fragmented, interdisciplinary space with a broad range of participants.
Thanks for these! We will review every link and check our coverage. Are you pointing out these organizations specifically because, after looking at the dataset, they strike you as underrepresented?
It’s important to consider that university centrality might be an artifact because non-profit fellowships borrow mentors from academia and industry, or fellows might list their graduate programs as their affiliation. We need to think carefully about whether fellowship programs like MATS create a coding ambiguity where work done in an industry-adjacent context gets attributed to academic institutions. We’ll check the prevalence of this and look for a way to represent how many of our multiply affiliated authors are mentors or have fellowship connections.
Thanks for replying! I listed these organizations because they all maintain up-to-date repositories of papers they contributed to. If you added all papers linked there (and I think you should, as they are all AI safety papers), I suspect you would have ~400-500 papers, many of which would not be included in your initial 200!
If I were running this project, I would additionally scrape papers from the websites of all the orgs listed on the AI safety map (they have a spreadsheet of orgs) and the 80,000 Hours org list.
Citation-network analysis would be a natural complement here. 200 papers is a narrow base, and the corpus is built on an implicit selection criterion of "what people on LW/CHAI/Hendrycks/Barak treat as canon"; that's worth stating explicitly, since it shapes the conclusions. One of the included papers, "Concrete Problems in AI Safety," alone has ~5,000 citing papers on Google Scholar; even after aggressive filtering for actual safety relevance, that's a very different (and much larger) population than the curated canon. I'm curious to see what the wider network would look like.
Every purposive sample is opinionated, which is why we’re asking for community feedback. Do you have other figures or institutions in mind we should include that would balance our selection?
We seriously considered incorporating citation analysis, but couldn’t find a way to execute it sensibly for this project. Overlaying the directionality of citations with co-authorships would be fascinating, and a citation-based corpus would capture a different and much larger population, but it’s tricky. In a fast-moving field of posts and preprints it would be challenging to find a convincing formula for time-weighting the number of citations papers receive, and the valence of a citation is much more ambiguous than publicly acknowledged co-authorship. We ultimately chose co-authorship because it’s easier to interpret, and because we are focused on collaboration structure rather than influence.
I appreciate these impressionistic samples from the totality of the research community. It makes sense that Anthropic is adjacent to Oxford (e.g. since Anthropic is EA-adjacent) while DeepMind is adjacent to Berkeley (more rooted, like Google, in traditional comp-sci academia?). Interesting that OpenAI stands apart from them - does it in any way derive from OpenAI having been the independent R&D powerhouse that first challenged DeepMind's hegemony?
It would be interesting to see similar representations for "capabilities research", since that is what's driving us over the precipice.
Your observations on the historical or ideological alliances between labs and their preferred universities are something we should definitely dig into more. Implementing Owain’s suggestion of comparing against a quantitatively defined corpus could also speak to some of this as it would probably include more capabilities-adjacent work.
We do want to include a comparison case and were originally thinking along the lines of a different area of computer science, or a different emerging technology with significant ties to industry like blockchain mechanism design. The advantage of using capabilities as the comparison would be that it’s analytically and narratively richer, and could contribute to the story the paper tells. The disadvantage, as I see it, is that it would serve as less of an external validation, e.g., answering the question of to what extent the ‘interstitiality’ of AI safety is special or typical.
We (social science PhD students) computed co-authorship networks based on a corpus of 200 AI safety papers covering 2015-2025, and we’d like your help checking if the underlying dataset is right.
Co-authorship networks make visible the relative prominence of entities involved in AI safety research, and trace relationships between them. Although frontier labs produce lots of research, they remain surprisingly insular — universities dominate centrality in our graphs. The network is held together by a small group of multiply affiliated researchers, often switching between academia and industry mid-career. To us, AI safety looks less like a unified field and more like a trading zone where institutions from different sectors exchange knowledge, financial resources, compute and legitimacy without encroaching on each other’s autonomy.
Of course, these visualizations are only as good as the corpus underlying them, because the shape of the network is sensitive to what’s included. Here’s what it currently looks like showing co-authorship at the individual level:
Figure 1: Methods
To compute individual co-authorship, each paper adds an edge between every dyad among its authors. We use Newman’s weighting: the edge between two authors on a paper is defined as 1/sqrt(n-1), where n is the number of authors on the paper. This reduces the weight between each pair of authors as the total number of authors rises, so that papers with many authors do not overwhelm the network structure. Node and text size correspond to betweenness centrality, with authors in the 98th percentile and above appearing in red. The names of authors below the 75th percentile are omitted to reduce visual clutter.
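To make the weighting concrete, here is a minimal sketch of the edge computation described above. The toy paper list and author names are invented for illustration, not drawn from our corpus:

```python
import itertools
import math
from collections import defaultdict

# Toy corpus: each paper is a list of author names (hypothetical).
papers = [
    ["Ann", "Ben", "Cat"],  # 3 authors: each pair gets 1/sqrt(2)
    ["Ann", "Ben"],         # 2 authors: the pair gets 1/sqrt(1) = 1
]

# Each paper adds 1/sqrt(n-1) to every author dyad, where n is its
# author count; weights accumulate across papers.
weights = defaultdict(float)
for authors in papers:
    n = len(authors)
    if n < 2:
        continue
    w = 1 / math.sqrt(n - 1)
    for a, b in itertools.combinations(sorted(authors), 2):
        weights[(a, b)] += w

print(round(weights[("Ann", "Ben")], 3))  # 1/sqrt(2) + 1 ≈ 1.707
```

Sorting the author names before pairing ensures each undirected dyad maps to a single dictionary key regardless of author order on the paper.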
Click here for higher resolution.
Figure 1: Color legend
While academic and for-profit authors occupy distinct clusters, over 95% of nodes belong to the single component pictured, suggesting a densely connected community overall. Although multiply affiliated authors make up less than 10% of the population, they account for over 60% of authors above the 75th percentile of betweenness centrality. This overrepresentation holds regardless of the threshold used. A majority of multiply affiliated authors are mid-career movers, rather than graduate students who entered industry or non-profit research after producing initial publications in academia. Relative to the entire network, academics are also overrepresented at the top of the betweenness rankings, suggesting that a handful of superstar researchers, alongside multiply affiliated authors, exercise outsize influence on collaboration.
This is the same analysis at the institutional level:
Figure 2: Methods
The node and text size are determined by the number of papers in the corpus featuring at least one author affiliated with the institution. In papers spanning multiple institutions, edge weights are calculated as ln(1 + (number of authors from A × number of authors from B)); this means we assume a collaboration is ‘stronger’ if it involves multiple points of contact between organizations. For example, in a paper with 10 authors from institutions A and B, the edge is stronger if there are 5 authors from each institution than if the split is 9 to 1.
Click here for higher resolution.
Figure 2: Color legend
We find another giant component comprising the vast majority (95.6%) of nodes. The top of the eigenvector-centrality ranking is dominated by elite universities. Despite each producing more papers than any single university, Anthropic, OpenAI, and DeepMind rank much lower, indicating more insular research activity overall than at academic and non-profit institutions. The near-zero assortativity (0.087) supports our intuition that AI safety might be held together by cross-institutional collaboration. Dense, same-type clusters bridged by cross-type ties would be unexpected both in a fully unified field and in separate but merely overlapping communities.
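For readers who want to reproduce the assortativity figure on their own data, here is a sketch of Newman’s categorical assortativity coefficient computed from scratch. The toy edges and sector labels below are invented for illustration and are not our dataset:

```python
# Toy undirected network with one sector label per node (invented example).
edges = [
    ("MIT", "Oxford"), ("MIT", "Berkeley"), ("MIT", "Anthropic"),
    ("Oxford", "DeepMind"), ("Anthropic", "DeepMind"),
]
sector = {
    "MIT": "academic", "Oxford": "academic", "Berkeley": "academic",
    "Anthropic": "industry", "DeepMind": "industry",
}

# Newman's categorical assortativity:
#   r = (tr(e) - sum_i a_i*b_i) / (1 - sum_i a_i*b_i),
# where e[i][j] is the fraction of edge ends joining type i to type j.
types = sorted(set(sector.values()))
idx = {t: i for i, t in enumerate(types)}
k, m = len(types), len(edges)
e = [[0.0] * k for _ in range(k)]
for u, v in edges:
    i, j = idx[sector[u]], idx[sector[v]]
    # split each undirected edge symmetrically across both orderings
    e[i][j] += 0.5 / m
    e[j][i] += 0.5 / m

trace = sum(e[i][i] for i in range(k))
row_sums = [sum(row) for row in e]
sum_ab = sum(a * a for a in row_sums)  # a_i == b_i for undirected graphs
r = (trace - sum_ab) / (1 - sum_ab)
print(round(r, 3))  # 0.167: same-type ties slightly over-represented
```

Values near zero, as in our network, mean same-sector and cross-sector ties occur at roughly the rate random mixing would predict.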
In collecting papers, our aim was to create a representation of what practitioners view as the canon – therefore, the dataset was compiled iteratively and by hand. We built the corpus starting from landmark papers and expanded by tracking X, LessWrong, the Alignment Forum, and Google Scholar searches of prominent researchers. We cross-referenced this against four expert-curated reading lists: the LessWrong Best Of tab (all years) on AI safety, Berkeley CHAI’s recommended materials, Boaz Barak’s AI safety syllabus at Harvard, and references in Dan Hendrycks’ textbook Introduction to AI Safety, Ethics and Society. At present, the corpus contains 200 papers by 1815 unique authors affiliated with 363 unique institutions. Please take a look at the complete list of included papers here.
Do the graphs feel intuitively accurate, or does something seem wrong? What do you find most striking? Do the included papers capture what you consider the core of AI safety research? What might be mistakenly listed or missing, causing distortion of the networks?
This anonymous feedback form takes five minutes to complete, and we’ll leave it open for a month. Once we’ve had a chance to collect and analyze responses, we’ll write a follow-up on what we changed, and show you the updated network visualizations.
Thanks for reading! If you want to chat or send us your thoughts directly, find us at akt2147 at columbia dot edu, and jl5770 at columbia dot edu.