From AI scientist to AI research fleet

Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”; it is also about the institution-building problem of coordinating the first AI research fleets.

Research automation is not about developing a plug-and-play “AI scientist”. Transformative technologies are rarely straightforward substitutes for what came before. The industrial revolution was not about creating mechanical craftsmen but about deconstructing craftsmen into assembly lines of specialized, repeatable tasks. Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists. AI-augmented science will not just be about creating AI “scientists.”

Why? New technologies come with new capabilities and limitations. To fully take advantage of the benefits, we have to reshape our workflows around these new limitations. This means that even if AIs eventually surpass human abilities across the board, roles like “researcher” will likely transform dramatically during the transition period.

The bottleneck to automation is not just technological but also institutional. The problem of research automation is not just about training sufficiently capable and aligned models. We face an “institutional overhang” where AI capabilities are outpacing our ability to effectively organize around their weaknesses. Factories had to develop new management techniques, quality control systems, and worker training programs to make assembly lines effective. Trading firms had to build new risk management frameworks, compliance systems, and engineering cultures to succeed at algorithmic trading. So too, research institutions will need to reinvent themselves around AI or fall behind.  

The scaling labs have already moved beyond the traditional academic model. Consider the use of matrix management structures where research engineers work across multiple projects, standardized research workflows that enable fast iteration, and cross-cutting infrastructure teams that maintain the computational foundation for research. Labs employ specialized roles like research engineers, infrastructure specialists, and research managers that don't fit neatly into the academic hierarchy.

DeepMind’s recent Nobel Prize is a hint of more to come.

A vision: the automated research fleet. Imagine tomorrow’s research lab: not individual AI models confined to chat windows but vast digital fleets of specialized AI agents working in concert. Each agent masters its own niche in the research pipeline: proving theorems, reviewing literature, generating hypotheses, running experiments, analyzing results, communicating outcomes, developing new techniques, conceptualizing entirely new paradigms…

Automation raises the level of abstraction so that everyone becomes a middle manager — every researcher the director of a research institution of their own. And it changes the basic patterns of human-AI interaction: the prompter will become the prompted — instead of crafting careful prompts in chat interfaces, human researchers receive updates and requests for guidance from their AI project leads, who independently pursue established research objectives.

This future may appear wasteful at first glance. Imagine thousands of AI instances running in parallel, testing slight variations of the same approach, with almost all attempts failing. Or hundreds of different AI instances in a shared chat that redundantly process the same tokens. But this apparent inefficiency is a feature, not a bug. Ford’s assembly lines overproduced standardized parts; McLean’s containers shipped half-empty; early cloud computing wasted countless unused FLOPs. Just as these “inefficiencies” enabled unprecedented flexibility and scale in their industries, the parallel processing power of AI research fleets will unlock new possibilities in scientific discovery. The ability to rapidly test hundreds of variations, explore multiple paths simultaneously, and fail fast will become a cornerstone of future research methodology.
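To make the "fail fast at scale" idea concrete, here is a toy sketch of the fan-out pattern: launch many slight variations in parallel, let most attempts fail cheaply, and keep the best survivor. The experiment function, variation grid, and scoring rule below are hypothetical placeholders, not anything prescribed by this post.

```python
# Toy sketch of the fan-out pattern: run many slight variations of an
# experiment in parallel, discard the failures, and keep the best survivor.
# `run_experiment`, the variation grid, and the scoring rule are placeholders.
from concurrent.futures import ThreadPoolExecutor
import random

def run_experiment(variation: dict) -> float:
    """Placeholder: run one configuration and return a score, or raise on failure."""
    if random.random() < 0.8:  # most attempts are expected to dead-end
        raise RuntimeError("dead end")
    return random.random() / variation["learning_rate"]

variations = [{"learning_rate": 10.0 ** -exp, "seed": seed}
              for exp in range(1, 6) for seed in range(20)]

results = []
with ThreadPoolExecutor(max_workers=32) as pool:
    futures = {pool.submit(run_experiment, v): v for v in variations}
    for future, variation in futures.items():
        try:
            results.append((future.result(), variation))
        except RuntimeError:
            pass  # fail fast: discard the dead end and move on

if results:
    best_score, best_variation = max(results, key=lambda r: r[0])
    print(f"best score {best_score:.3f} from {best_variation}")
else:
    print("every variation failed; widen the search")
```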

Recommendations

The scaling labs already understand that research automation is here – they're building the infrastructure and organizational patterns for automated research at scale. For AI safety to stay relevant, we need to adapt and accelerate. Here are our recommendations for transitioning toward AI research fleet management:

Individual practices

  • Spend time on research automation each week: Embrace the lazy programmer mindset of over-automation. Research-relevant tasks can be automated now, and doing so will instill the habit of looking for potential gains from AI+human automation (a minimal example follows this list).
  • Play around with the tools: Copilot, Cursor[1], o1 pro, Gemini pro, Perplexity, Elicit, etc. Different LLMs have different styles, which you can get a fingertip feel for when you work with them a lot. Being playful will help you avoid the trap of dismissing them too soon.
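As one concrete example of a research-relevant task you could automate this week, here is a minimal sketch that pulls recent arXiv abstracts and asks an LLM to triage them. It assumes the OpenAI Python client; the search query, model name, and relevance prompt are placeholders to adapt to your own agenda.

```python
# Minimal sketch: triage new arXiv abstracts with an LLM. The search query,
# model name, and relevance prompt are assumptions; swap in your own.
import urllib.request
import xml.etree.ElementTree as ET
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the env

ARXIV_QUERY = ("http://export.arxiv.org/api/query?"
               "search_query=all:interpretability&sortBy=submittedDate&max_results=10")

def fetch_abstracts() -> list[str]:
    """Fetch recent abstracts from the arXiv Atom feed."""
    with urllib.request.urlopen(ARXIV_QUERY) as response:
        feed = ET.fromstring(response.read())
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    return [entry.findtext("atom:summary", default="", namespaces=ns).strip()
            for entry in feed.findall("atom:entry", ns)]

client = OpenAI()
for abstract in fetch_abstracts():
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model will do
        messages=[{
            "role": "user",
            "content": ("In one sentence, say whether this abstract is relevant "
                        "to AI safety research and why:\n\n" + abstract),
        }],
    )
    print(reply.choices[0].message.content, "\n")
```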

Beware AI slop. We are not Pollyannaish AI enthusiasts — much of the content currently produced by AI is bad and possibly harmful. Continue to whet your taste on pre-2023 human-sourced content.

Organizational changes[2]

  • Invest in documentation: LLM tooling is most helpful when you can provide rich context. Create good, up-to-date documentation on company projects to maximize the help that current tools can provide, and to lay the infrastructure for the future. More generally, consider migrating to monorepos and single sprawling Google Docs to make it easier for your AI systems to load in the necessary context (a minimal sketch follows this list).
  • Adopt team and organizational norms of experimentation: Set a north star for your research team and organization to experiment with increased use of AI agent workflows. Make someone in your infrastructure or DevOps team ultimately responsible for automation.
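As a small illustration of "documentation as AI infrastructure", here is a sketch that bundles a project's markdown docs into a single context file an assistant or agent can ingest in one go. The glob patterns, character cap, and output filename are assumptions to adapt.

```python
# Sketch: bundle a project's docs into one context file so an LLM assistant
# can load the whole picture at once. Paths, the character cap, and the
# output filename are assumptions.
from pathlib import Path

DOC_GLOBS = ["README.md", "docs/**/*.md", "notes/**/*.md"]
MAX_CHARS = 200_000  # keep well inside your model's context window

def build_context(root: str = ".") -> str:
    """Concatenate matching docs, each prefixed with its path as a header."""
    chunks = []
    for pattern in DOC_GLOBS:
        for path in sorted(Path(root).glob(pattern)):
            text = path.read_text(encoding="utf-8", errors="ignore")
            chunks.append(f"\n\n===== {path} =====\n{text}")
    return "".join(chunks)[:MAX_CHARS]

if __name__ == "__main__":
    Path("project_context.txt").write_text(build_context(), encoding="utf-8")
    print("Wrote project_context.txt")
```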

Beware AI slop. You shouldn’t use AI systems blindly for all of your coding and research. At the same time, you should tolerate early automation mistakes (from, e.g., AI code slop) as learning opportunities for your organization to develop better quality control processes.

Community-level actions

  • Develop more case studies and research: Though research automation will differ from past waves of automation, we can still take lessons from historical examples. Below, we’ve included stubs for a few and encourage gathering primary sources and interviews from practitioners who lived through these transition periods.
  • Share results from individual, or ideally team, experiments: We expect there to be a lot of different “organizational design patterns” for research automation. It will be difficult and counterproductive for any one team to work through all of them, but sharing techniques for this type of experimental research will benefit the collective.
  • Establish high-signal groups/meetups/conferences with a research organization focus: We encourage bringing together groups of researchers who are interested in and experimenting with research automation. There’s a tremendous amount of noise in the space; trusted groups can act as necessary filters for separating practical, evidence-based approaches from less substantiated claims. At the same time, we should cast a wide net and learn from how non-AI-safety organizations are adapting to AI.
  • Outline visions and “sci-fi” futures of research fleet management: We’ve outlined one possible vision, but we expect there to be far more, and we expect vision papers/posts/tweets to help clarify the direction that we need to steer towards.

In general, we recommend working forwards from your existing workflows rather than working backwards from any idealistic vision of what automated AI safety research should look like. Too much theorizing is a real risk. Work iteratively with what you have.

We personally are starting today, and think you should too. The race for AI safety isn't one we chose, but it's one we have to win.

Thanks to Raemon and Daniel Murfet for feedback on a draft of this post.

Further Reading

On Automation in AI Safety

On Research Automation

On Automation Generally

Algorithmic trading

MacKenzie, D. (2021). "Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets." Princeton University Press.  

MacKenzie, D. (2019). "How Algorithms Interact: Goffman's 'Interaction Order' in Automated Trading." Theory, Culture & Society 36(2): 39-59.

Zuckerman, G. (2019). "The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution." Portfolio/Penguin.

Industrial research, big pharma, biotech research, defense & national laboratory research

Hounshell, D.A. and Smith, J.K. (1988). "Science and Corporate Strategy: Du Pont R&D, 1902-1980." Cambridge University Press.

Henderson, R. (1994). "Managing Innovation in the Information Age." Harvard Business Review 72(1): 100-105.

Quality control in flexible manufacturing systems

Hayes, R.H. and Jaikumar, R. (1988). "Manufacturing's Crisis: New Technologies, Obsolete Organizations." Harvard Business Review 66(5): 77-85.

Goldratt, E. (1984). "The Goal: A Process of Ongoing Improvement." North River Press.

Medical & legal automation

Jha, S. and Topol, E. (2016). "Adapting to Artificial Intelligence: Radiologists and Pathologists as Information Specialists." JAMA 316(22): 2353-2354.

Remus, D. and Levy, F. (2017). "Can Robots Be Lawyers? Computers, Lawyers, and the Practice of Law." Georgetown Journal of Legal Ethics 30: 501-558.

  1. ^

     Consider actually reading the docs.

  2. ^

     In a sense we are all corporations now. All of these suggestions also apply to how you organize AIs in your personal life.




11 comments

Thanks Jesse, Ben. I agree with the vision you've laid out here.

I've spoken with a few mathematicians about my experience using Claude Sonnet and o1, o1-Pro for doing research, and there's an anecdote I have shared a few times which gets across one of the modes of interaction that I find most useful. Since these experiences inform my view on the proper institutional form of research automation, I thought I might share the anecdote here.

Sometime in November 2024 I had a striking experience with Claude Sonnet 3.5. At the end of a workday I regularly paste in the LaTeX for the paper I’m working on and ask for its opinion, for related work I might be missing, and for techniques it thinks I might find useful. I finish by asking it to speculate on how the research could be extended. Usually this produces enthusiastic and superficially interesting ideas, which are however useless.

On this particular occasion, however, the model proceeded to elaborate a fascinating and far-reaching vision of the future of theoretical computer science. In fact I recognised the vision, because it was the vision that led me to write the document. However, none of that was explicitly in the LaTeX file. What the model could see was some of the initial technical foundations for that vision, but the fancy ideas were only latent. In fact, I have several graduate students working with me on the project and I think none of them saw what the model saw (or at least not as clearly).

I was impressed, but not astounded, since I had already thought the thoughts. But one day soon, I will ask a model to speculate and it will come up with something that is both fantastic and new to me.

Note that Claude Sonnet 3.5/3.6 would, in my judgement, be incapable of delivering on that vision. o1-Pro is going to get a bit further. However, Sonnet in particular has a broad vision and "good taste" and has a remarkable knack of "surfing the vibes" around a set of ideas. A significant chunk of cutting edge research comes from just being familiar at a "bones deep" level with a large set of ideas and tools, and knowing what to use and where in the Right Way. Then there is technical mastery to actually execute when you've found the way; put the vibe surfing and technical mastery together and you have a researcher.

In my opinion the current systems have the vibe surfing, now we're just waiting for the execution to catch up.

Hey Ben and Jesse!

This comment is more of a PSA:

I am building a startup focused on making this kind of thing exceptionally easy for AI safety researchers. I’ve been working as an AI safety researcher for a few years. I’ve been building an initial prototype and I am in the process of integrating it easily into AI research workflows. So, with respect to this post, I’ve been actively working towards building a prototype for the “AI research fleets”.

I am actively looking for a CTO I can build with to +10x alignment research in the next 2 years. I’m looking for someone absolutely cracked and it’s fine if they already have a job (I’ll give my pitch and let them decide).

If that’s you or you know anyone who could fill that role (or who I could talk to that might know), then please let me know!

For alignment researchers or people in AI safety research orgs: hit me up if you want to be pinged for beta testing when things are ready.

For orgs, I’d be happy to work with you to set up automations or give a masterclass on the latest AI tools/automation workflows, and maybe provide a custom report (with a video overview) each month so that you can focus on research rather than trying new tools that might not be relevant to your org.

Additional context:

“When we say “automating alignment research,” we mean a mix of Sakana AI’s AI scientist (specialized for alignment), Transluce’s work on using AI agents for alignment research, test-time compute scaling, and research into using LLMs for coming up with novel AI safety ideas. This kind of work includes empirical alignment (interpretability, unlearning, evals) and conceptual alignment research (agent foundations).

We believe that it is now the right time to take on this project and build this startup because we are nearing the point where AIs could automate parts of research and may be able to do so sooner with the right infrastructure, data, etc.

We intend to study how our organization’s work can integrate with the Safeguarded AI thesis by Davidad.”

I’m currently in London for the month as part of the Catalyze Impact programme.

If interested, send me a message on LessWrong or X or email (thibo.jacques @ gmail dot com).

I expect that, fortunately, the AI safety community will be able to mostly learn from what people automating AI capabilities research (and research in other domains more broadly) will be doing. 

It would be nice to have some hands-on experience with automated safety research, too, though, and especially to already start putting in place the infrastructure necessary to deploy automated safety research at scale. Unfortunately, AFAICT, right now this seems mostly bottlenecked on something like scaling up grantmaking and funding capacity, and there doesn't seem to be enough willingness to address these bottlenecks very quickly (e.g. in the next 12 months) by e.g. hiring and / or decentralizing grantmaking much more aggressively.

Agreed, but I will find a way.

I was just thinking about writing a post that overlaps with this, inspired by a recent Drexler post. I'll turn it into a comment.

Leopold Aschenbrenner's framing of a drop-in remote worker anthropomorphizes AI in a way that risks causing AI labs to make AIs more agenty than is optimal.

Anthropomorphizing AI is often productive. I use that framing a fair amount to convince myself to treat AIs as more capable than I'd expect if I thought of them as mere tools. I collaborate better when I think of the AI as a semi-equal entity.

But it feels important to be able to switch back and forth between the tool framing and the worker framing. Both framings have advantages and disadvantages. The ideal framing is likely somewhere in between, and that middle ground seems harder to articulate.

I see some risk of AI labs turning AIs into agents when, if they were less focused on replacing humans, they might lean more toward Drexler's (safer) services model.

Please, AI labs, don't anthropomorphize AIs without carefully considering when that's an appropriate framing.

I would like to extend this slightly by switching perspective to the other side of the coin. The drop-in remote worker is not a problem of anthropomorphizing AI, so much as it is anthropomorphizing the need in the first place. Companies create roles with the expectation people will fill them, but that is the habit of the org, not the threshold of the need.

Adoption is being slowed down considerably by people asking for AI to be like a person, so we can ask that person to do some task. Most companies and people are not asking more directly for an AI to meet a need. Figuring out how to do that is a problem to solve by itself, and there hasn't been much call for it to date.

Well said. I think that research fleets will be a big thing going forward and you expressed why quite well. 

I think there's an extension that we also have to make with some of the safety work we have, especially for control and related agendas. It is to some extent about aligning research fleets and not individual agents.

I've been researching ways of going about aligning & setting up these sorts of systems for the last year, but I find myself very bottlenecked by not being able to communicate the theories that exist in related fields all that well. 

It is quite likely that RSI happens in lab automation and distributed labs before anything else. So the question becomes: how can we extend the existing techniques and theory that we currently have to distributed systems of research agents? 

There's a bunch of fun and very interesting decentralised coordination schemes and technologies one can use from fields such as digital democracy and collective intelligence. It is just really hard to prune what will work and to think about what the alignment proposals should be for these things. Research systems are a sub-class of agent-based models, which usually exhibit emergence, and often the best way to predict problems is to actually run experiments in those systems. 

So how in the hell are we supposed to predict the problems without this? What are the experiments we need to run? What types of organisation & control systems should be recommended to governance people when it comes to research fleets? 

I would be very excited to see experiments with ABMs where the agents model fleets of research agents and tools. I expect in the near future we can build pipelines where the current fleet configuration - which should be defined in something like the Terraform configuration language - automatically generates an ABM which is used for evaluation, control, and coordination experiments.
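A minimal sketch of that pipeline, assuming a toy fleet configuration (a plain dict standing in for something Terraform-like) that gets expanded into an agent-based model; the config schema, agent roles, and dynamics are all made up for illustration.

```python
# Hypothetical sketch: expand a declarative fleet configuration into a tiny
# agent-based model and step it forward. The config schema, roles, and
# dynamics are invented placeholders.
import random

FLEET_CONFIG = {
    "hypothesis_generators": {"count": 5, "error_rate": 0.3},
    "experiment_runners":    {"count": 20, "error_rate": 0.1},
    "reviewers":             {"count": 3, "error_rate": 0.05},
}

class Agent:
    def __init__(self, role: str, error_rate: float):
        self.role, self.error_rate = role, error_rate
        self.outputs = 0

    def step(self):
        if random.random() > self.error_rate:  # succeed with prob 1 - error_rate
            self.outputs += 1

def build_fleet(config: dict) -> list[Agent]:
    """Instantiate one Agent per configured slot."""
    return [Agent(role, spec["error_rate"])
            for role, spec in config.items()
            for _ in range(spec["count"])]

def simulate(fleet: list[Agent], steps: int = 100) -> dict:
    """Step every agent and report total outputs per role."""
    for _ in range(steps):
        for agent in fleet:
            agent.step()
    totals: dict[str, int] = {}
    for agent in fleet:
        totals[agent.role] = totals.get(agent.role, 0) + agent.outputs
    return totals

print(simulate(build_fleet(FLEET_CONFIG)))
```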

I'm happy this area is getting more attention.

I feel nervous about the terminology. I think that terminology can presuppose some specific assumptions about how this should or will play out, that I don't think are likely. 

"automating alignment research" -> I know this has been used before, it sounds very high-level to me. Like saying that all software used as part of financial trading workflows is "automating financial trading." I think it's much easier to say that software is augmenting financial trading or similar. There's not one homogeneous thing called "financial trading," the term typically emphasises the parts that aren't yet automated.  The specific ways it's integrated sometimes involve it replacing entire people, sometimes involve it helping people, and often does both in complex ways. 

"Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists."
In software, the word fleet sometimes refers to specific deployment strategies. A whole lot of the automation doesn't look like "bots" - rather it's a lot of regular tools, plug-ins, helpers, etc.

"vast digital fleets of specialized AI agents working in concert"
This is one architecture we can choose, but I'm not sure how critical/significant it will be. I very much agree that AI will be a big deal, but this makes it sound like you're assuming a specific way for AI to be used. 

All that said, I'm very much in favor of us taking a lot of advantage of AI systems for all the things we want in the world, including AI safety. I imagine that for AI safety, we'll probably use a very eccentric and complex mix of AI technologies. Some will directly replace existing researchers, we'll have specific scripts for research experiments, maybe agent-like things that do ongoing oversight, etc. 

It's possible that, from the authors' perspective, the specific semantic meanings I took from terms like "automated alignment research" and "fleets" weren't implied. But if I made the mistake, I'm sure other readers will as well, so I'd like to encourage changes here before these phrases take off much further (if others agree with my take).

I have a hunch that implementing a version of test-driven development would be good for avoiding AI slop in automated software production. Humans would take care of writing specifications and tests, while only LLMs actually write the main code. Has someone tried something like this?
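A minimal sketch of that division of labour, assuming the OpenAI Python client and pytest; the spec, the human-written tests, the model name, and the retry budget are all placeholders.

```python
# Hypothetical sketch of the workflow: humans write the spec and the tests,
# the LLM writes only the implementation, and code is accepted only when the
# human-written tests pass. Spec, tests, and model name are placeholders.
import subprocess
import tempfile
import textwrap
from pathlib import Path
from openai import OpenAI  # pip install openai

SPEC = ("Write a Python module defining `slugify(title: str) -> str` that "
        "lowercases the title, strips whitespace, and joins words with hyphens. "
        "Return only code, no prose or markdown fences.")

HUMAN_TESTS = textwrap.dedent("""
    from solution import slugify

    def test_basic():
        assert slugify("  Hello World ") == "hello-world"

    def test_collapses_spaces():
        assert slugify("AI   research  fleets") == "ai-research-fleets"
""")

client = OpenAI()
for attempt in range(1, 4):  # small retry budget
    code = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable code model
        messages=[{"role": "user", "content": SPEC}],
    ).choices[0].message.content
    # Crude cleanup in case the model wraps its answer in markdown fences anyway.
    code = code.strip().removeprefix("```python").removesuffix("```").strip()

    workdir = Path(tempfile.mkdtemp())
    (workdir / "solution.py").write_text(code)
    (workdir / "test_solution.py").write_text(HUMAN_TESTS)
    result = subprocess.run(["pytest", "-q"], cwd=workdir, capture_output=True)
    if result.returncode == 0:
        print(f"Human-written tests passed on attempt {attempt}")
        break
else:
    print("The model never satisfied the human-written tests.")
```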