Much like writing AlignmentForum or LessWrong posts, papers let us iterate on our knowledge of how to make AI systems safe. They are uniquely suited to this in an environment with tens of thousands of career ML researchers who can help make progress on such problems.
Papers can also help AGI corporations directly improve their model deployments, for instance by making them safer. However, this is probably rarer than people imagine and is most relevant for pre-deployment evaluations such as Apollo's.
Additionally, papers (and now sometimes even LW posts) may be cited as a "source of truth" (or of new knowledge) in the media, allowing journalists to say something about AI systems while pointing to others' statements. It is rare for new "sources of truth" about AI to originate from the media itself.
For politicians, these reports usually have to go through an active dissemination process; they can then serve as ammunition in lobbying efforts or feed directly into policy processes (e.g. the EU is currently running a series of research workshops on how to ensure the safety of frontier models).
Of course, the theory of change differs across research fields.