Abstract

The emerging field of “AI safety” has attracted public attention and large infusions of capital to support its implied promise: the ability to deploy advanced artificial intelligence (AI) while reducing its gravest risks. Ideas from effective altruism, longtermism, and the study of existential risk are foundational to this new field. In this paper, we contend that overlapping communities interested in these ideas have merged into what we refer to as the broader “AI safety epistemic community,” which is sustained through its mutually reinforcing community-building and knowledge production practices. We support this assertion through an analysis of four core sites in this community’s epistemic culture: 1) online community-building through Web forums and career advising; 2) AI forecasting; 3) AI safety research; and 4) prize competitions. The dispersal of this epistemic community’s members throughout the tech industry, academia, and policy organizations ensures their continued input into global discourse about AI. Understanding the epistemic culture that fuses their moral convictions and knowledge claims is crucial to evaluating these claims, which are gaining influence in critical, rapidly changing debates about the harms of AI and how to mitigate them.

1. Introduction

Imagine you are an undergraduate computer science student at a U.S. research university interested in the ethical consequences of the technology you are learning to build. Seeking a like-minded community, you join a student organization where you read books like Superintelligence and discover online forums debating how artificial intelligence (AI) will shape the future of humanity. Motivated by these communities’ discussions about how to do the most good in the world, you decide to pursue a career addressing risks from AI. You join a tech company where you build large language models (LLMs). In your spare time, you read research papers posted to these communities’ Web forums on how to make LLMs safer. Suddenly, you realize that the community that has informed major decisions in your personal and professional life is increasingly shaping how the technology industry, academia, media, and policy-makers think about AI.

This hypothetical scenario approximates a very real personal and professional path for individuals interested in minimizing what they view as the negative long-term consequences of AI — especially those they characterize as existential threats to humanity. Since the early 2000s, a robust community has formed around these issues, attracting individuals interested in applying the interconnected ideas of effective altruism (EA), longtermism, artificial general intelligence (AGI), and existential risk (“x-risk”) to making AI systems safer.

Importantly, these ideas have recently entered the mainstream. In 2022, this shift was propelled in part by the large-scale infusion of capital that then-billionaire Sam Bankman-Fried committed to EA and longtermist causes through the FTX Foundation’s Future Fund, a grant-making body associated with his cryptocurrency exchange’s philanthropic arm (FTX Future Fund, 2022a). Many of the organizations, research, media, individuals, and projects selected for FTX Future Fund grants strengthened and expanded the EA and longtermist communities and their influence on how broad swaths of people outside these communities think about AI. In under a year, these ideas have come to take on global significance: discourse about AI posing an existential risk regularly appears in news media coverage and has spurred policy-makers on both sides of the Atlantic to turn to this epistemic community for solutions. Although the Future Fund dissolved (Reynolds, 2022) after FTX went bankrupt (Huang, 2022), the community remains active and merits closer study.

We contend that the overlapping communities drawn together by these ideas form one coherent “epistemic community”: a community with clearly defined shared values and methods of knowledge production (Schopmans, 2022). The impact of this epistemic community, which we hereafter refer to as the “AI safety epistemic community,” extends beyond the community’s bounds: non-profit and for-profit organizations, as well as academic research groups, have begun attracting sizable donations to fund their work. The AI safety epistemic community has also developed a variety of methods for expanding the reach of its ideas, including online forums, career development programs, and policy advocacy. Through an analysis of the landscape of this community, we sought to answer the following research question: How is the AI safety epistemic community developed and maintained through social, intellectual, and organizational practices?

In this paper, we illuminate the central ideas and practices that define the emerging epistemic culture of AI safety. We are interested in how this epistemic community has translated its shared moral and normative claims into technical solutions and recommendations for AI policy that may have lasting, global implications. This work contributes to a broader understanding of the cultural forces that influence certain types of AI development and deployment. As we note in section 5.1: Public critique, the AI safety epistemic community is not the only group concerned with the societal harms AI poses; it is often framed as being in direct opposition to the researchers, advocates, activists, and critics who are collectively referred to as the “AI ethics” community and who emphasize the need to mitigate the well-documented, present-day harms of AI systems. This paper will not explore other, parallel communities in depth, as our objective is to provide a rich analytical description of the AI safety epistemic community in particular.

To motivate our analysis, we first explain the theoretical framework of epistemic culture and our methodology. Next, we map the origins of three core ideas (effective altruism, existential risk, and AI safety) that have brought multiple communities under the umbrella of the AI safety epistemic community. Then, we explore four mechanisms for the development and transmission of these concepts in the emerging field of AI safety: online community-building (Web forums and career advising), AI forecasting, research papers, and prize competitions. In the discussion, we synthesize the main characteristics of these four mechanisms and then address the influence they have had outside of the community. We review critiques of the ideas and practices of this emerging field, revealing how the influence of this epistemic culture persists despite these concerns. Finally, we conclude with suggestions for future work that can build on our study’s initial mapping of this epistemic culture, as we anticipate that it will only continue to influence how people the world over think about AI.
