TL;DR: We review the literature on threat models for existential risk from misaligned AI, propose a categorization, and describe a consensus threat model held by some of DeepMind's AGI Safety team. See our post for the detailed literature review.
The DeepMind AGI Safety team has been working to understand the space of threat models for existential risk (X-risk) from misaligned AI. This post summarizes our findings. Our aim was to clarify the case for X-risk to enable better research project generation and prioritization.
First, we conducted a literature review of existing threat models, discussed their strengths and weaknesses, and formed a categorization based on the technical cause of X-risk and the path that leads to X-risk. Next, we tried to find a consensus threat model within the team.