Andrew_Critch

Promoting enmity and bad vibes around AI safety

I've observed some people engaged in activities that I believe are promoting enmity in the course of their efforts to raise awareness about AI risk. To be frank, I think those activities are increasing AI risk, including but not limited to extinction risk. However, that's a stronger claim than I...

Mar 935

Andrew_Critch's Shortform

Mar 18

Schelling Goodness, and Shared Morality as a Goal

Also available in markdown at theMultiplicity.ai/blog/schelling-goodness. This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the sense of Thomas Schelling, where the...

Feb 28127

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft

Preface Several friends have asked me about what psychological effects I think could affect human judgement about x-risk. This isn't a complete answer, but in 2018 I wrote a draft of "AI Research Considerations for Human Existential Safety" (ARCHES) that included an overview of cognitive biases I thought (and still...

Dec 3, 202448

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

Preceded by: "Consciousness as a conflationary alliance term for intrinsically valued internal experiences" tl;dr: Chatbots are probably "conscious" in a variety of important ways. We humans should probably be nice to each other about the moral disagreements and confusions we're about to uncover in our concept of "consciousness". Epistemic status:...

Nov 22, 202486

My motivation and theory of change for working in AI healthtech

This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive points, but awareness of the negative has been crucial to forming my priorities, so I'm going to start with those. I'm mostly addressing...

Oct 12, 2024188

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

People often attack frontier AI labs for "hypocrisy" when the labs admit publicly that AI is an extinction threat to humanity. Often these attacks ignore the difference between various kinds of hypocrisy, some of which are good, including what I'll call "reformative hypocrisy". Attacking good kinds of hypocrisy can be...

Sep 11, 202453

Andrew_Critch

Andrew_Critch

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

Slow motion videos as AI risk intuition pumps

Andrew_Critch

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

Slow motion videos as AI risk intuition pumps

Promoting enmity and bad vibes around AI safety

Andrew_Critch's Shortform

Schelling Goodness, and Shared Morality as a Goal

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

My motivation and theory of change for working in AI healthtech

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.