Wei Dai

The Long (Self-)Correction

I propose the Long Self-Correction[1] as an alternative name/idea/concept to AI Pause and Long Reflection. Problem with AI Pause: Pause until when, and for what purpose? Presumably to make AI (that we'll build later) safer, but the deeper problem is that humans aren't safe, and can't safely serve as builders,...

Jul 24 · 110

Increasing AI Strategic Competence as a Safety Approach

If AIs became strategically competent enough, they may realize that RSI is too dangerous because they're not good enough at alignment or philosophy or strategy, and potentially convince, help, or work with humans to implement an AI pause. This presents an alternative "victory condition" that someone could pursue (e.g. by...

Feb 3 · 57

My 2003 Post on the Evolutionary Argument for AI Misalignment

This was posted to SL4 on the last day of 2003. I had largely forgotten about it until I saw the LW Wiki reference it under Mesa Optimization[1]. Besides the reward hacking angle, which is now well-trodden, it gave an argument based on the relationship between philosophy, memetics, and alignment,...

Jan 6 · 37

A Conflict Between AI Alignment and Philosophical Competence

(This argument reduces my hope that we will have AIs that are both aligned with humans in some sense and also highly philosophically competent, which aside from achieving a durable AI pause, has been my main hope for how the future turns out well. As this is a recent realization[1],...

Dec 27, 2025 · 75

Relitigating the Race to Build Friendly AI

Recently I've been relitigating some of my old debates with Eliezer, to right the historical wrongs. Err, I mean to improve the AI x-risk community's strategic stance. (Relevant to my recent theme of humans being bad at strategy—why didn't I do this sooner?) Of course the most central old debate...

Dec 3, 2025 · 100

Please, Don't Roll Your Own Metaethics

One day, when I was an intern at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department invented it and planned to use it in their product, and wanted...

Nov 12, 2025 · 161

Problems I've Tried to Legibilize

Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk more legible to myself and others. I've organized the relevant posts and comments into the following list, which can also serve as a partial guide...

Nov 9, 2025 · 148

Legible vs. Illegible AI Safety Problems

A tale from Communist China

Morality is Scary

UDT shows that decision theory is more puzzling than ever
