Wei Dai

Increasing AI Strategic Competence as a Safety Approach

If AIs become strategically competent enough, they may realize that RSI is too dangerous because they're not good enough at alignment or philosophy or strategy, and potentially convince, help, or work with humans to implement an AI pause. This presents an alternative "victory condition" that someone could pursue (e.g. by...

Feb 3 · 57

My 2003 Post on the Evolutionary Argument for AI Misalignment

This was posted to SL4 on the last day of 2003. I had largely forgotten about it until I saw the LW Wiki reference it under Mesa Optimization[1]. Besides the reward hacking angle, which is now well-trodden, it gave an argument based on the relationship between philosophy, memetics, and alignment,...

Jan 6 · 37

A Conflict Between AI Alignment and Philosophical Competence

(This argument reduces my hope that we will have AIs that are both aligned with humans in some sense and also highly philosophically competent, which aside from achieving a durable AI pause, has been my main hope for how the future turns out well. As this is a recent realization[1],...

Dec 27, 2025 · 75

Relitigating the Race to Build Friendly AI

Recently I've been relitigating some of my old debates with Eliezer, to right the historical wrongs. Err, I mean to improve the AI x-risk community's strategic stance. (Relevant to my recent theme of humans being bad at strategy—why didn't I do this sooner?) Of course the most central old debate...

Dec 3, 2025 · 91

Please, Don't Roll Your Own Metaethics

One day, when I was an intern at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department invented it and planned to use it in their product, and wanted...

Nov 12, 2025 · 155

Problems I've Tried to Legibilize

Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk more legible to myself and others. I've organized the relevant posts and comments into the following list, which can also serve as a partial guide...

Nov 9, 2025 · 148

Legible vs. Illegible AI Safety Problems

Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or...

Nov 4, 2025 · 396

Wei Dai

Legible vs. Illegible AI Safety Problems

A tale from Communist China

Morality is Scary

UDT shows that decision theory is more puzzling than ever
