The Best of LessWrong

When posts turn more than a year old, the LessWrong community reviews and votes on how well they have stood the test of time. These are the posts that have ranked the highest for all years since 2018 (when our annual tradition of choosing the least wrong of LessWrong began).

For the years 2018, 2019 and 2020 we also published physical books with the results of our annual vote, which you can buy and learn more about here.


Eliezer Yudkowsky
Local Validity as a Key to Sanity and Civilization
"Other people are wrong" vs "I am right"
Mark Xu
Strong Evidence is Common
You Are Not Measuring What You Think You Are Measuring
Gears-Level Models are Capital Investments
How to Ignore Your Emotions (while also thinking you're awesome at emotions)
Scott Garrabrant
Yes Requires the Possibility of No
Scott Alexander
Trapped Priors As A Basic Problem Of Rationality
Duncan Sabien (Deactivated)
Split and Commit
Ben Pace
A Sketch of Good Communication
Eliezer Yudkowsky
Meta-Honesty: Firming Up Honesty Around Its Edge-Cases
Duncan Sabien (Deactivated)
Lies, Damn Lies, and Fabricated Options
Duncan Sabien (Deactivated)
CFAR Participant Handbook now available to all
What Are You Tracking In Your Head?
Mark Xu
The First Sample Gives the Most Information
Duncan Sabien (Deactivated)
Shoulder Advisors 101
Feature Selection
Mistakes with Conservation of Expected Evidence
Scott Alexander
Varieties Of Argumentative Experience
Eliezer Yudkowsky
Toolbox-thinking and Law-thinking
The Felt Sense: What, Why and How
Duncan Sabien (Deactivated)
Cup-Stacking Skills (or, Reflexive Involuntary Mental Motions)
Ben Pace
The Costly Coordination Mechanism of Common Knowledge
Jacob Falkovich
Seeing the Smoke
Epistemic Legibility
Daniel Kokotajlo
Taboo "Outside View"
Gears vs Behavior
Noticing Frame Differences
Duncan Sabien (Deactivated)
Reality-Revealing and Reality-Masking Puzzles
Eliezer Yudkowsky
Eliezer's latest story, past 1M words
Eliezer Yudkowsky
Self-Integrity and the Drowning Child
Jacob Falkovich
The Treacherous Path to Rationality
Scott Garrabrant
Tyranny of the Epistemic Majority
More Babble
Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Schelling Problems
Being a Robust Agent
Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists
Reason isn't magic
Integrity and accountability are core parts of rationality
The Schelling Choice is "Rabbit", not "Stag"
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Propagating Facts into Aesthetics
Simulacrum 3 As Stag-Hunt Strategy
Catching the Spark
Jacob Falkovich
Is Rationalist Self-Improvement Real?
Excerpts from a larger discussion about simulacra
Simulacra Levels and their Interactions
Radical Probabilism
Naming the Nameless
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"
Eric Raymond
Rationalism before the Sequences
The Rationalists of the 1950s (and before) also called themselves “Rationalists”


The Pavlov Strategy
Coordination as a Scarce Resource
What should you change in response to an "emergency"? And AI risk
Prediction Markets: When Do They Work?
Being the (Pareto) Best in the World
Is Success the Enemy of Freedom? (Full)
How factories were made safe
All Possible Views About Humanity's Future Are Wild
Why has nuclear power been a flop?
Simple Rules of Law
Power Buys You Distance From The Crime
Eliezer Yudkowsky
Is Clickbait Destroying Our General Intelligence?
Scott Alexander
The Tails Coming Apart As Metaphor For Life
Asymmetric Justice
Jeffrey Ladish
Nuclear war is unlikely to cause human extinction
Moloch Hasn’t Won
Motive Ambiguity
Can crimes be discussed literally?
Said Achmiz
The Real Rules Have No Exceptions
Lars Doucet
Lars Doucet's Georgism series on Astral Codex Ten
When Money Is Abundant, Knowledge Is The Real Wealth
This Can't Go On
Scott Alexander
Studies On Slack
Working With Monsters
Why haven't we celebrated any major achievements lately?
The Credit Assignment Problem
Martin Sustrik
Inadequate Equilibria vs. Governance of the Commons
The Amish, and Strategic Norms around Technology
Discontinuous progress in history: an update
Scott Alexander
Rule Thinkers In, Not Out
Jameson Quinn
A voting theory primer for rationalists
Nonprofit Boards are Weird
Wei Dai
Beyond Astronomical Waste
Making Vaccine
Make more land


The Redaction Machine
Samo Burja
On the Loss and Preservation of Knowledge
Introduction to abstract entropy
Martin Sustrik
Swiss Political System: More than You ever Wanted to Know (I.)
Interfaces as a Scarce Resource
Transportation as a Constraint
There’s no such thing as a tree (phylogenetically)
Scott Alexander
Is Science Slowing Down?
Martin Sustrik
Anti-social Punishment
Martin Sustrik
Research: Rescuers during the Holocaust
Toni Kurz and the Insanity of Climbing Mountains
Book Review: Design Principles of Biological Circuits
Literature Review: Distributed Teams
The Intelligent Social Web
Bird Concept
Unconscious Economics
Spaghetti Towers
Eli Tyre
Historical mathematicians exhibit a birth order effect too
What Money Cannot Buy
Scott Alexander
Book Review: The Secret Of Our Success
Specializing in Problems We Don't Understand
Why did everything take so long?
[Answer] Why wasn't science invented in China?
Scott Alexander
Mental Mountains
My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms
Evolution of Modularity
Science in a High-Dimensional World
How uniform is the neocortex?
Building up to an Internal Family Systems model
Steven Byrnes
My computational framework for the brain
Counter-theses on Sleep
What makes people intellectually active?
Birth order effect found in Nobel Laureates in Physics
Elephant seal 2
Anti-Aging: State of the Art
Steelmanning Divination
Book summary: Unlocking the Emotional Brain

AI Strategy

Ajeya Cotra
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Daniel Kokotajlo
Cortés, Pizarro, and Afonso as Precedents for Takeover
Daniel Kokotajlo
The date of AI Takeover is not the day the AI takes over
What failure looks like
Daniel Kokotajlo
What 2026 looks like
It Looks Like You're Trying To Take Over The World
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Another (outer) alignment failure story
Ajeya Cotra
Draft report on AI timelines
Eliezer Yudkowsky
Biology-Inspired AGI Timelines: The Trick That Never Works
Reply to Eliezer on Biological Anchors
AGI safety from first principles: Introduction
Daniel Kokotajlo
Fun with +12 OOMs of Compute
Wei Dai
AI Safety "Success Stories"
Counterarguments to the basic AI x-risk case
The Plan
Rohin Shah
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
What an actually pessimistic containment strategy looks like
Eliezer Yudkowsky
MIRI announces new "Death With Dignity" strategy
Chris Olah’s views on AGI safety
Comments on Carlsmith's “Is power-seeking AI an existential risk?”
Adam Scholl
The Parable of Predict-O-Matic
Let’s think about slowing down AI
human psycholinguists: a critical appraisal
larger language models may disappoint you [or, an eternally unfinished draft]
Daniel Kokotajlo
Against GDP as a metric for timelines and takeoff speeds
Arguments about fast takeoff
Eliezer Yudkowsky
Six Dimensions of Operational Adequacy in AGI Projects

Technical AI Safety

Some AI research areas and their relevance to existential safety
EfficientZero: How It Works
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
Decision theory does not imply that we get to have nice things
Reward is not the optimization target
Worlds Where Iterative Design Fails
Specification gaming examples in AI
Rafael Harth
Inner Alignment: Explain like I'm 12 Edition
An overview of 11 proposals for building safe advanced AI
Alignment By Default
How To Go From Interpretability To Alignment: Just Retarget The Search
Alex Flint
Search versus design
Selection vs Control
Mark Xu
The Solomonoff Prior is Malign
My research methodology
Eliezer Yudkowsky
The Rocket Alignment Problem
Eliezer Yudkowsky
AGI Ruin: A List of Lethalities
A central AI alignment problem: capabilities generalization, and the sharp left turn
Reframing Impact
Scott Garrabrant
Robustness to Scale
Inaccessible information
Seeking Power is Often Convergently Instrumental in MDPs
On how various plans miss the hard bits of the alignment challenge
Alignment Research Field Guide
The strategy-stealing assumption
Optimality is the tiger, and agents are its teeth
Sam Ringer
Models Don't "Get Reward"
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
Language models seem to be much better than humans at next-token prediction
An Untrollable Mathematician Illustrated
An Orthodox Case Against Utility Functions
Selection Theorems: A Program For Understanding Agents
Rohin Shah
Coherence arguments do not entail goal-directed behavior
Alex Flint
The ground of optimization
Where I agree and disagree with Eliezer
Eliezer Yudkowsky
Ngo and Yudkowsky on alignment difficulty
Embedded Agents
Risks from Learned Optimization: Introduction
chinchilla's wild implications
Why Agent Foundations? An Overly Abstract Explanation
Paul's research agenda FAQ
Eliezer Yudkowsky
Coherent decisions imply consistent utilities
Open question: are minimal circuits daemon-free?
Gradient hacking
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
Humans provide an untapped wealth of evidence about alignment
Neel Nanda
A Mechanistic Interpretability Analysis of Grokking
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Understanding “Deep Double Descent”
Quintin Pope
The shard theory of human values
Inner and outer alignment decompose one hard problem into two extremely hard problems
Eliezer Yudkowsky
Challenges to Christiano’s capability amplification proposal
Scott Garrabrant
Finite Factored Sets
ARC's first technical report: Eliciting Latent Knowledge
Introduction To The Infra-Bayesianism Sequence
Towards a New Impact Measure

Eliezer describes the similarity between understanding what a locally valid proof step is in mathematics, knowing there are bad arguments for true conclusions, and that for civilization to hold together, people need to apply rules impartially even if it feels like it costs them in a particular instance. He fears that our society is losing appreciation for these points.

Ben Pace
I think about this post a lot, and sometimes in conjunction with my own post on common knowledge. As well as it being a referent for when I think about fairness, it also ties in with how I think about LessWrong, Arbital and communal online endeavours for truth. The key line is: You can think of Wikipedia as being a set of communally editable web pages where the content of the page is constrained to be that which we can easily gain common knowledge of its truth. Wikipedia's information is only that which comes from verifiable sources, which is how they solve this problem - all the editors don't have to get in a room and talk forever if there's a simple standard of truth. (I mean, they still do, but it would blow up to an impossible level if the standard were laxer than this.) I understand a key part of the vision for Arbital was that, instead of the common standard being verifiable facts, it was instead to build a site around verifiable steps of inference, or alternatively phrased, local validity. This would allow us to walk through argument space together without knowing whether the conclusions were true or false yet. I think about this a lot, in terms of what steps a community can make together. I maybe will write a post on it more some day. I'm really grateful that Eliezer wrote this post.
It strikes me as pedagogically unfortunate that sections i. and ii. (on arguments and proof-steps being locally valid) are part of the same essay as sections iii.–vi. (on what this has to do with the function of Law in Society). Had this been written in the Sequences-era, one would imagine this being (at least) two separate posts, and it would be nice to have a reference link for just the concept of argumentative local validity (which is obviously correct and important to have a name for, even if some of the speculations about Law in sections iii.–vi. turned out to be wrong).

A coordination problem is when everyone is taking some action A, and we’d rather all be taking action B, but it’s bad if we don’t all move to B at the same time. Common knowledge is the name for the epistemic state we’re collectively in, when we know we can all start choosing action B - and trust everyone else to do the same.
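The structure of a coordination problem can be sketched with a toy payoff function. This is only an illustration: the payoff numbers, the function names `payoff` and `best_response`, and the group size are all assumptions chosen to match the description above, not anything taken from the post itself.

```python
def payoff(my_action: str, others_on_b: int, n_others: int) -> int:
    """Hypothetical payoffs for one player.

    Action A always yields 1. Action B yields 2, but only if every
    other player has also moved to B; otherwise it yields 0.
    """
    if my_action == "A":
        return 1
    return 2 if others_on_b == n_others else 0


def best_response(others_on_b: int, n_others: int) -> str:
    """Return the action with the higher payoff, given what others do."""
    a = payoff("A", others_on_b, n_others)
    b = payoff("B", others_on_b, n_others)
    return "B" if b > a else "A"


n_others = 9
# If nobody else has moved, switching to B alone is a loss, so everyone stays on A.
stay = best_response(others_on_b=0, n_others=n_others)
# If everyone else is known to be on B, B is the better choice.
move = best_response(others_on_b=n_others, n_others=n_others)
```

Under these assumed payoffs, both "everyone on A" and "everyone on B" are stable, and the second is better for all. What is missing in the first state is not the preference for B but the shared confidence that everyone else will move too.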


How do human beings produce knowledge? When we describe rational thought processes, we tend to think of them as essentially deterministic, deliberate, and algorithmic. After some self-examination, however, Alkjash came to think that his process is closer to babbling many random strings and later filtering by a heuristic.

I just re-read this sequence. Babble has definitely made its way into my core vocabulary. I think of "improving both the Babble and Prune of LessWrong" as being central to my current goals, and I think this post was counterfactually relevant for that. Originally I had planned to vote weakly in favor of this post, but am currently positioning it more at the upper-mid-range of my votes. I think it's somewhat unfortunate that the Review focused only on posts, as opposed to sequences as a whole. I just re-read this sequence, and I think the posts More Babble, Prune, and Circumambulation have more substance/insight/gears/hooks than this one. (I didn't get as much out of Write). But, this one was sort of "the schelling post to nominate" if you were going to nominate one of them. The piece as a whole succeeds very much as both Art and pedagogy.

In this post, Alkjash explores the concept of Babble and Prune as a model for thought generation. Babble refers to generating many possibilities with a weak heuristic, while Prune involves using a stronger heuristic to filter and select the best options. He discusses how this model relates to creativity, problem-solving, and various aspects of human cognition and culture. 
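The generate-then-filter shape of this model can be sketched in a few lines. This is a toy under stated assumptions, not Alkjash's own formulation: the random-string generator, the `pronounceability` score (counting vowel/consonant alternations), and all function names are hypothetical stand-ins for a weak and a strong heuristic.

```python
import random

VOWELS = set("aeiou")


def babble(rng, n_candidates, length=5, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Weak heuristic: emit many random strings of the right length."""
    return [
        "".join(rng.choice(alphabet) for _ in range(length))
        for _ in range(n_candidates)
    ]


def pronounceability(word):
    """Hypothetical stronger heuristic: count adjacent vowel/consonant switches."""
    return sum((a in VOWELS) != (b in VOWELS) for a, b in zip(word, word[1:]))


def prune(candidates, score, keep):
    """Keep the `keep` highest-scoring candidates."""
    return sorted(candidates, key=score, reverse=True)[:keep]


def babble_and_prune(seed=0, n_candidates=200, keep=5):
    rng = random.Random(seed)  # seeded so runs are repeatable
    return prune(babble(rng, n_candidates), pronounceability, keep)


survivors = babble_and_prune()
```

In this sketch, raising `n_candidates` corresponds to babbling more, and lowering `keep` or using a stricter `score` corresponds to pruning harder.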


Babble is our ability to generate ideas. Prune is our ability to filter those ideas. For many people, Prune is too strong, so they don't generate enough ideas. This post explores how to relax Prune to let more ideas through.


Eliezer explores a dichotomy between "thinking in toolboxes" and "thinking in laws". 
Toolbox thinkers are oriented around a "big bag of tools that you adapt to your circumstances." Law thinkers are oriented around universal laws, which might or might not be useful tools, but which help us model the world and scope out problem-spaces. There seems to be confusion when toolbox and law thinkers talk to each other.


Often you can compare your own Fermi estimates with those of other people, and that’s sort of cool, but what’s way more interesting is when they share what variables and models they used to get to the estimate. This lets you actually update your model in a deeper way.

Jameson Quinn
This is the second time I've seen this. Now it seems obvious. I remember liking it the first time, but also remember it being obvious. That second part of the memory is probably false. I think it's likely that this explained the idea so well that I now think it's obvious. In other words: very well done.

Scott Alexander reviews and expands on Paul Graham's "hierarchy of disagreement" to create a broader and more detailed taxonomy of argument types, from the most productive to the least. He discusses the difficulty and importance of avoiding lower levels of argument, and the value of seeking "high-level generators of disagreement" even when they don't lead to agreement. 

Scott Alexander
I still generally endorse this post, though I agree with everyone else's caveats that many arguments aren't like this. The biggest change is that I feel like I have a slightly better understanding of "high-level generators of disagreement" now, as differences in priors, contexts, and categorizations - see my post "Mental Mountains" for more.

There are problems with the obvious-seeming "wizard's code of honesty" aka "never say things that are false". Sometimes, even exceptionally honest people lie (such as when hiding fugitives from an unjust regime). If "never lie" is unworkable as an absolute rule, what code of conduct should highly honest people aspire to? 

Ben Pace
Here are my thoughts.
1. Being honest is hard, and there are many difficult and surprising edge-cases, including things like context failures, negotiating with powerful institutions, politicised narratives, and compute limitations.
2. On top of the rule of trying very hard to be honest, Eliezer's post offers an additional general rule for navigating the edge cases. The rule is that when you’re having a general conversation all about the sorts of situations you would and wouldn’t lie, you must be absolutely honest. You can explicitly not answer certain questions if it seems necessary, but you must never lie.
3. I think this rule is a good extension of the general principle of honesty, and appreciate Eliezer's theoretical arguments for why this rule is necessary.
4. Eliezer’s post introduces some new terminology for discussions of honesty - in particular, the term 'meta-honesty' as the rule instead of 'honesty'.
5. If the term 'meta-honesty' is common knowledge but the implementation details aren't, and if people try to use it, then they will perceive a large number of norm violations that are actually linguistic confusions. Linguistic confusions are not strongly negative in most fields, merely a nuisance, but in discussions of norm-violation (e.g. a court of law) they have grave consequences, and you shouldn't try to build communal norms on such shaky foundations.
6. I, and many other people this post was directed at, find it requires multiple readings to understand, so I think that if everyone reads this post, it will not be remotely sufficient for making the implementation details common knowledge, even if the term can become that.
7. In general, I think that everyone should make sure it is acceptable, when asking "Can we operate under the norms of meta-honesty?" for the other person to reply "I'd like to taboo the term 'meta-honesty', because I'm not sure we'll be talking about the same thing if we use that term."
8. This is a valuable bedrock for thinking
Reply: "Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think"

Some people claim that aesthetics don't mean anything, and are resistant to the idea that they could.  After all, aesthetic preferences are very individual. 

Sarah argues that the skeptics have a point, but they're too epistemically conservative. Colors don't have intrinsic meanings, but they do have shared connotations within a culture. There's obviously some signal being carried through aesthetic choices.

This post kills me. Lots of great stuff, and I think this strongly makes the cut. Sarah has great insights into what is going on, then turns away from them right when following through would be most valuable. The post is explaining why she and an entire culture are being defrauded by aesthetics. That it is used to justify all sorts of things, including high prices and what is cool, based on things that have no underlying value. How it contains lots of hostile subliminal messages that are driving her crazy. It's very clear. And then she... doesn't see the fnords. So close!

By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies. You might want to become a more robust, coherent agent – in particular if you're operating in an unfamiliar domain, where common wisdom can't guide you.

Author here. I still endorse the post and have continued to find it pretty central to how I think about myself and nearby ecosystems. I just submitted some major edits to the post. Changes include:
1. Name change ("Robust, Coherent Agent"). After much hemming and hawing and arguing, I changed the name from "Being a Robust Agent" to "Being a Robust, Coherent Agent." I'm not sure if this was the right call. It was hard to pin down exactly one "quality" that the post was aiming at. Coherence was the single word that pointed towards "what sort of agent to become." But I think "robustness" still points most clearly towards why you'd want to change. I added some clarifying remarks about that. In individual sentences I tend to refer to either "Robust Agents" or "Coherent agents" depending on what that sentence was talking about. Other options include "Reflective Agent" or "Deliberate Agent." (I think once you deliberate on what sort of agent you want to be, you often become more coherent and robust, although not necessarily.) Edit: Undid the name change, seemed like it was just a worse title.
2. Spelling out what exactly the strategy entails. Originally the post was vaguely gesturing at an idea. It seemed good to try to pin that idea down more clearly. This does mean that, by getting "more specific" it might also be more "wrong." I've run the new draft by a few people and I'm fairly happy with the new breakdown:
* Deliberate Agency
* Gears Level Understanding of Yourself
* Coherence and Consistency
* Game Theoretic Soundness
But, if people think that's carving the concept at the wrong joints, let me know.
3. "Why is this important?" Zvi's review noted that the post didn't really argue why becoming a robust agent was so important. Originally, I viewed the post as simply illustrating an idea rather than arguing for it, and... maybe that was fine. I think it would have been fine to "why" that for a followup post. But I reflected a bit on why it seemed importan
As you would expect from someone who was one of the inspirations for the post, I strongly approve of the insight/advice contained herein. I also agree with the previous review that there is not a known better write-up of this concept. I like that this gets the thing out there compactly. Where I am disappointed is that this does not feel like it gets across the motivation behind this or why it is so important - I neither read this and think 'yes that explains why I care about this so much' or 'I expect that this would move the needle much on people's robustness as agents going forward if they read this.' So I guess the takeaway for me looking back is, good first attempt and I wouldn't mind including it in the final list, but someone needs to try again? It is worth noting that Jacob did *exactly* the adjustments that I would hope would result from this post if it worked as intended, so perhaps it is better than I give it credit for? Would be curious if anyone else had similar things to report.