The Best of LessWrong

Once posts are more than a year old, the LessWrong community reviews and votes on how well they have stood the test of time. These are the posts that have ranked highest across all years since 2018 (when our annual tradition of choosing the least wrong of LessWrong began).

For the years 2018, 2019, and 2020, we also published physical books with the results of our annual vote, which you can buy and learn more about here.

Rationality

Eliezer Yudkowsky
Local Validity as a Key to Sanity and Civilization
Buck
"Other people are wrong" vs "I am right"
Mark Xu
Strong Evidence is Common
johnswentworth
You Are Not Measuring What You Think You Are Measuring
johnswentworth
Gears-Level Models are Capital Investments
Hazard
How to Ignore Your Emotions (while also thinking you're awesome at emotions)
Scott Garrabrant
Yes Requires the Possibility of No
Scott Alexander
Trapped Priors As A Basic Problem Of Rationality
Duncan Sabien (Deactivated)
Split and Commit
Ben Pace
A Sketch of Good Communication
Eliezer Yudkowsky
Meta-Honesty: Firming Up Honesty Around Its Edge-Cases
Duncan Sabien (Deactivated)
Lies, Damn Lies, and Fabricated Options
Duncan Sabien (Deactivated)
CFAR Participant Handbook now available to all
johnswentworth
What Are You Tracking In Your Head?
Mark Xu
The First Sample Gives the Most Information
Duncan Sabien (Deactivated)
Shoulder Advisors 101
Zack_M_Davis
Feature Selection
abramdemski
Mistakes with Conservation of Expected Evidence
Scott Alexander
Varieties Of Argumentative Experience
Eliezer Yudkowsky
Toolbox-thinking and Law-thinking
alkjash
Babble
Kaj_Sotala
The Felt Sense: What, Why and How
Duncan Sabien (Deactivated)
Cup-Stacking Skills (or, Reflexive Involuntary Mental Motions)
Ben Pace
The Costly Coordination Mechanism of Common Knowledge
Jacob Falkovich
Seeing the Smoke
Elizabeth
Epistemic Legibility
Daniel Kokotajlo
Taboo "Outside View"
alkjash
Prune
johnswentworth
Gears vs Behavior
Raemon
Noticing Frame Differences
Duncan Sabien (Deactivated)
Sazen
AnnaSalamon
Reality-Revealing and Reality-Masking Puzzles
Eliezer Yudkowsky
ProjectLawful.com: Eliezer's latest story, past 1M words
Eliezer Yudkowsky
Self-Integrity and the Drowning Child
Jacob Falkovich
The Treacherous Path to Rationality
Scott Garrabrant
Tyranny of the Epistemic Majority
alkjash
More Babble
abramdemski
Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Schelling Problems
Raemon
Being a Robust Agent
Zack_M_Davis
Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists
Benquo
Reason isn't magic
habryka
Integrity and accountability are core parts of rationality
Raemon
The Schelling Choice is "Rabbit", not "Stag"
Diffractor
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Raemon
Propagating Facts into Aesthetics
johnswentworth
Simulacrum 3 As Stag-Hunt Strategy
LoganStrohl
Catching the Spark
Jacob Falkovich
Is Rationalist Self-Improvement Real?
Benquo
Excerpts from a larger discussion about simulacra
Zvi
Simulacra Levels and their Interactions
abramdemski
Radical Probabilism
sarahconstantin
Naming the Nameless
AnnaSalamon
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"
Eric Raymond
Rationalism before the Sequences
Owain_Evans
The Rationalists of the 1950s (and before) also called themselves “Rationalists”

Optimization

sarahconstantin
The Pavlov Strategy
johnswentworth
Coordination as a Scarce Resource
AnnaSalamon
What should you change in response to an "emergency"? And AI risk
Zvi
Prediction Markets: When Do They Work?
johnswentworth
Being the (Pareto) Best in the World
alkjash
Is Success the Enemy of Freedom? (Full)
jasoncrawford
How factories were made safe
HoldenKarnofsky
All Possible Views About Humanity's Future Are Wild
jasoncrawford
Why has nuclear power been a flop?
Zvi
Simple Rules of Law
Elizabeth
Power Buys You Distance From The Crime
Eliezer Yudkowsky
Is Clickbait Destroying Our General Intelligence?
Scott Alexander
The Tails Coming Apart As Metaphor For Life
Zvi
Asymmetric Justice
Jeffrey Ladish
Nuclear war is unlikely to cause human extinction
Spiracular
Bioinfohazards
Zvi
Moloch Hasn’t Won
Zvi
Motive Ambiguity
Benquo
Can crimes be discussed literally?
Said Achmiz
The Real Rules Have No Exceptions
Lars Doucet
Lars Doucet's Georgism series on Astral Codex Ten
johnswentworth
When Money Is Abundant, Knowledge Is The Real Wealth
HoldenKarnofsky
This Can't Go On
Scott Alexander
Studies On Slack
johnswentworth
Working With Monsters
jasoncrawford
Why haven't we celebrated any major achievements lately?
abramdemski
The Credit Assignment Problem
Martin Sustrik
Inadequate Equilibria vs. Governance of the Commons
Raemon
The Amish, and Strategic Norms around Technology
Zvi
Blackmail
KatjaGrace
Discontinuous progress in history: an update
Scott Alexander
Rule Thinkers In, Not Out
Jameson Quinn
A voting theory primer for rationalists
HoldenKarnofsky
Nonprofit Boards are Weird
Wei Dai
Beyond Astronomical Waste
johnswentworth
Making Vaccine
jefftk
Make more land

World

Ben
The Redaction Machine
Samo Burja
On the Loss and Preservation of Knowledge
Alex_Altair
Introduction to abstract entropy
Martin Sustrik
Swiss Political System: More than You ever Wanted to Know (I.)
johnswentworth
Interfaces as a Scarce Resource
johnswentworth
Transportation as a Constraint
eukaryote
There’s no such thing as a tree (phylogenetically)
Scott Alexander
Is Science Slowing Down?
Martin Sustrik
Anti-social Punishment
Martin Sustrik
Research: Rescuers during the Holocaust
GeneSmith
Toni Kurz and the Insanity of Climbing Mountains
johnswentworth
Book Review: Design Principles of Biological Circuits
Elizabeth
Literature Review: Distributed Teams
Valentine
The Intelligent Social Web
Bird Concept
Unconscious Economics
eukaryote
Spaghetti Towers
Eli Tyre
Historical mathematicians exhibit a birth order effect too
johnswentworth
What Money Cannot Buy
Scott Alexander
Book Review: The Secret Of Our Success
johnswentworth
Specializing in Problems We Don't Understand
KatjaGrace
Why did everything take so long?
Ruby
[Answer] Why wasn't science invented in China?
Scott Alexander
Mental Mountains
Kaj_Sotala
My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms
johnswentworth
Evolution of Modularity
johnswentworth
Science in a High-Dimensional World
zhukeepa
How uniform is the neocortex?
Kaj_Sotala
Building up to an Internal Family Systems model
Steven Byrnes
My computational framework for the brain
Natália
Counter-theses on Sleep
abramdemski
What makes people intellectually active?
Bucky
Birth order effect found in Nobel Laureates in Physics
KatjaGrace
Elephant seal 2
JackH
Anti-Aging: State of the Art
Vaniver
Steelmanning Divination
Kaj_Sotala
Book summary: Unlocking the Emotional Brain

AI Strategy

Ajeya Cotra
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Daniel Kokotajlo
Cortés, Pizarro, and Afonso as Precedents for Takeover
Daniel Kokotajlo
The date of AI Takeover is not the day the AI takes over
paulfchristiano
What failure looks like
Daniel Kokotajlo
What 2026 looks like
gwern
It Looks Like You're Trying To Take Over The World
Andrew_Critch
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
paulfchristiano
Another (outer) alignment failure story
Ajeya Cotra
Draft report on AI timelines
Eliezer Yudkowsky
Biology-Inspired AGI Timelines: The Trick That Never Works
HoldenKarnofsky
Reply to Eliezer on Biological Anchors
Richard_Ngo
AGI safety from first principles: Introduction
Daniel Kokotajlo
Fun with +12 OOMs of Compute
Wei Dai
AI Safety "Success Stories"
KatjaGrace
Counterarguments to the basic AI x-risk case
johnswentworth
The Plan
Rohin Shah
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
lc
What an actually pessimistic containment strategy looks like
Eliezer Yudkowsky
MIRI announces new "Death With Dignity" strategy
evhub
Chris Olah’s views on AGI safety
So8res
Comments on Carlsmith's “Is power-seeking AI an existential risk?”
Adam Scholl
Safetywashing
abramdemski
The Parable of Predict-O-Matic
KatjaGrace
Let’s think about slowing down AI
nostalgebraist
human psycholinguists: a critical appraisal
nostalgebraist
larger language models may disappoint you [or, an eternally unfinished draft]
Daniel Kokotajlo
Against GDP as a metric for timelines and takeoff speeds
paulfchristiano
Arguments about fast takeoff
Eliezer Yudkowsky
Six Dimensions of Operational Adequacy in AGI Projects

Technical AI Safety

Andrew_Critch
Some AI research areas and their relevance to existential safety
1a3orn
EfficientZero: How It Works
elspood
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
So8res
Decision theory does not imply that we get to have nice things
TurnTrout
Reward is not the optimization target
johnswentworth
Worlds Where Iterative Design Fails
Vika
Specification gaming examples in AI
Rafael Harth
Inner Alignment: Explain like I'm 12 Edition
evhub
An overview of 11 proposals for building safe advanced AI
johnswentworth
Alignment By Default
johnswentworth
How To Go From Interpretability To Alignment: Just Retarget The Search
Alex Flint
Search versus design
abramdemski
Selection vs Control
Mark Xu
The Solomonoff Prior is Malign
paulfchristiano
My research methodology
Eliezer Yudkowsky
The Rocket Alignment Problem
Eliezer Yudkowsky
AGI Ruin: A List of Lethalities
So8res
A central AI alignment problem: capabilities generalization, and the sharp left turn
TurnTrout
Reframing Impact
Scott Garrabrant
Robustness to Scale
paulfchristiano
Inaccessible information
TurnTrout
Seeking Power is Often Convergently Instrumental in MDPs
So8res
On how various plans miss the hard bits of the alignment challenge
abramdemski
Alignment Research Field Guide
paulfchristiano
The strategy-stealing assumption
Veedrac
Optimality is the tiger, and agents are its teeth
Sam Ringer
Models Don't "Get Reward"
johnswentworth
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
Buck
Language models seem to be much better than humans at next-token prediction
abramdemski
An Untrollable Mathematician Illustrated
abramdemski
An Orthodox Case Against Utility Functions
johnswentworth
Selection Theorems: A Program For Understanding Agents
Rohin Shah
Coherence arguments do not entail goal-directed behavior
Alex Flint
The ground of optimization
paulfchristiano
Where I agree and disagree with Eliezer
Eliezer Yudkowsky
Ngo and Yudkowsky on alignment difficulty
abramdemski
Embedded Agents
evhub
Risks from Learned Optimization: Introduction
nostalgebraist
chinchilla's wild implications
johnswentworth
Why Agent Foundations? An Overly Abstract Explanation
zhukeepa
Paul's research agenda FAQ
Eliezer Yudkowsky
Coherent decisions imply consistent utilities
paulfchristiano
Open question: are minimal circuits daemon-free?
evhub
Gradient hacking
janus
Simulators
LawrenceC
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
TurnTrout
Humans provide an untapped wealth of evidence about alignment
Neel Nanda
A Mechanistic Interpretability Analysis of Grokking
Collin
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
evhub
Understanding “Deep Double Descent”
Quintin Pope
The shard theory of human values
TurnTrout
Inner and outer alignment decompose one hard problem into two extremely hard problems
Eliezer Yudkowsky
Challenges to Christiano’s capability amplification proposal
Scott Garrabrant
Finite Factored Sets
paulfchristiano
ARC's first technical report: Eliciting Latent Knowledge
Diffractor
Introduction To The Infra-Bayesianism Sequence
TurnTrout
Towards a New Impact Measure
#1

How does it work to optimize for realistic goals in physical environments of which you yourself are a part? E.g. humans and robots in the real world, not humans and AIs playing video games in virtual worlds where the player is not part of the environment. The authors claim we don't actually have a good theoretical understanding of this, and they explore four specific ways in which we don't understand this process.

14orthonormal
Insofar as the AI Alignment Forum is part of the Best-of-2018 Review, this post deserves to be included. It's the friendliest explanation of MIRI's research agenda (as of 2018) that currently exists.
#2

This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common questions about the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

26Isnasene
[Disclaimer: I'm reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI's approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]

Overall Summary

I think this post is pretty good. It's a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post more difficult for me to parse and hid some important considerations about AI alignment from view. Though it may be good (but not optimal) for introducing some people to the problem of AI alignment and a subset of MIRI's work, it did not raise or lower my opinion of MIRI as someone who already understood AGI safety to be important.

To be clear, I do not consider any of these weaknesses serious because I believe them to be partially irrelevant to the audience of people who don't appreciate the importance of AI-Safety. Still, they are relevant to the audience of people who give AI-Safety the appropriate scrutiny but remain skeptical of MIRI. And I think this latter audience is important enough to assign this article a "pretty good" instead of a "great". I hope a future post directly explores the merit of MIRI's work in the context of AI alignment without use of analogy.

Below is an overview of my likes and dislikes in this post. I will go into more detail about them in the next section, "Evaluating Analogies."

Things I liked:
* It's a solid introduction to AI-alignment, covering a broad range of topics including:
  * Why we shouldn't expect aligned AGI by default
  * How modern conversation about AGI behavior is problematically underspecified
  * Why fundamental deconfusion research is necessary for solving AI-alignment
* It directly explains the value/motivation of particular pieces of MIRI work via analogy -- which
#14

Rohin Shah argues that many common arguments for AI risk (about the perils of powerful expected utility maximizers) are actually arguments about goal-directed behavior or explicit reward maximization, which are not actually implied by coherence arguments. An AI system could be an expected utility maximizer without being goal-directed or an explicit reward maximizer.
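
One concrete way to see the "not actually implied" point (a sketch of my own, not taken from the post): any deterministic policy can be cast as an expected utility maximizer by choosing a utility function over whole trajectories that rewards exactly the actions that policy would take.

```latex
% A minimal sketch (mine, not Rohin's) of why EU maximization alone constrains nothing.
% Fix any deterministic policy $\pi$ mapping histories $h$ to actions, and define a
% utility function over full trajectories $\tau = (h_0, a_0, h_1, a_1, \dots)$:
U(\tau) =
\begin{cases}
  1 & \text{if } a_t = \pi(h_t) \text{ for every } t, \\
  0 & \text{otherwise.}
\end{cases}
% Then $\mathbb{E}_{\tau \sim \pi}[U(\tau)] = 1 \ge \mathbb{E}_{\tau \sim \pi'}[U(\tau)]$
% for every alternative policy $\pi'$, so $\pi$ is an expected utility maximizer
% regardless of how "goal-directed" its behavior looks.
```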

49DanielFilan
I think that strictly speaking this post (or at least the main thrust) is true, and proven in the first section. The title is arguably less true: I think of 'coherence arguments' as including things like 'it's not possible for you to agree to give me a limitless number of dollars in return for nothing', which does imply some degree of 'goal-direction'.

I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term. In my mind, it provokes various follow-up questions:

1. What arguments would imply 'goal-directed' behaviour?
2. With what probability will a random utility maximiser be 'goal-directed'?
3. How often should I think of a system as a utility maximiser in resources, perhaps with a slowly-changing utility function?
4. How 'goal-directed' are humans likely to make systems, given that we are making them in order to accomplish certain tasks that don't look like random utility functions?
5. Is there some kind of 'basin of goal-directedness' that systems fall in if they're even a little goal-directed, causing them to behave poorly?

Off the top of my head, I'm not familiar with compelling responses from the 'freak out about goal-directedness' camp on points 1 through 5, even though as a member of that camp I think that such responses exist. Responses from outside this camp include Rohin's post 'Will humans build goal-directed agents?'. Another response is Brangus' comment post, although I find its theory of goal-directedness uncompelling.

I think that it's notable that Brangus' post was released soon after this was announced as a contender for Best of LW 2018. I think that if this post were added to the Best of LW 2018 Collection, the 'freak out' camp might produce more of these responses and move the dialogue forward. As such, I think it should be added, both because of the clear argumentation and because of the response it is likely to provoke.
36Vanessa Kosoy
In this essay, Rohin sets out to debunk what ey perceive as a prevalent but erroneous idea in the AI alignment community, namely: "VNM and similar theorems imply goal-directed behavior". This is placed in the context of Rohin's thesis that solving AI alignment is best achieved by designing AI which is not goal-directed. The main argument is: "coherence arguments" imply expected utility maximization, but expected utility maximization does not imply goal-directed behavior. Instead, it is a vacuous constraint, since any agent policy can be regarded as maximizing the expectation of some utility function.

I have mixed feelings about this essay. On the one hand, the core argument that VNM and similar theorems do not imply goal-directed behavior is true. To the extent that some people believed the opposite, correcting this mistake is important. On the other hand, (i) I don't think the claim Rohin is debunking is the claim Eliezer had in mind in those sources Rohin cites (ii) I don't think that the conclusions Rohin draws or at least implies are the right conclusions.

The actual claim that Eliezer was making (or at least my interpretation of it) is, coherence arguments imply that if we assume an agent is goal-directed then it must be an expected utility maximizer, and therefore EU maximization is the correct mathematical model to apply to such agents. Why do we care about goal-directed agents in the first place? The reason is, on the one hand goal-directed agents are the main source of AI risk, and on the other hand, goal-directed agents are also the most straightforward approach to solving AI risk. Indeed, if we could design powerful agents with the goals we want, these agents would protect us from unaligned AIs and solve all other problems as well (or at least solve them better than we can solve them ourselves). Conversely, if we want to protect ourselves from unaligned AIs, we need to generate very sophisticated long-term plans of action in the physical world, possibl
#16

You want your proposal for an AI to be robust to changes in its level of capabilities. It should be robust to the AI's capabilities scaling up, and also scaling down, and also the subcomponents of the AI scaling relative to each other. 

We might need to build AGIs that aren't robust to scale, but if so we should at least realize that we are doing that.

#20

Alex Zhu spent quite a while understanding Paul's Iterated Amplification and Distillation agenda. He's written an in-depth FAQ, covering key concepts like amplification, distillation, corrigibility, and how the approach aims to create safe and capable AI assistants.
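
As a rough orientation to the agenda the FAQ covers, the amplification/distillation loop can be sketched as below. This is my own schematic under simplifying assumptions; the Human and Model stubs and their method names are hypothetical placeholders, not the FAQ's or Paul's actual proposal.

```python
# Schematic sketch of the Iterated Amplification and Distillation (IDA) loop.
# Illustrative only: the Human/Model stubs and method names are hypothetical
# placeholders, not an actual implementation of the agenda.

class Human:
    def decompose(self, question):
        # A human breaks a hard question into easier subquestions.
        return [f"first part of: {question}", f"second part of: {question}"]

    def combine(self, question, subanswers):
        # A human assembles the subanswers into an answer to the original question.
        return f"answer to {question!r} built from {subanswers}"

class Model:
    def __init__(self, training_data=None):
        self.training_data = training_data or {}

    def answer(self, question):
        # The distilled model answers quickly by imitating past amplified answers.
        return self.training_data.get(question, "best guess")

def amplify(human, model, question):
    """Human + current model working together: slower, but more capable."""
    subquestions = human.decompose(question)
    subanswers = [model.answer(q) for q in subquestions]
    return human.combine(question, subanswers)

def distill(examples):
    """Train a new fast model to imitate the amplified system's answers."""
    return Model(training_data=dict(examples))

def iterated_amplification(human, model, questions, rounds=3):
    for _ in range(rounds):
        examples = [(q, amplify(human, model, q)) for q in questions]
        model = distill(examples)
    return model

if __name__ == "__main__":
    trained = iterated_amplification(Human(), Model(), ["How should I plan my day?"])
    print(trained.answer("How should I plan my day?"))
```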

#21

A hand-drawn presentation on the idea of an 'Untrollable Mathematician' - a mathematical agent that can't be manipulated into believing false things. 

#23

A collection of examples of AI systems "gaming" their specifications - finding ways to achieve their stated objectives that don't actually solve the intended problem. These illustrate the challenge of properly specifying goals for AI systems.
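
To make the failure pattern concrete, here is a minimal toy sketch of my own (not one of the list's real examples): an agent rewarded on a proxy objective, total distance moved, rather than the intended goal of reaching a flag, scores best by going in circles.

```python
# Toy illustration of specification gaming (my own construction, not an example
# from the list): the intended goal is to reach the flag, but the specified
# reward is a proxy (total distance moved), so circling in place outscores
# actually walking to the flag.

FLAG = (5, 0)

def proxy_reward(path):
    # Mis-specified objective: reward total distance traveled.
    return sum(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(path, path[1:]))

def intended_success(path):
    # What we actually wanted: the agent ends up at the flag.
    return path[-1] == FLAG

walk_to_flag = [(x, 0) for x in range(6)]   # 5 steps, reaches the flag
spin_in_place = [(0, 0), (1, 0)] * 10       # 19 steps, goes nowhere

for name, path in [("walk_to_flag", walk_to_flag), ("spin_in_place", spin_in_place)]:
    print(f"{name}: proxy reward = {proxy_reward(path)}, "
          f"reaches flag = {intended_success(path)}")
# The "gaming" trajectory earns the higher proxy reward while failing the real task.
```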

29Vika
I've been pleasantly surprised by how much this resource has caught on in terms of people using it and referring to it (definitely more than I expected when I made it). There were 30 examples on the list when it was posted in April 2018, and 20 new examples have been contributed through the form since then. I think the list has several properties that contributed to wide adoption: it's fun, standardized, up-to-date, comprehensive, and collaborative.

Some of the appeal is that it's fun to read about AI cheating at tasks in unexpected ways (I've seen a lot of people post on Twitter about their favorite examples from the list). The standardized spreadsheet format seems easier to refer to as well. I think the crowdsourcing aspect is also helpful - this helps keep it current and comprehensive, and people can feel some ownership of the list since they can personally contribute to it. My overall takeaway from this is that safety outreach tools are more likely to be impactful if they are fun and easy for people to engage with.

This list had a surprising amount of impact relative to how little work it took me to put it together and maintain it. The hard work of finding and summarizing the examples was done by the people putting together the lists that the master list draws on (Gwern, Lehman, Olsson, Irpan, and others), as well as the people who submit examples through the form. What I do is put them together in a common format and clarify and/or shorten some of the summaries. I also curate the examples to determine whether they fit the definition of specification gaming (as opposed to simply a surprising behavior or solution). Overall, I've probably spent around 10 hours so far on creating and maintaining the list, which is not very much. This makes me wonder if there is other low hanging fruit in the safety resources space that we haven't picked yet.

I have been using it both as an outreach and research tool. On the outreach side, the resource has been helpful for making the arg
#34

Can the smallest boolean circuit that solves a problem be a "daemon" (a consequentialist system with its own goals)? Paul Christiano suspects not, but isn't sure. He thinks this question, while not necessarily directly important, may yield useful insights for AI alignment.

#39

Eliezer Yudkowsky offers detailed critiques of Paul Christiano's AI alignment proposal, arguing that it faces major technical challenges and may not work without already having an aligned superintelligence. Christiano acknowledges the difficulties but believes they are solvable.