LESSWRONG
Books of LessWrong
A Moderate Update to your Artificial Priors
228
ARC's first technical report: Eliciting Latent Knowledge
Ω
paulfchristiano, Mark Xu, Ajeya Cotra
3y
Ω
90
227
Fun with +12 OOMs of Compute
Ω
Daniel Kokotajlo
4y
Ω
86
533
What 2026 looks like
Ω
Daniel Kokotajlo
4y
Ω
160
254
Ngo and Yudkowsky on alignment difficulty
Ω
Eliezer Yudkowsky, Richard_Ngo
3y
Ω
151
247
Another (outer) alignment failure story
Ω
paulfchristiano
4y
Ω
38
282
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Ω
Andrew_Critch
4y
Ω
65
260
The Plan
Ω
johnswentworth
3y
Ω
78
148
Finite Factored Sets
Ω
Scott Garrabrant
4y
Ω
95
128
Selection Theorems: A Program For Understanding Agents
Ω
johnswentworth
3y
Ω
28
159
My research methodology
Ω
paulfchristiano
4y
Ω
38
260
larger language models may disappoint you [or, an eternally unfinished draft]
Ω
nostalgebraist
3y
Ω
31
139
Comments on Carlsmith's “Is power-seeking AI an existential risk?”
Ω
So8res
3y
Ω
15
298
EfficientZero: How It Works
Ω
1a3orn
3y
Ω
50
174
Specializing in Problems We Don't Understand
johnswentworth
4y
29