Books of LessWrong

A Moderate Update to your Artificial Priors

228 ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano, Mark Xu, Ajeya Cotra

3y

90

227 Fun with +12 OOMs of Compute

Daniel Kokotajlo

4y

86

533 What 2026 looks like

Daniel Kokotajlo

4y

160

254 Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky, Richard_Ngo

3y

151

247 Another (outer) alignment failure story

paulfchristiano

4y

38

282 What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

4y

65

3y

78

148 Finite Factored Sets

Scott Garrabrant

4y

95

128 Selection Theorems: A Program For Understanding Agents

3y

28

159 My research methodology

paulfchristiano

4y

38

260 larger language models may disappoint you [or, an eternally unfinished draft]

3y

31

139 Comments on Carlsmith's “Is power-seeking AI an existential risk?”

3y

15

298 EfficientZero: How It Works

3y

50

174 Specializing in Problems We Don't Understand

4y

29