LESSWRONG
All Posts
Sorted by New
Posts (karma · title · author(s) · posted · comments; [Ω] marks Alignment Forum crossposts):

174 · "How could I have thought that faster?" · mesaoptimizer · 2d · 31
139 · Using axis lines for good or evil · dynomight · 10d · 39
223 · My Clients, The Liars · ymeskhout · 20d · 84
104 · Social status part 1/2: negotiations over object-level preferences · Steven Byrnes · 14d · 15
57 · Acting Wholesomely · owencb · 17d · 62
253 · Scale Was All We Needed, At First · Gabriel Mukobi · 6d · 29
217 · CFAR Takeaways: Andrew Critch · Raemon · 1mo · 62
138 · And All the Shoggoths Merely Players · Zack_M_Davis · 1mo · 56
124 · Updatelessness doesn't solve most problems [Ω] · Martín Soto · 1mo · 43
203 · Believing In · AnnaSalamon · 2mo · 49
96 · Attitudes about Applied Rationality · Camille Berger · 2mo · 17
150 · Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI [Ω] · Jeremy Gillen, peterbarnett · 2mo · 59
235 · The case for ensuring that powerful AIs are controlled [Ω] · ryan_greenblatt, Buck · 2mo · 66
122 · A Shutdown Problem Proposal [Ω] · johnswentworth, David Lorell · 2mo · 60
340 · There is way too much serendipity · Malmesbury · 2mo · 56
288 · Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training [Ω] · evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer, Ethan Perez · 2mo · 94
129 · Deep atheism and AI risk · Joe Carlsmith · 23d · 22
264 · Gentleness and the artificial Other · Joe Carlsmith · 3mo · 31
95 · A case for AI alignment being difficult [Ω] · jessicata · 3mo · 53
90 · Meaning & Agency [Ω] · abramdemski · 3mo · 17
259 · Constellations are Younger than Continents · Jeffrey Heninger · 3mo · 22
131 · The Dark Arts · lsusr, Lyrongolem · 3mo · 49
145 · Discussion: Challenges with Unsupervised LLM Knowledge Discovery [Ω] · Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik, Rohin Shah · 3mo · 21
400 · Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · GeneSmith, kman · 3mo · 162
287 · Speaking to Congressional staffers about AI risk · Akash, hath · 1mo · 23
148 · How useful is mechanistic interpretability? · ryan_greenblatt, Neel Nanda, Buck, habryka · 3mo · 53
303 · Shallow review of live agendas in alignment & safety [Ω] · technicalities, Stag · 4mo · 69
137 · Moral Reality Check (a short story) · jessicata · 4mo · 44
215 · What are the results of more parental supervision and less outdoor play? · juliawise · 4mo · 30
280 · Social Dark Matter · [DEACTIVATED] Duncan Sabien · 4mo · 112
245 · AI Timelines [Ω] · habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil · 5mo · 74
179 · Thinking By The Clock · Screwtape · 4mo · 27
260 · The 6D effect: When companies take risks, one email can be very powerful. · scasper · 4mo · 40
104 · Deception Chess: Game #1 · Zane, aphyer, Alex A, AdamYedidia · 5mo · 19
240 · Book Review: Going Infinite · Zvi · 5mo · 109
238 · Alignment Implications of LLM Successes: a Debate in One Act [Ω] · Zack_M_Davis · 5mo · 50
157 · Holly Elmore and Rob Miles dialogue on AI Safety Advocacy · jacobjacob, Robert Miles, Holly_Elmore · 5mo · 30
281 · Towards Monosemanticity: Decomposing Language Models With Dictionary Learning [Ω] · Zac Hatfield-Dodds · 5mo · 18
169 · Thomas Kwa's MIRI research experience · Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, jacobjacob, Raemon · 6mo · 52
102 · Cohabitive Games so Far · mako yass · 6mo · 116
324 · Inside Views, Impostor Syndrome, and the Great LARP · johnswentworth · 6mo · 53
480 · The Talk: a brief explanation of sexual dimorphism · Malmesbury · 6mo · 72
195 · UDT shows that decision theory is more puzzling than ever [Ω] · Wei Dai · 6mo · 51
222 · Sum-threshold attacks · TsviBT · 5mo · 52
212 · What I would do if I wasn't at ARC Evals [Ω] · LawrenceC · 6mo · 8
181 · A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX · jacobjacob · 6mo · 23
121 · Report on Frontier Model Training · YafahEdelman · 7mo · 21
249 · Dear Self; we need to talk about ambition · Elizabeth · 7mo · 25
305 · Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research [Ω] · evhub, Nicholas Schiefer, Carson Denison, Ethan Perez · 7mo · 26
175 · Feedbackloop-first Rationality · Raemon · 7mo · 65