LESSWRONG
All Posts
Sorted by New
Karma | Title | Author(s) | Posted | Comments
(Ω marks posts crossposted to the AI Alignment Forum.)

114 | Using axis lines for good or evil | dynomight | 1h | 19
221 | My Clients, The Liars | ymeskhout | 10d | 77
89 | Social status part 1/2: negotiations over object-level preferences | Steven Byrnes | 4d | 11
57 | Acting Wholesomely | owencb | 7d | 59
217 | CFAR Takeaways: Andrew Critch | Raemon | 24d | 62
138 | And All the Shoggoths Merely Players | Zack_M_Davis | 1mo | 56
124 | Updatelessness doesn't solve most problems (Ω) | Martín Soto | 1mo | 43
202 | Believing In | AnnaSalamon | 1mo | 49
96 | Attitudes about Applied Rationality | Camille Berger | 1mo | 17
150 | Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI (Ω) | Jeremy Gillen, peterbarnett | 2mo | 59
235 | The case for ensuring that powerful AIs are controlled (Ω) | ryan_greenblatt, Buck | 2mo | 66
122 | A Shutdown Problem Proposal (Ω) | johnswentworth, David Lorell | 2mo | 60
339 | There is way too much serendipity | Malmesbury | 2mo | 56
287 | Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Ω) | evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer, Ethan Perez | 2mo | 94
124 | Deep atheism and AI risk | Joe Carlsmith | 13d | 22
263 | Gentleness and the artificial Other | Joe Carlsmith | 2mo | 31
95 | A case for AI alignment being difficult (Ω) | jessicata | 3mo | 51
90 | Meaning & Agency (Ω) | abramdemski | 2mo | 17
259 | Constellations are Younger than Continents | Jeffrey Heninger | 3mo | 22
131 | The Dark Arts | lsusr, Lyrongolem | 3mo | 49
145 | Discussion: Challenges with Unsupervised LLM Knowledge Discovery (Ω) | Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik, Rohin Shah | 3mo | 21
398 | Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible | GeneSmith, kman | 3mo | 162
285 | Speaking to Congressional staffers about AI risk | Akash, hath | 19d | 23
148 | How useful is mechanistic interpretability? | ryan_greenblatt, Neel Nanda, Buck, habryka | 2mo | 53
302 | Shallow review of live agendas in alignment & safety (Ω) | technicalities, Stag | 4mo | 69
137 | Moral Reality Check (a short story) | jessicata | 3mo | 44
215 | What are the results of more parental supervision and less outdoor play? | juliawise | 3mo | 30
280 | Social Dark Matter | [DEACTIVATED] Duncan Sabien | 4mo | 110
241 | AI Timelines (Ω) | habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil | 4mo | 73
179 | Thinking By The Clock | Screwtape | 4mo | 27
260 | The 6D effect: When companies take risks, one email can be very powerful. | scasper | 4mo | 40
104 | Deception Chess: Game #1 | Zane, aphyer, Alex A, AdamYedidia | 4mo | 19
240 | Book Review: Going Infinite | Zvi | 5mo | 109
238 | Alignment Implications of LLM Successes: a Debate in One Act (Ω) | Zack_M_Davis | 5mo | 47
157 | Holly Elmore and Rob Miles dialogue on AI Safety Advocacy | jacobjacob, Robert Miles, Holly_Elmore | 5mo | 30
281 | Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (Ω) | Zac Hatfield-Dodds | 5mo | 18
169 | Thomas Kwa's MIRI research experience | Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, jacobjacob, Raemon | 5mo | 52
102 | Cohabitive Games so Far | mako yass | 5mo | 110
322 | Inside Views, Impostor Syndrome, and the Great LARP | johnswentworth | 5mo | 53
476 | The Talk: a brief explanation of sexual dimorphism | Malmesbury | 6mo | 72
195 | UDT shows that decision theory is more puzzling than ever (Ω) | Wei Dai | 6mo | 51
222 | Sum-threshold attacks | TsviBT | 5mo | 52
212 | What I would do if I wasn’t at ARC Evals (Ω) | LawrenceC | 6mo | 8
181 | A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX | jacobjacob | 6mo | 23
121 | Report on Frontier Model Training | YafahEdelman | 6mo | 21
249 | Dear Self; we need to talk about ambition | Elizabeth | 7mo | 25
305 | Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research (Ω) | evhub, Nicholas Schiefer, Carson Denison, Ethan Perez | 7mo | 26
175 | Feedbackloop-first Rationality | Raemon | 7mo | 65
205 | My current LK99 questions | Eliezer Yudkowsky | 7mo | 38
191 | Thoughts on sharing information about language model capabilities (Ω) | paulfchristiano | 7mo | 34