LESSWRONG
All Posts
284
Thoughts on seed oil
dynomight
13d
106
357
Transformers Represent Belief State Geometry in their Residual Stream
Ω
Adam Shai
3d
Ω
80
121
My experience using financial commitments to overcome akrasia
William Howard
10d
31
304
The Best Tacit Knowledge Videos on Every Subject
Parker Conley
,
hans truman
20d
123
77
[Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor
16d
22
257
On green
Joe Carlsmith
1mo
34
248
My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman
1mo
14
173
Toward a Broader Conception of Adverse Selection
Ricki Heicklen
25d
61
200
"How could I have thought that faster?"
mesaoptimizer
1mo
30
140
Using axis lines for good or evil
dynomight
2mo
39
231
My Clients, The Liars
ymeskhout
2mo
85
110
Social status part 1/2: negotiations over object-level preferences
Steven Byrnes
2mo
15
57
Acting Wholesomely
owencb
2mo
64
263
Scale Was All We Needed, At First
Gabe M
1mo
31
213
CFAR Takeaways: Andrew Critch
Raemon
2mo
62
139
And All the Shoggoths Merely Players
Zack_M_Davis
3mo
57
124
Updatelessness doesn't solve most problems
Ω
Martín Soto
2mo
Ω
43
211
Believing In
AnnaSalamon
3mo
49
108
Attitudes about Applied Rationality
Camille Berger
3mo
18
160
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Ω
Jeremy Gillen
,
peterbarnett
3mo
Ω
60
245
The case for ensuring that powerful AIs are controlled
Ω
ryan_greenblatt
,
Buck
3mo
Ω
66
122
A Shutdown Problem Proposal
Ω
johnswentworth
,
David Lorell
3mo
Ω
61
350
There is way too much serendipity
Malmesbury
3mo
56
291
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Ω
evhub
,
Carson Denison
,
Meg
,
Monte M
,
David Duvenaud
,
Nicholas Schiefer
,
Ethan Perez
4mo
Ω
94
131
Deep atheism and AI risk
Joe Carlsmith
2mo
22
267
Gentleness and the artificial Other
Joe Carlsmith
4mo
33
96
A case for AI alignment being difficult
Ω
jessicata
4mo
Ω
53
90
Meaning & Agency
Ω
abramdemski
4mo
Ω
17
259
Constellations are Younger than Continents
Jeffrey Heninger
4mo
22
131
The Dark Arts
lsusr
,
Lyrongolem
4mo
49
147
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Ω
Seb Farquhar
,
Vikrant Varma
,
zac_kenton
,
gasteigerjo
,
Vlad Mikulik
,
Rohin Shah
4mo
Ω
21
411
Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible
GeneSmith
,
kman
5mo
162
289
Speaking to Congressional staffers about AI risk
Akash
,
hath
2mo
23
155
How useful is mechanistic interpretability?
ryan_greenblatt
,
Neel Nanda
,
Buck
,
habryka
4mo
53
309
Shallow review of live agendas in alignment & safety
Ω
technicalities
,
Stag
5mo
Ω
69
138
Moral Reality Check (a short story)
jessicata
5mo
44
215
What are the results of more parental supervision and less outdoor play?
juliawise
5mo
30
282
Social Dark Matter
[DEACTIVATED] Duncan Sabien
5mo
112
255
AI Timelines
Ω
habryka
,
Daniel Kokotajlo
,
Ajeya Cotra
,
Ege Erdil
6mo
Ω
74
185
Thinking By The Clock
Screwtape
6mo
27
261
The 6D effect: When companies take risks, one email can be very powerful.
scasper
5mo
40
104
Deception Chess: Game #1
Zane
,
aphyer
,
Alex A
,
AdamYedidia
6mo
19
240
Book Review: Going Infinite
Zvi
6mo
109
238
Alignment Implications of LLM Successes: a Debate in One Act
Ω
Zack_M_Davis
6mo
Ω
50
157
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
jacobjacob
,
Robert Miles
,
Holly_Elmore
6mo
30
286
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Ω
Zac Hatfield-Dodds
6mo
Ω
21
169
Thomas Kwa's MIRI research experience
Thomas Kwa
,
peterbarnett
,
Vivek Hebbar
,
Jeremy Gillen
,
jacobjacob
,
Raemon
7mo
52
102
Cohabitive Games so Far
mako yass
7mo
116
326
Inside Views, Impostor Syndrome, and the Great LARP
johnswentworth
7mo
53
481
The Talk: a brief explanation of sexual dimorphism
Malmesbury
7mo
72