LESSWRONG
All Posts
145
On Not Pulling The Ladder Up Behind You
Screwtape
1d
13
296
Thoughts on seed oil
dynomight
16d
108
365
Transformers Represent Belief State Geometry in their Residual Stream
Ω
Adam Shai
6d
Ω
82
122
My experience using financial commitments to overcome akrasia
William Howard
13d
31
311
The Best Tacit Knowledge Videos on Every Subject
Parker Conley
23d
128
77
[Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor
19d
22
258
On green
Joe Carlsmith
1mo
34
248
My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman
1mo
14
174
Toward a Broader Conception of Adverse Selection
Ricki Heicklen
1mo
61
200
"How could I have thought that faster?"
mesaoptimizer
1mo
30
140
Using axis lines for good or evil
dynomight
2mo
39
231
My Clients, The Liars
ymeskhout
2mo
85
110
Social status part 1/2: negotiations over object-level preferences
Steven Byrnes
2mo
15
57
Acting Wholesomely
owencb
2mo
64
263
Scale Was All We Needed, At First
Gabe M
1mo
31
213
CFAR Takeaways: Andrew Critch
Raemon
2mo
62
139
And All the Shoggoths Merely Players
Zack_M_Davis
3mo
57
124
Updatelessness doesn't solve most problems
Ω
Martín Soto
3mo
Ω
43
212
Believing In
AnnaSalamon
3mo
49
108
Attitudes about Applied Rationality
Camille Berger
3mo
18
160
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Ω
Jeremy Gillen
,
peterbarnett
3mo
Ω
60
245
The case for ensuring that powerful AIs are controlled
Ω
ryan_greenblatt
,
Buck
3mo
Ω
66
122
A Shutdown Problem Proposal
Ω
johnswentworth
,
David Lorell
3mo
Ω
61
350
There is way too much serendipity
Malmesbury
4mo
56
291
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Ω
evhub
,
Carson Denison
,
Meg
,
Monte M
,
David Duvenaud
,
Nicholas Schiefer
,
Ethan Perez
4mo
Ω
94
131
Deep atheism and AI risk
Joe Carlsmith
2mo
22
267
Gentleness and the artificial Other
Joe Carlsmith
4mo
33
96
A case for AI alignment being difficult
Ω
jessicata
4mo
Ω
53
90
Meaning & Agency
Ω
abramdemski
4mo
Ω
17
259
Constellations are Younger than Continents
Jeffrey Heninger
4mo
22
128
The Dark Arts
lsusr
,
Lyrongolem
4mo
49
147
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Ω
Seb Farquhar
,
Vikrant Varma
,
zac_kenton
,
gasteigerjo
,
Vlad Mikulik
,
Rohin Shah
5mo
Ω
21
411
Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible
GeneSmith
,
kman
5mo
162
289
Speaking to Congressional staffers about AI risk
Akash
,
hath
2mo
23
156
How useful is mechanistic interpretability?
ryan_greenblatt
,
Neel Nanda
,
Buck
,
habryka
4mo
53
309
Shallow review of live agendas in alignment & safety
Ω
technicalities
,
Stag
5mo
Ω
69
138
Moral Reality Check (a short story)
jessicata
5mo
44
215
What are the results of more parental supervision and less outdoor play?
juliawise
5mo
30
282
Social Dark Matter
[DEACTIVATED] Duncan Sabien
5mo
112
256
AI Timelines
Ω
habryka
,
Daniel Kokotajlo
,
Ajeya Cotra
,
Ege Erdil
6mo
Ω
74
185
Thinking By The Clock
Screwtape
6mo
27
261
The 6D effect: When companies take risks, one email can be very powerful.
scasper
6mo
40
104
Deception Chess: Game #1
Zane
,
aphyer
,
Alex A
,
AdamYedidia
6mo
19
240
Book Review: Going Infinite
Zvi
6mo
109
238
Alignment Implications of LLM Successes: a Debate in One Act
Ω
Zack_M_Davis
6mo
Ω
50
157
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
jacobjacob
,
Robert Miles
,
Holly_Elmore
6mo
30
286
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Ω
Zac Hatfield-Dodds
6mo
Ω
21
169
Thomas Kwa's MIRI research experience
Thomas Kwa
,
peterbarnett
,
Vivek Hebbar
,
Jeremy Gillen
,
jacobjacob
,
Raemon
7mo
52
102
Cohabitive Games so Far
mako yass
7mo
116
326
Inside Views, Impostor Syndrome, and the Great LARP
johnswentworth
7mo
53