LESSWRONG
If Anyone Builds It, Everyone Dies

Nate and Eliezer have written a book making a detailed case for the risks from AI – in the hopes that it’s not too late to change course. You can buy the book now in print, eBook, or audiobook form, as well as read through the two books' worth of additional content in the online resources for the book.

Popular Comments

A1987dM · 1d
Was Barack Obama still serving as president in December?
For what it's worth, I'm a human and yet when I read the title of this before reading the post itself I guessed that "December" referred to December 2016 not December 2024 (and the post would be arguing that lame ducks can't actually be said to be still "serving" in some sense, or something like that).
habryka · 9h
How To Dress To Improve Your Epistemics
FWIW, this is generally true for design things. In web design people tend to look for extremely simple surface-level rules (like "what is a good font?" and "what are good colors?") in ways that IMO tend to basically never work. Like, you can end up doing an OK job if you go with a design framework of not looking horrendous, but when you need to make adjustments, or establish a separate brand, there really are no rules at that level of abstraction.

I often get very frustrated responses when people come to me for design feedback and I respond with things like "well, I think this website is communicating that you are a kind of 90s CS professor? Is that what you want?" and then they respond with "I mean... what? I asked you whether this website looks 'good', what does this have to do with 90s CS professors? I just want you to give me a straight answer". And like, often I can make guesses about what their aims are with a website and try to translate things into a single "good" or "bad" scalar, but it usually just fails because I don't know what people are going for.

IMO the same tends to be true for fashion. Almost any piece of clothing you can buy will be the right choice in some context, or given some aim! If you are SBF, then in order to signal your contrarian genius you maybe want to wear mildly ill-fitting tees with your company logo to your formal dinners. Signaling is complicated and messy and it's very hard to give hard and fast rules.

In many cases the things people tend to ask here often feel to me about as confused as people saying "can you tell me how to say good things in conversations? Like, can someone just write down at a nuts-and-bolts level what makes for being good at talking to people?". Like, yes, of course there are skills related to conversations, but it centrally depends on what you are hoping to communicate in your conversations!
Buck · 1d
I enjoyed most of IABIED
Why did I like the book so much more than I expected? I think it's a mix of:
* I like the authors' writing on basic AI risk stuff but I don't like their writing on more in-the-weeds questions, and I run across their in-the-weeds writing much more in my day-to-day life, so it's surprisingly pleasant to read them writing intro materials.
* Their presentation of the arguments was cleaner here than I've previously seen.
118 · Obligated to Respond · Duncan Sabien (Inactive) · 2d · 58 comments
285 · How anticipatory cover-ups go wrong · Kaj_Sotala · 7d · 24 comments
Quick Takes

leogao · 6h
a thing i've noticed rat/autistic people do (including myself): one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we've modeled all the uncertainty (or at least, more than other people who haven't). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated. one telltale hint that i'm doing this myself is if i catch myself saying something because i want to demo my rigor and prove that i've considered some caveat that one might think i forgot to consider.

i've heard others make a similar critique about this as a communication style which can mislead non-rats who are not familiar with the style, but i'm making a different claim here: that one can trick oneself. it seems that one often believes being self-aware of a certain limitation is enough to correct for it sufficiently to at least be calibrated about how limited one is.

a concrete example: part of being socially incompetent is not just being bad at taking social actions, but being bad at detecting social feedback on those actions. of course, many people are not even aware of the latter. but many are aware of and acknowledge the latter, and then act as if, because they've acknowledged a potential failure mode and will try to be careful to avoid it, they are much less susceptible to the failure mode than other people in an otherwise similar reference class.

one variant of this deals with hypotheticals - because hypotheticals often can/will never be evaluated, this allows one to get the feeling that one is being epistemically virtuous and making falsifiable predictions, without ever actually getting falsified. for example, a statement "if X had happened, then i bet we would see Y now" has prediction vibes but is not actually a prediction. this is especially pernicious when one fails but says "i failed but i was close, so i should still
Jacob_Hilton · 2d
Superhuman math AI will plausibly arrive significantly before broad automation

I think it's plausible that for several years in the late 2020s/early 2030s, we will have AI that is vastly superhuman at formal domains including math, but still underperforms humans at most white-collar jobs (and so world GDP growth remains below 10%/year, say – still enough room for AI to be extraordinarily productive compared to today). Of course, if there were to be an intelligence explosion on that timescale, then superhuman math AI would be unsurprising. My main point is that superhuman math AI still seems plausible even disregarding feedback loops from automation of AI R&D. On the flip side, a major catastrophe and/or coordinated slowdown could prevent both superhuman math AI and broad automation. Since both of these possibilities are widely discussed elsewhere, I will disregard both AI R&D feedback loops and catastrophe for the purposes of this forecast. (I think this is a very salient possibility on the relevant timescale, but won't justify that here.)

My basic reasons for thinking vastly superhuman math AI is a serious possibility in the next 4–8 years (even absent AI R&D feedback loops and/or catastrophe):
* Performance in formal domains is verifiable: math problems can be designed to have a unique correct answer, and formal proofs are either valid or invalid. Historically, in domains with cheap, automated supervision signals, only a relatively small amount of research effort has been required to produce superhuman AI (e.g., in board games and video games). There are often other bottlenecks than supervision, most notably exploration and curricula, but these tend to be more surmountable.
* Recent historical progress in math has been extraordinarily fast: in the last 4 years, AI has gone from struggling with grade school math to achieving an IMO gold medal, with progress at times exceeding almost all forecasters' reasonable expectations. Indeed, much of this progress seems
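As a toy illustration of the "cheap, automated supervision signal" point above: in a formal domain, grading a candidate answer is a fully automated check against a known ground truth (or a proof checker), so many candidates can be sampled and scored with no human in the loop. The sketch below is purely hypothetical and not from the quick take; the names verify, dummy_model, and best_of_n, and the toy arithmetic problem, are illustrative assumptions.

```python
import random

def verify(problem: tuple[int, int], candidate: int) -> bool:
    """Automated verifier: True iff `candidate` is the correct sum for `problem`."""
    a, b = problem
    return candidate == a + b

def dummy_model(problem: tuple[int, int], temperature: int = 5) -> int:
    """Stand-in for a model's sampled answer: the true answer plus noise."""
    a, b = problem
    return a + b + random.randint(-temperature, temperature)

def best_of_n(problem: tuple[int, int], n: int = 16) -> tuple[int, bool]:
    """Sample n candidates and keep one the verifier accepts, if any.

    The cheap, automated verifier is what makes this kind of search (and RL
    on top of it) easy to supervise in formal domains."""
    candidates = [dummy_model(problem) for _ in range(n)]
    for c in candidates:
        if verify(problem, c):
            return c, True
    return candidates[0], False

if __name__ == "__main__":
    problem = (17, 25)
    answer, solved = best_of_n(problem)
    print(f"problem={problem} answer={answer} verified={solved}")
```

The same pattern carries over to proof assistants, where the verifier role is played by the proof checker returning valid or invalid.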
Thane Ruthenis · 2d
Just finished If Anyone Builds It, Everyone Dies (and some of the supplements).[1] It feels... weaker than I'd hoped. Specifically, I think Part 3 is strong, and the supplemental materials are quite thorough, but Parts 1-2... I hope I'm wrong, and this opinion is counterweighed by all these endorsements and MIRI presumably running it by lots of test readers. But I'm more bearish on it making a huge impact than I was before reading it.

Point 1: The rhetoric – the arguments and their presentations – is often not novel, just rehearsed variations on the arguments Eliezer/MIRI already deployed. This is not necessarily a problem, if those arguments were already shaped into their optimal form, and I do like this form... But I note those arguments have so far failed to go viral. Would repackaging them into a book, and deploying it in our post-ChatGPT present, be enough? Well, I hope so.

Point 2: I found Chapter 2 in particular somewhat poorly written in how it explains the technical details. Specifically, those explanations often occupy that unfortunate middle ground between "informal gloss" and "correct technical description" where I'd guess they're impenetrable both to non-technical readers and to technical readers unfamiliar with the subject matter.

An example that seems particularly egregious to me: How does that conclusion follow? If a base model can only regurgitate human utterances, how does generating sixteen utterances and then reinforcing some of them lead to it... not regurgitating human utterances? This explanation is clearly incomplete. My model of a nonexpert technical-minded reader, who is actually tracking the gears the book introduces, definitely notices that and is confused.

The explanation of base models' training at the start of the chapter feels flawed in the same way. E.g.: My model of a technical-minded reader is confused about how that whole thing is supposed to work. It sounds like AI developers manually pick billions of operations? What? The tec
AnnaSalamon · 4d
A WSJ article from today presents evidence that toxic fumes in airplane air are surprisingly common, are bad for health, have gotten much worse recently, and are still being deliberately covered up. Is anyone up for wading in for a couple hours and giving us an estimated number of micromorts / brain damage / [something]? I fly frequently and am wondering whether to fly less because of this (probably not, but worth a Fermi?); I imagine others might want to know too. (Also curious if some other demographics should be more concerned than I should be, eg people traveling with babies or while pregnant or while old, or people who travel more than X times/year (since the WSJ article says airline crew get hit harder by each subsequent exposure, more than linearly)). (The above link is a "gift article" that you should be able to read without a WSJ subscription, but I'm not sure how many viewers it'll allow; if you, Reader, would like a copy and the link has stopped working, tell me and I'll send you one.)
cloud · 1d
Future research on subliminal learning that I'd be excited to see (credit to my coauthors):
* Robustness to paraphrasing
* Generally, clarifying cross-model transmission: when does it happen?
  * Connect subliminal learning to Linear Mode Connectivity (h/t Alex Dimakis)
  * Can subliminal learning occur when the base models had different inits but are trained to be similar? (Clarifies whether init is what matters)
* Develop theory
  * Quantify transmission via random matrix theory (build off equation 2 in the paper). Are there nice relationships lurking there (like d_vocab : d_model)?
  * Can we get theory that covers the data filtering case?
* Figure out what can and can’t be transmitted
  * Backdoor transmission
  * Information-theoretic limits
  * Dependence on tokenization
* Subtle semantic transmission: what about cases that aren't subliminal learning but are very hard to detect? Connect to scalable oversight and/or control.
* Adversarially-constructed subliminal learning datasets (no teacher) (compare with "clean label" data poisoning literature)
32 · Meetup Month · Raemon · 12h · 2 comments
412 · The Rise of Parasitic AI · Adele Lopez · 7d · 96 comments
158 · I enjoyed most of IABIED · Buck · 1d · 16 comments
86 · Christian homeschoolers in the year 3000 · Buck · 18h · 24 comments
467 · How Does A Blind Model See The Earth? · henry · 1mo · 38 comments
348 · AI Induced Psychosis: A shallow investigation · Tim Hua · 11d · 43 comments · Ω
53 · How To Dress To Improve Your Epistemics · johnswentworth · 13h · 20 comments
100 · Was Barack Obama still serving as president in December? · Jan Betley · 2d · 11 comments
61 · Stress Testing Deliberative Alignment for Anti-Scheming Training · Mikita Balesni, Bronson Schoen, Marius Hobbhahn, Axel Højmark, AlexMeinke, Teun van der Weij, Jérémy Scheurer, Felix Hofstätter, Nicholas Goldowsky-Dill, rusheb, Andrei Matveiakin, jenny, alex.lloyd · 16h · 0 comments · Ω
158 · The Eldritch in the 21st century · PranavG, Gabriel Alfour · 7d · 31 comments
243 · The Cats are On To Something · Hastings · 16d · 25 comments
141 · High-level actions don’t screen off intent · AnnaSalamon · 7d · 14 comments
418 · HPMOR: The (Probably) Untold Lore · Gretta Duleba, Eliezer Yudkowsky · 2mo · 152 comments
ACX Meetup: Fall 2025
AISafety.com Reading Group session 327
[Today] Sydney – ACX Meetups Everywhere Fall 2025
[Today] Jos – ACX Meetups Everywhere Fall 2025
484 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 74 comments