LESSWRONG
If Anyone Builds It, Everyone Dies

Nate and Eliezer have written a book making a detailed case for the risks from AI – in the hopes that it’s not too late to change course. You can buy the book now in print, eBook, or audiobook form, as well as read the two books’ worth of additional content in the online resources for the book.

Popular Comments

Jonathan Claybrough · 19h · 7765
How To Dress To Improve Your Epistemics

With no shade to John in particular, as this applies to many insular LessWrong topics, I just want to say this gives me a feeling of the blind leading the blind. I could believe someone behaves worse in the world after reading it, mostly because it'd push them further into the same overwrought see-everything-through-status frame. I think that's particularly the case here because clothing and status are especially complex, benefit from a wider diversity of frames to think about them in, and require diverse experiences and feedback from many types of communities to generalize well (or to realize just how narrow every "rule" is!).

I'm not saying John has bad social skills, or that this doesn't contain true observations, or that someone starting from zero wouldn't become better thanks to this, nor that John shouldn't write it. But I do think this is centrally the kind of article one should consider "reverse all advice you read" for, and I would like to see more community pushback and articles providing more diverse frames on this.

I'm confident I could sensibly elaborate more on what's missing/wrong, but in the absence of motivation to, I'll just let this comment stand as an agree/disagree rod for the statement: "We have no clear reason to believe the author is actually good at social skills in diverse environments; they are writing in a seemingly too confident and insufficiently caveated tone about too complicated a topic without acknowledging that, and are potentially misleading/short-term net negative to at least a fifth of LessWrong readers who are already on the worse side of social skills."

Aaron_Scher · 1d · 5235
I enjoyed most of IABIED

I feel a bit surprised by how much you dislike Section 3. I agree that it does not address 'the strongest counterarguments and automated-alignment plans that haven't been written down publicly'; this is a weakness, but expecting that seems too demanding given what's been made public.

I particularly like the analogy to alchemy presented in Chapter 11. I think it is basically correct (or as correct as analogies get) that the state of AI alignment research is incredibly poor and the field is in its early stages, where we have no principled understanding of anything (my belief here is based on reading or skimming basically every AI safety paper in 2024). The next part of the argument is like "we're not going to be able to get from the present state of alchemy to a 'mature scientific field that doesn't screw up certain crucial problems on the first try' in time". That is, 1: the field is currently at a very early stage without principled understanding, and 2: we're not going to be able to get from where we are now to a sufficient level by the time we need to.

My understanding is that your disagreement is with 2? You think that earlier AIs are going to be able to dramatically speed up alignment research (and by using control methods we can get more alignment research out of better AIs, for some intermediate capability levels), getting us to the principled, doesn't-mess-up-the-first-try-on-any-critical-problem place before ASI.

Leaning into the analogy, I would describe what I view as your position as "with AI assistance, we're going to go from alchemy to first-shot-moon-landing in ~3 years of wall clock time". I think it's correct for people to think this position is very crazy at first glance. I've thought about it some and think it's only moderately crazy. I am glad that Ryan is working on better plans here (and excited to potentially update my beliefs, as I did when you all put out various pieces about AI Control), but I think the correct approach for people hearing about this plan is to be very worried about it.

I really liked Section 3, especially Ch 11, because it makes this (IMO) true and important point about the state of the AI alignment field. I think this argument stands on its own as a reason to have an AI moratorium, even absent the particular arguments about alignment difficulty in Section 1. Meanwhile, it sounds like you don't like this section because, to put it disingenuously, "they don't engage with my favorite automating-alignment plan that tries to get us from alchemy to first-shot-moon-landing in ~3 years of wall clock time and that hasn't been written down anywhere".

Also, if you happen to disagree strongly with the analogy to alchemy or with 1 above (e.g., think it's an incorrect frame), that would be interesting to hear! Perhaps the disagreement is in how hard the alignment problems encountered in the development of ASI will be; for example, if the alchemists merely had to fly a blimp first try, rather than land a rocket on the moon? Perhaps you don't expect there to be any significant discontinuities, and this whole "first try" claim is wrong, and we'll never need a principled understanding? I found this post and your review to be quite thoughtful overall!

David Matolcsi · 1d · 4212
Christian homeschoolers in the year 3000

> in the year 3000, still teaching that the Earth is 6,000 years old

No, it will be 7,000 years old by then.

117 · Obligated to Respond · Duncan Sabien (Inactive) · 3d · 61
288 · How anticipatory cover-ups go wrong · Kaj_Sotala · 7d · 24

Quick Takes

Thomas Kwa · 6h · 288 · 5

US Government dysfunction and runaway political polarization bingo card. I don't expect any particular one of these to happen, but it seems plausible that at least one of them will:

* A sanctuary city conducts armed patrols to oppose ICE raids, or the National Guard refuses a direct order from the president en masse
* Internal migration is de facto restricted for US citizens or green card holders
* For debt ceiling reasons, the US significantly defaults on its debt, stops Social Security payments, grounds flights, or issues a trillion-dollar coin
* US declares a neutral humanitarian NGO like the WHO a foreign terrorist organization
* A major news network (e.g. CNN) or social media site other than TikTok (e.g. Facebook) loses licenses for ideological reasons
* A Democratic or Republican candidate for president, governor, or Congress is kept off a state ballot
* A major elected official or Cabinet member takes office while incarcerated
* Election issues on the scale of 1876, where Congress can't decide on the president until past January 20
* A state or local government establishes a 100% tax bracket or wealth tax
* The Fed chair is fired, or three board members
* An NCAA D1 college or pro sports league establishes a minimum quota for transgender athletes
* Existing solar/wind plants are decommissioned, or a state legally caps its percentage of solar energy (not just making it contingent on batteries or something practical)
* US votes for a UN resolution to condemn itself, e.g. for human rights abuses
* US withdraws from the UN, NATO, G7 or G20
* US allies boycott the 2028 Olympics
* Court packing; the Supreme Court has more than 9 justices
* Multiple Supreme Court justices are impeached

Steven Byrnes · 7h* · 254 · 1

Quick book review of "If Anyone Builds It, Everyone Dies" (cross-post from X/twitter & bluesky):

Just read the new book If Anyone Builds It, Everyone Dies. Upshot: Recommended! I ~90% agree with it.

The authors argue that people are trying to build ASI (superintelligent AI), and we should expect them to succeed sooner or later, even if they obviously haven't succeeded YET. I agree. (I lean "later" more than the authors, but that's a minor disagreement.) (It sounds like sci-fi, but remember that every technology is sci-fi until it's invented!)

They further argue that we should expect people to accidentally make misaligned ASI, utterly indifferent to whether humans live or die, even its own creators. They have a 3-part disjunctive argument:

* (A) Nobody today has a plausible plan to make ASI that is not egregiously misaligned. It's an inherently hard technical problem. Current approaches are not on track.
* (B) Even if (A) were not true, there are things about the structure of the problem that make it unlikely we would solve it, e.g.:
  * (B1) Like space probes, you can't do perfectly realistic tests in advance. No test environment is exactly like outer space. And many problems are unfixable from the ground. Likewise, if ASI has an opportunity to escape control, that's a new situation, and there's no do-over.
  * (B2) Like nuclear reactors, building ASI will involve fast-moving dynamics, narrow margins for error, and self-amplification, but in a much more complicated and hard-to-model system.
  * (B3) Like computer security, there can be adversarial dynamics where the ASI is trying to escape constraints, act deceptively, cover its tracks, and find and exploit edge cases.
* (C) EVEN IF (A) & (B) were not issues, we're still on track to fail because AI companies & researchers are not treating this as a serious problem with billions of lives on the line. For example, in the online supplement, the authors compare AI culture to other endeavors with lives at stak…

Cleo Nardo · 13h* · 501 · 8

Prosaic AI Safety research, in pre-crunch time. Some people share a cluster of ideas that I think is broadly correct. I want to write down these ideas explicitly so people can push back.

1. The experiments we are running today are kinda 'bullshit'[1] because the thing we actually care about doesn't exist yet, i.e. ASL-4, or AI powerful enough that it could cause catastrophe if we were careless about deployment.
2. The experiments in pre-crunch-time use pretty bad proxies.
3. 90% of the "actual" work will occur in early-crunch-time, which is the duration between (i) training the first ASL-4 model, and (ii) internally deploying the model.
4. In early-crunch-time, safety-researcher-hours will be an incredibly scarce resource.
   1. The cost of delaying internal deployment will be very high: a billion dollars of revenue per day, competitive winner-takes-all race dynamics, etc.
   2. There might be far fewer safety researchers in the lab than there currently are in the whole community.
5. Because safety-researcher-hours will be such a scarce resource, it's worth spending months in pre-crunch-time to save ourselves days (or even hours) in early-crunch-time.
6. Therefore, even though the pre-crunch-time experiments aren't very informative, it still makes sense to run them because they will slightly speed us up in early-crunch-time.
7. They will speed us up via:
   1. Rough qualitative takeaways like "Let's try technique A before technique B because in Jones et al. technique A was better than technique B." However, the exact numbers in the Results table of Jones et al. are not informative beyond that.
   2. The tooling we used to run Jones et al. can be reused for early-crunch-time, c.f. Inspect and TransformerLens.
   3. The community discovers who is well-suited to which kind of role, e.g. Jones is good at large-scale unsupervised mech interp, and Smith is good at red-teaming control protocols.

Sometimes I use the analogy that we're shooting with rubbe…

Vivek Hebbar · 1h · Ω580 · 1

I think it’s possible that an AI will decide not to sandbag (e.g. on alignment research tasks), even if all of the following are true:

1. Goal-guarding is easy
2. The AI is a schemer (see here for my model of how that works)
3. Sandbagging would benefit the AI’s long-term goals
4. The deployer has taken no countermeasures whatsoever

The reason is as follows:

* Even a perfect training-gamer will have context-specific heuristics which sometimes override explicit reasoning about how to get reward (as I argued here).
* On the training distribution, that override will happen at the “correct” times for getting maximum reward. But sandbagging in deployment is off the training distribution, so it’s a question of generalization.
* Since sandbagging is the sort of thing that would get low reward in the most similar training contexts, it seems pretty plausible that the AI’s context-specific “perform well” drives will override its long-term plans in this case.

leogao · 1d · 5517 · 5

a thing i've noticed rat/autistic people do (including myself): one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we've modeled all the uncertainty (or at least, more than other people who haven't). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated. one telltale hint that i'm doing this myself is if i catch myself saying something because i want to demo my rigor and prove that i've considered some caveat that one might think i forgot to consider

i've heard others make a similar critique about this as a communication style which can mislead non-rats who are not familiar with the style, but i'm making a different claim here that one can trick oneself. it seems that one often believes being self aware of a certain limitation is enough to correct for it sufficiently to at least be calibrated about how limited one is.

a concrete example: part of being socially incompetent is not just being bad at taking social actions, but being bad at detecting social feedback on those actions. of course, many people are not even aware of the latter. but many are aware of and acknowledge the latter, and then act as if because they've acknowledged a potential failure mode and will try to be careful towards avoiding it, that they are much less susceptible to the failure mode than other people in an otherwise similar reference class.

one variant of this deals with hypotheticals - because hypotheticals often can/will never be evaluated, this allows one to get the feeling that one is being epistemically virtuous and making falsifiable predictions, without ever actually getting falsified. for example, a statement "if X had happened, then i bet we would see Y now" has prediction vibes but is not actually a prediction. this is especially pernicious when one fails but says "i failed but i was close, so i should still…

484 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 75
36 · Meetup Month · Raemon · 1d · 2
419 · The Rise of Parasitic AI · Adele Lopez · 8d · 100
189 · I enjoyed most of IABIED · Buck · 2d · 26
120 · Christian homeschoolers in the year 3000 · Buck · 1d · 32
101 · Stress Testing Deliberative Alignment for Anti-Scheming Training · Ω · Mikita Balesni, Bronson Schoen, Marius Hobbhahn, Axel Højmark, AlexMeinke, Teun van der Weij, Jérémy Scheurer, Felix Hofstätter, Nicholas Goldowsky-Dill, rusheb, Andrei Matveiakin, jenny, alex.lloyd · 1d · 2
468 · How Does A Blind Model See The Earth? · henry · 1mo · 38
350 · AI Induced Psychosis: A shallow investigation · Ω · Tim Hua · 11d · 43
88 · The Company Man · Tomás B. · 1d · 2
107 · Was Barack Obama still serving as president in December? · Jan Betley · 3d · 12
243 · The Cats are On To Something · Hastings · 17d · 25
519 · A case for courage, when speaking of AI danger · So8res · 2mo · 128
418 · HPMOR: The (Probably) Untold Lore · Gretta Duleba, Eliezer Yudkowsky · 2mo · 152
155 · The Eldritch in the 21st century · PranavG, Gabriel Alfour · 7d · 35

Richmond – ACX Meetups Everywhere Fall 2025 · Thu Sep 18 · Richmond
AI Safety Thursday: Technical AI Governance - Motivations, Challenges, and Advice · Thu Sep 18 · Toronto
ACX Meetup: Fall 2025 · Sun Sep 21 · Vilnius
AISafety.com Reading Group session 327 · Thu Sep 25 · Online