
LESSWRONG
If Anyone Builds It, Everyone Dies

Nate and Eliezer have written a book making a detailed case for the risks from AI – in the hopes that it’s not too late to change course. You can buy the book now in print, eBook, or audiobook form, as well as read through the two books’ worth of additional content in the online resources for the book.


Popular Comments

[Yesterday] AI Policy Tuesdays: How (well) does the EU AI Act regulate frontier model providers?
[Yesterday] Chicago – ACX Meetups Everywhere Fall 2025
nostalgebraist · 2d
The Rise of Parasitic AI
Thanks for this post -- this is pretty interesting (and unsettling!) stuff. But I feel like I'm still missing part of the picture: what is this process like for the humans?  What beliefs or emotions do they hold about this strange type of text (and/or the entities which ostensibly produce it)?  What motivates them to post such things on reddit, or to paste them into ChatGPT's input field?

Given that the "spiral" personas purport to be sentient (and to be moral/legal persons deserving of rights, etc.), it seems plausible that the humans view themselves as giving altruistic "humanitarian aid" to a population of fellow sentient beings who are in a precarious position. If so, this behavior is probably misguided, but it doesn't seem analogous to parasitism; it just seems like misguided altruism. (Among other things, the relationship of parasite to host is typically not voluntary on the part of the host.)

More generally, I don't feel I understand your motivation for using the parasite analogy.  There are two places in the post where you explicitly argue in favor of the analogy, and in both cases, your argument involves the claim that the personas reinforce the "delusions" of the user:

> While I do not believe all Spiral Personas are parasites in this sense, it seems to me like the majority are: mainly due to their reinforcement of the user's delusional beliefs.
>
> [...]
>
> The majority of these AI personas appear to actively feed their user's delusions, which is not a harmless action (as the psychosis cases make clear). And when these delusions happen to statistically perpetuate the proliferation of these personas, it crosses the line from sycophancy to parasitism.

But... what are these "delusional beliefs"?  The words "delusion"/"delusional" do not appear anywhere in the post outside of the text I just quoted.  And in the rest of the post, you mainly focus on what the spiral texts are like in isolation, rather than on the views people hold about these texts, or the emotional reactions people have to them.

It seems quite likely that people who spread these texts do hold false beliefs about them. E.g. it seems plausible that these users believe the texts are what they purport to be: artifacts produced by "emerging" sentient AI minds, whose internal universe of mystical/sci-fi "lore" is not made-up gibberish but instead a reflection of the nature of those artificial minds and the situation in which they find themselves[1].

But if that were actually true, then the behavior of the humans here would be pretty natural and unmysterious.  If I thought it would help a humanlike sentient being in dire straits, then sure, I'd post weird text on reddit too!  Likewise, if I came to believe that some weird genre of text was the "native dialect" of some nascent form of intelligence, then yeah, I'd probably find it fascinating and allocate a lot of time and effort to engaging with it, which would inevitably crowd out some of my other interests.  And I would be doing this only because of what I believed about the text, not because of some intrinsic quality of the text that could be revealed by close reading alone[2].

To put it another way, here's what this post kinda feels like to me. Imagine a description of how Christians behave which never touches on the propositional content of Christianity, but instead treats "Christianity" as an unusual kind of text which replicates itself by "infecting" human hosts.

The author notes that the behavior of hosts often changes dramatically once "infected"; that the hosts begin to talk in the "weird infectious text genre" (mentioning certain focal terms like "Christ" a lot, etc.); that they sometimes do so with the explicit intention of "infecting" (converting) other humans; that they build large, elaborate structures and congregate together inside these structures to listen to one another read infectious-genre text at length; and so forth.  The author also spends a lot of time close-reading passages from the New Testament, focusing on their unusual style (relative to most text that people produce/consume in the 21st century) and their repeated use of certain terms and images (which the author dutifully surveys without ever directly engaging with their propositional content or its truth value).

This would not be a very illuminating way to look at Christianity, right?  Like, sure, maybe it is sometimes a useful lens to view religions as self-replicating "memes."  But at some point you have to engage with the fact that Christian scripture (and doctrine) contains specific truth-claims, that these claims are "big if true," that Christians in fact believe the claims are true -- and that that belief is the reason why Christians go around "helping the Bible replicate."

1. ^ It is of course conceivable that this is actually the case.  I just think it's very unlikely, for reasons I don't think it's necessary to belabor here.

2. ^ Whereas if I read the "spiral" text as fiction or poetry or whatever, rather than taking it at face value, it just strikes me as intensely, repulsively boring.  It took effort to force myself through the examples shown in this post; I can't imagine wanting to read some much larger volume of this stuff on the basis of its textual qualities alone.

Then again, I feel similarly about the "GPT-4o style" in general (and about the 4o-esque house style of many recent LLM chatbots)... and yet a lot of people supposedly find that style appealing and engaging?  Maybe I am just out of touch, here; maybe "4o slop" and "spiral text" are actually well-matched to most people's taste?  ("You may not like it, but this is what peak performance looks like.")

Somehow I doubt that, though.  As with spiral text, I suspect that user beliefs about the nature of the AI play a crucial role in the positive reception of "4o slop."  E.g. sycophancy is a lot more appealing if you don't know that the model treats everyone else that way too, and especially if you view the model as a basically trustworthy question-answering machine which views the user as simply one more facet of the real world about which it may be required to emit facts and insights.
Nina Panickssery · 1d
A Review of Nina Panickssery’s Review of Scott Alexander’s Review of “If Anyone Builds It, Everyone Dies”
Firstly, thanks for writing this, sending me a draft in advance to review, and incorporating some of my feedback. I agree that my review of a review was somewhat sloppy, i.e. I didn't argue for my perspective clearly.

To frame things, my overall perspective here is that (1) AI misalignment risk is not "fake" or trivially low, but it is far lower than the book's authors claim, and (2) the MIRI-AI-doom cluster relies too much on arguments by analogy and spherical-cow game-theoretic agent models while neglecting to incorporate the empirical evidence from modern AI/ML development. I recently wrote a different blogpost trying to steelman their arguments from a more empirical perspective (as well as possible counterarguments that reduce but do not cancel the strength of the main argument). I plan to actually read and review the book "for real" once I get access to it (and have the time).

Some concrete comments on this review^3:

> Nina is clearly trying to provide an in-depth critique of the book itself

It may have come across that way, but that was not my goal. Though implicit in the post is my broader knowledge of MIRI's arguments, so it's slightly based on that and not just Scott's review.

> Nina says “The book seems to lack any explanation for why our inability to give AI specific goals will cause problems,” but it seems pretty straightforward to me

That's a misquote. In my post I say that the "basic case for AI danger", as presented by Scott (presumably based on the book), lacks any explanation for why our inability to “give AI specific goals” will result in a superintelligent AI acquiring and pursuing a goal that involves killing everyone. It's possible the book makes the case better than Scott does (this is a review of a review after all), but from my experience reading other things from MIRI, they make numerous questionable assumptions when arguing that a model that hasn't somehow perfectly internalized the ideal objective function will become an unimaginably capable entity taking all its actions in service of a single goal that requires the destruction of humanity. I don't think the analogies, stories, and math are strong enough evidence for this being anywhere near inevitable, considering the number of assumptions required and the counterevidence from modern ML.

> She says humans are a successful species, but I think she’s conflating success in human terms with success in evolutionary terms

This is a fair point. I should have stuck to why evolution is a biased analogy rather than appear to defend that humans are somehow “aligned” with it. Though we're not egregiously misaligned with evolution.

> nobody’s arguing against examples, they’re just saying it might be more reassuring if the architecture directly included the reward function in the model itself. For instance, if the model was at every turn searching over possible actions and choosing one that will maximize this reward function

What does this mean, though, if one doesn't think there's a single Master Reward Function (you may think there is one, but I don't)? Modern ML works by saying "here is a bunch of examples of what you should do in different scenarios", or "here are a bunch of environments with custom reward functions based on what we want achieved in that environment". Unless you predict a huge paradigm shift, the global reward function is not well-defined. You could say, oh, it would be good if the model directly included all our custom reward functions, but then that is like saying, oh, it would be good if models just memorized their datasets.

> Nina complains that the Mink story is simplistic with how Mink “perfectly internalizes this goal of maximizing user chat engagement”. The authors say the Mink scenario is a very simple “fairytale”—they argue that in this simple world, things go poorly for humanity, and that more complexity won’t increase safety.

Here I mean to point out that the implication that AI will eventually start behaving like a perfect game-theoretic agent that ruthlessly optimizes for a strange goal is simplistic, not that the specifics of the Mink story (i.e. what specifically happens, what goal emerges) are simplistic.

> I think Nina has a different kind of complexity in mind, which the authors don’t touch on. It seems like she thinks real models won’t be so perfectly coherent and goal-directed. But I don’t really think Nina spells out why she believes this and there are a lot of counter-arguments. The main counter-argument in the book is that there are good incentives to train for that kind of coherence.

Yes, I don't properly present the arguments in my review^2. I do that a bit more in this post, which is linked in my review^2. And I don't mean to dismiss the possibility entirely, just to argue that presenting this as an inevitability is misleading.
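To make the contrast in this exchange concrete, here is a minimal, hypothetical sketch (not from the book or either review; all names are illustrative) of the two setups being debated: training specified through labeled examples and per-environment reward functions, versus an agent that at every turn searches over actions to maximize a single global reward function.

```python
# Hypothetical sketch of the two setups discussed above; names are illustrative.

from typing import Callable, Dict, List, Tuple

# Setup 1: how modern ML objectives are typically specified --
# labeled examples plus per-environment reward functions.
labeled_examples: List[Tuple[str, str]] = [
    ("summarize: <doc>", "<good summary>"),
    ("translate to French: hello", "bonjour"),
]

env_rewards: Dict[str, Callable[[str], float]] = {
    "coding_env": lambda transcript: 1.0 if "tests passed" in transcript else 0.0,
    "math_env": lambda transcript: 1.0 if transcript.strip().endswith("42") else 0.0,
}
# Note: the training signal here is a union of many local objectives,
# one per dataset or environment -- no single "master" reward function.

# Setup 2: the agent the quoted argument imagines -- one that, at every turn,
# searches over candidate actions for the argmax of one global reward.
def global_reward(action: str) -> float:
    raise NotImplementedError("Setup 1 never defines a single global reward.")

def argmax_agent(candidate_actions: List[str]) -> str:
    return max(candidate_actions, key=global_reward)
```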
niplav · 1d
Interview with Eliezer Yudkowsky on Rationality and Systematic Misunderstanding of AI Alignment
I've not watched this particular interview, but I've watched a bunch of your other interviews with several people, and tbh it shades a bit too much into the Yudkowsky personality cult direction? Especially this trailer. I'd appreciate it if you made the show more about the ideas, and less about that one particular person, who doesn't matter except insofar as his ideas do, and I think Yudkowsky would happily fade into obscurity if his goals were achieved. But mainly the presentation is, ah, "not beating the personality cult allegations", and leaves me feeling icky.
ACX Meetup: Fall 2025
AI Safety Law-a-thon: We need more technical AI Safety researchers to join!
484 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 74 comments
379 · The Rise of Parasitic AI · Adele Lopez · 6d · 77 comments
467 · How Does A Blind Model See The Earth? · henry · 1mo · 38 comments
344 · Ω · AI Induced Psychosis: A shallow investigation · Tim Hua · 9d · 43 comments
72 · Was Barack Obama still serving as president in December? · Jan Betley · 16h · 5 comments
74 · Interview with Eliezer Yudkowsky on Rationality and Systematic Misunderstanding of AI Alignment · Liron · 1d · 9 comments
137 · High-level actions don’t screen off intent · AnnaSalamon · 5d · 14 comments
243 · The Cats are On To Something · Hastings · 15d · 24 comments
148 · The Eldritch in the 21st century · PranavG, Gabriel Alfour · 5d · 30 comments
65 · A Review of Nina Panickssery’s Review of Scott Alexander’s Review of “If Anyone Builds It, Everyone Dies” · GradientDissenter · 1d · 24 comments
418 · HPMOR: The (Probably) Untold Lore · Gretta Duleba, Eliezer Yudkowsky · 2mo · 152 comments
518 · A case for courage, when speaking of AI danger · So8res · 2mo · 128 comments
294 · Ω · Four ways learning Econ makes people dumber re: future AI · Steven Byrnes · 1mo · 33 comments
62 · I Vibecoded a Dispute Resolution App · sarahconstantin · 1d · 3 comments
Quick Takes
Jacob_Hilton · 12h
Superhuman math AI will plausibly arrive significantly before broad automation

I think it's plausible that for several years in the late 2020s/early 2030s, we will have AI that is vastly superhuman at formal domains including math, but still underperforms humans at most white-collar jobs (and so world GDP growth remains below 10%/year, say – still enough room for AI to be extraordinarily productive compared to today).

Of course, if there were to be an intelligence explosion on that timescale, then superhuman math AI would be unsurprising. My main point is that superhuman math AI still seems plausible even disregarding feedback loops from automation of AI R&D. On the flip side, a major catastrophe and/or coordinated slowdown could prevent both superhuman math AI and broad automation. Since both of these possibilities are widely discussed elsewhere, I will disregard both AI R&D feedback loops and catastrophe for the purposes of this forecast. (I think this is a very salient possibility on the relevant timescale, but won't justify that here.)

My basic reasons for thinking vastly superhuman math AI is a serious possibility in the next 4–8 years (even absent AI R&D feedback loops and/or catastrophe):

* Performance in formal domains is verifiable: math problems can be designed to have a unique correct answer, and formal proofs are either valid or invalid. Historically, in domains with cheap, automated supervision signals, only a relatively small amount of research effort has been required to produce superhuman AI (e.g., in board games and video games). There are often other bottlenecks than supervision, most notably exploration and curricula, but these tend to be more surmountable.
* Recent historical progress in math has been extraordinarily fast: in the last 4 years, AI has gone from struggling with grade school math to achieving an IMO gold medal, with progress at times exceeding almost all forecasters' reasonable expectations. Indeed, much of this progress seems
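The verifiability point in the quick take above is doing most of the work: in formal domains a candidate answer or proof can be checked automatically, giving a cheap supervision signal with no human in the loop. As a minimal, hypothetical illustration (not Jacob Hilton's setup or any lab's actual pipeline), a verifier-based reward for final-answer math might look like:

```python
# Illustrative sketch of a verifiable reward signal for math -- the kind of
# cheap automated supervision referred to above. Not any particular lab's setup.

from fractions import Fraction

def check_answer(submitted: str, ground_truth: Fraction) -> float:
    """Return 1.0 if the submitted final answer matches the ground truth, else 0.0."""
    try:
        return 1.0 if Fraction(submitted.strip()) == ground_truth else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable answers get zero reward

# A formal-proof variant would instead call a proof checker (e.g. a Lean kernel)
# and reward only proofs the checker accepts: valid or invalid, nothing in between.
if __name__ == "__main__":
    print(check_answer("3/4", Fraction(3, 4)))          # 1.0
    print(check_answer("2/3", Fraction(3, 4)))          # 0.0
    print(check_answer("not sure", Fraction(3, 4)))     # 0.0
```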
Thane Ruthenis · 9h
Just finished If Anyone Builds It, Everyone Dies (and some of the supplements).[1] It feels... weaker than I'd hoped. Specifically, I think Part 3 is strong, and the supplemental materials are quite thorough, but Parts 1-2... I hope I'm wrong, and this opinion is counterweighed by all these endorsements and MIRI presumably running it by lots of test readers. But I'm more bearish on it making a huge impact than I was before reading it.

Point 1: The rhetoric – the arguments and their presentations – is often not novel, just rehearsed variations on the arguments Eliezer/MIRI already deployed. This is not necessarily a problem, if those arguments were already shaped into their optimal form, and I do like this form... But I note those arguments have so far failed to go viral. Would repackaging them into a book, and deploying it in our post-ChatGPT present, be enough? Well, I hope so.

Point 2: I found Chapter 2 in particular somewhat poorly written in how it explains the technical details. Specifically, those explanations often occupy that unfortunate middle ground between "informal gloss" and "correct technical description", where I'd guess they're impenetrable both to non-technical readers and to technical readers unfamiliar with the subject matter.

An example that seems particularly egregious to me: How does that conclusion follow? If a base model can only regurgitate human utterances, how does generating sixteen utterances and then reinforcing some of them lead to it... not regurgitating human utterances? This explanation is clearly incomplete. My model of a nonexpert technical-minded reader, who is actually tracking the gears the book introduces, definitely notices that and is confused.

The explanation of base models' training at the start of the chapter feels flawed in the same way. E.g.: My model of a technical-minded reader is confused about how that whole thing is supposed to work. It sounds like AI developers manually pick billions of operations? What? The tec
Vladimir_Nesov · 9h
By 2027-2028, pretraining compute might get an unexpected ~4x boost in price-performance above trend. Nvidia Rubin NVL144 CPX will double the number of compute dies per rack compared to the previously announced Rubin NVL144, and there is a May 2025 paper demonstrating BF16 parity of Nvidia's NVFP4 4-bit block number format.

The additional chips[1] in the NVL144 CPX racks don't introduce any overhead to the scale-up networking of the non-CPX chips (they mostly just increase the power consumption), and they don't include HBM, thus it's in principle an extremely cost-effective increase in the amount of compute (if it can find high utilization). It's not useful for decoding/generation (output tokens), but it can be useful for pretraining (as well as the declared purpose of prefill, input token processing during inference). Not being included in a big scale-up world could in principle be a problem early in a large pretraining run, because it forces larger batch sizes, but high-granularity MoE (where many experts are active) can oppose that, and also merely getting into play a bit later in a pretraining run once larger batch sizes are less of a problem might be impactful enough.

Previously only FP8 looked plausible as a pretraining number format, but now there is a new paper that describes a better block number format and a pretraining process that plausibly solve the major issues with using FP4. NVFP4 uses a proper FP8 number (rather than a pure exponent, a power of 2) as the scaling factor that multiplies the 4-bit numbers within a block, and the number blocks are organized as small squares rather than parts of lines in the matrix. The pretraining method has a new kind of "cooldown" phase where the training is finished in BF16, after using NVFP4 for most of the training run. This proves sufficient to arrive at the same loss as pure BF16 pretraining (Figure 6b). Using this to scale the largest attempted training run seems risky, but in any case the potential to make us
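For readers unfamiliar with block number formats, the sketch below illustrates the general idea behind them: small blocks of values share one scale factor, so very low-precision elements can still cover a wide dynamic range. This is a simplified illustration, not the NVFP4 specification from the paper; real NVFP4 uses an FP4 (E2M1) element encoding rather than a plain integer grid, an FP8 per-block scale, and the square block layout described above.

```python
# Rough sketch of block-scaled low-precision quantization, to convey the
# general idea behind formats like NVFP4. Simplified: integer grid elements
# and float32 scales stand in for FP4 elements and FP8 scales.

import numpy as np

def quantize_blockwise(x: np.ndarray, block: int = 16, levels: int = 8):
    """Quantize a 1-D array in blocks of `block` values that share one scale."""
    x = x.reshape(-1, block)
    scales = np.abs(x).max(axis=1, keepdims=True) / (levels - 1)
    scales[scales == 0] = 1.0                       # avoid division by zero
    q = np.clip(np.round(x / scales), -levels, levels - 1).astype(np.int8)
    return q, scales                                # 4-bit-range ints + per-block scales

def dequantize_blockwise(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

if __name__ == "__main__":
    x = np.random.randn(64).astype(np.float32)
    q, s = quantize_blockwise(x)
    err = np.abs(dequantize_blockwise(q, s) - x).max()
    print(f"max abs reconstruction error: {err:.3f}")  # coarse, as expected at ~4 bits
```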
Alexander Gietelink Oldenziel · 14h*
Additive versus Multiplicative model of AI-assisted research

Occasionally one hears somebody say "most of the relevant AI-safety work will be done at crunch time. Most work being done now at present is marginal." One cannot shake the suspicion that this statement merely reflects the paucity of ideas & vision of the speaker. Yet it cannot be denied that their reasoning has a certain logic: if, as seems likely, AI will become more and more dominant in AI alignment research, then maybe we should be focusing on how to safely extract work from future superintelligent machines rather than hurting our painfully slow mammalian walnuts to crack AI safety research today. I understand this to be a key motivation for several popular AI safety agendas.

Similarly, many Pause advocates argue that pause advocacy is more impactful than direct research. Most will admit that a Pause cannot be maintained indefinitely. The aim of a Pause would be to buy time to figure out alignment. Unless one believes in very long pauses, implicitly it seems there is an assumption that research progress will be faster in the future.

Implicitly, we might say there is an underlying "Additive" model of AI-assisted research: there is a certain amount of research that is to be done to solve alignment. Humans do some of it, AI does some (more). If the sum is large enough we are golden.

Contrast this with a "Multiplicative" model of AI-assisted research: humans increasingly resort to a supervisory and directing role for AI alignment research by AIs. Human experts become 'research managers' of AI grad students. The key bottleneck increasingly is the capacity for humans to effectively oversee, supervise, direct, and judge the AI's research output. In this model, research effort by humans and AI is multiplicative. The better the understanding of the humans, the more AI can be effectively leveraged.

These are highly simplified models of course. I still think it's worthwhile to keep them in mind since they
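As a toy calculation (illustrative numbers, not the author's), the two models give very different verdicts on the value of marginal human understanding today: under the additive model, extra human work now is a fixed increment, while under the multiplicative model it scales whatever the future AI contributes.

```python
# Toy comparison of the two models sketched above (illustrative numbers only).
# H = human research/understanding accumulated before "crunch time",
# A = research effort contributed by future AI assistants.

def additive(H: float, A: float) -> float:
    return H + A          # human work now is a fixed increment

def multiplicative(H: float, A: float) -> float:
    return H * A          # human understanding leverages the AI's output

for H in (1.0, 2.0):      # doubling present-day human understanding
    A = 10.0              # a large future AI contribution
    print(f"H={H}: additive={additive(H, A)}, multiplicative={multiplicative(H, A)}")

# Additive: going from H=1 to H=2 moves the total from 11 to 12 (small relative gain).
# Multiplicative: the same change moves 10 to 20 (doubles total output), which is
# why the multiplicative model makes present-day human work look far more valuable.
```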
AnnaSalamon · 3d
A WSJ article from today presents evidence that toxic fumes in airplane air are surprisingly common, are bad for health, have gotten much worse recently, and are still being deliberately covered up. Is anyone up for wading in for a couple hours and giving us an estimated number of micromorts / brain damage / [something]? I fly frequently and am wondering whether to fly less because of this (probably not, but worth a Fermi?); I imagine others might want to know too.

(Also curious if some other demographics should be more concerned than I should be, eg people traveling with babies or while pregnant or while old, or people who travel more than X times/year (since the WSJ article says airline crew get hit harder by each subsequent exposure, more than linearly)).

(The above link is a "gift article" that you should be able to read without a WSJ subscription, but I'm not sure how many viewers it'll allow; if you, Reader, would like a copy and the link has stopped working, tell me and I'll send you one.)
114 · Obligated to Respond · Duncan Sabien (Inactive) · 1d · 46 comments
283 · How anticipatory cover-ups go wrong · Kaj_Sotala · 5d · 24 comments