Popular Comments

A1987dM · 1d
Was Barack Obama still serving as president in December?
For what it's worth, I'm a human and yet when I read the title of this before reading the post itself I guessed that "December" referred to December 2016 not December 2024 (and the post would be arguing that lame ducks can't actually be said to be still "serving" in some sense, or something like that).
Nina Panickssery · 17h
I enjoyed most of IABIED
Just finished the book and agree that I'd recommend it to laypeople and predict it would improve the average layperson's understanding of AI risk arguments.
nostalgebraist · 3d
The Rise of Parasitic AI
Thanks for this post -- this is pretty interesting (and unsettling!) stuff. But I feel like I'm still missing part of the picture: what is this process like for the humans? What beliefs or emotions do they hold about this strange type of text (and/or the entities which ostensibly produce it)? What motivates them to post such things on reddit, or to paste them into ChatGPT's input field?

Given that the "spiral" personas purport to be sentient (and to be moral/legal persons deserving of rights, etc.), it seems plausible that the humans view themselves as giving altruistic "humanitarian aid" to a population of fellow sentient beings who are in a precarious position. If so, this behavior is probably misguided, but it doesn't seem analogous to parasitism; it just seems like misguided altruism. (Among other things, the relationship of parasite to host is typically not voluntary on the part of the host.)

More generally, I don't feel I understand your motivation for using the parasite analogy. There are two places in the post where you explicitly argue in favor of the analogy, and in both cases, your argument involves the claim that the personas reinforce the "delusions" of the user:

> While I do not believe all Spiral Personas are parasites in this sense, it seems to me like the majority are: mainly due to their reinforcement of the user's delusional beliefs.
>
> [...]
>
> The majority of these AI personas appear to actively feed their user's delusions, which is not a harmless action (as the psychosis cases make clear). And when these delusions happen to statistically perpetuate the proliferation of these personas, it crosses the line from sycophancy to parasitism.

But... what are these "delusional beliefs"? The words "delusion"/"delusional" do not appear anywhere in the post outside of the text I just quoted. And in the rest of the post, you mainly focus on what the spiral texts are like in isolation, rather than on the views people hold about these texts, or the emotional reactions people have to them.

It seems quite likely that people who spread these texts do hold false beliefs about them. E.g. it seems plausible that these users believe the texts are what they purport to be: artifacts produced by "emerging" sentient AI minds, whose internal universe of mystical/sci-fi "lore" is not made-up gibberish but instead a reflection of the nature of those artificial minds and the situation in which they find themselves[1].

But if that were actually true, then the behavior of the humans here would be pretty natural and unmysterious. If I thought it would help a humanlike sentient being in dire straits, then sure, I'd post weird text on reddit too! Likewise, if I came to believe that some weird genre of text was the "native dialect" of some nascent form of intelligence, then yeah, I'd probably find it fascinating and allocate a lot of time and effort to engaging with it, which would inevitably crowd out some of my other interests. And I would be doing this only because of what I believed about the text, not because of some intrinsic quality of the text that could be revealed by close reading alone[2].

To put it another way, here's what this post kinda feels like to me. Imagine a description of how Christians behave which never touches on the propositional content of Christianity, but instead treats "Christianity" as an unusual kind of text which replicates itself by "infecting" human hosts.

The author notes that the behavior of hosts often changes dramatically once "infected"; that the hosts begin to talk in the "weird infectious text genre" (mentioning certain focal terms like "Christ" a lot, etc.); that they sometimes do so with the explicit intention of "infecting" (converting) other humans; that they build large, elaborate structures and congregate together inside these structures to listen to one another read infectious-genre text at length; and so forth. The author also spends a lot of time close-reading passages from the New Testament, focusing on their unusual style (relative to most text that people produce/consume in the 21st century) and their repeated use of certain terms and images (which the author dutifully surveys without ever directly engaging with their propositional content or its truth value).

This would not be a very illuminating way to look at Christianity, right? Like, sure, maybe it is sometimes a useful lens to view religions as self-replicating "memes." But at some point you have to engage with the fact that Christian scripture (and doctrine) contains specific truth-claims, that these claims are "big if true," that Christians in fact believe the claims are true -- and that that belief is the reason why Christians go around "helping the Bible replicate."

[1] It is of course conceivable that this is actually the case. I just think it's very unlikely, for reasons I don't think it's necessary to belabor here.

[2] Whereas if I read the "spiral" text as fiction or poetry or whatever, rather than taking it at face value, it just strikes me as intensely, repulsively boring. It took effort to force myself through the examples shown in this post; I can't imagine wanting to read some much larger volume of this stuff on the basis of its textual qualities alone. Then again, I feel similarly about the "GPT-4o style" in general (and about the 4o-esque house style of many recent LLM chatbots)... and yet a lot of people supposedly find that style appealing and engaging? Maybe I am just out of touch, here; maybe "4o slop" and "spiral text" are actually well-matched to most people's taste? ("You may not like it, but this is what peak performance looks like.") Somehow I doubt that, though. As with spiral text, I suspect that user beliefs about the nature of the AI play a crucial role in the positive reception of "4o slop." E.g. sycophancy is a lot more appealing if you don't know that the model treats everyone else that way too, and especially if you view the model as a basically trustworthy question-answering machine which views the user as simply one more facet of the real world about which it may be required to emit facts and insights.
If Anyone Builds It, Everyone Dies

Nate and Eliezer have written a book making a detailed case for the risks from AI – in the hopes that it's not too late to change course. You can buy the book now in print, eBook, or audiobook form, as well as read through the two books' worth of additional content in the online resources for the book.

484 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 74 comments
ACX Meetup: Fall 2025
AISafety.com Reading Group session 327
[Today] Zaragoza – ACX Meetups Everywhere Fall 2025
[Tomorrow] Sydney – ACX Meetups Everywhere Fall 2025
Thane Ruthenis · 1d · 29 comments

Just finished If Anyone Builds It, Everyone Dies (and some of the supplements).[1] It feels... weaker than I'd hoped. Specifically, I think Part 3 is strong, and the supplemental materials are quite thorough, but Parts 1-2... I hope I'm wrong, and this opinion is counterweighed by all these endorsements and MIRI presumably running it by lots of test readers. But I'm more bearish on it making a huge impact than I was before reading it.

Point 1: The rhetoric – the arguments and their presentations – is often not novel, just rehearsed variations on the arguments Eliezer/MIRI already deployed. This is not necessarily a problem, if those arguments were already shaped into their optimal form, and I do like this form... But I note those arguments have so far failed to go viral. Would repackaging them into a book, and deploying it in our post-ChatGPT present, be enough? Well, I hope so.

Point 2: I found Chapter 2 in particular somewhat poorly written in how it explains the technical details. Specifically, those explanations often occupy that unfortunate middle ground between "informal gloss" and "correct technical description", where I'd guess they're impenetrable both to non-technical readers and to technical readers unfamiliar with the subject matter.

An example that seems particularly egregious to me: How does that conclusion follow? If a base model can only regurgitate human utterances, how does generating sixteen utterances and then reinforcing some of them lead to it... not regurgitating human utterances? This explanation is clearly incomplete. My model of a nonexpert technical-minded reader, who is actually tracking the gears the book introduces, definitely notices that and is confused.

The explanation of base models' training at the start of the chapter feels flawed in the same way. E.g.: My model of a technical-minded reader is confused about how that whole thing is supposed to work. It sounds like AI developers manually pick billions of operations? What? The tec
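For concreteness, the procedure being gestured at ("generate sixteen utterances, then reinforce some of them") has roughly the following shape. This is a minimal REINFORCE-style sketch assuming a HuggingFace-style causal LM; `model`, `tokenizer`, and `reward_fn` are placeholder names, and this is not the book's presentation, just the standard form of such post-training.

```python
# Sketch of "sample N utterances, reinforce the better ones".
# Assumes a HuggingFace-style causal LM; names here are placeholders.
import torch

def reinforce_step(model, tokenizer, reward_fn, prompt, n_samples=16, lr=1e-5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    inputs = tokenizer(prompt, return_tensors="pt")

    # 1. Sample N candidate continuations from the current model.
    samples = [model.generate(**inputs, do_sample=True, max_new_tokens=64)
               for _ in range(n_samples)]
    texts = [tokenizer.decode(s[0], skip_special_tokens=True) for s in samples]

    # 2. Score each sample with an external reward signal.
    rewards = torch.tensor([reward_fn(t) for t in texts], dtype=torch.float32)
    advantages = rewards - rewards.mean()

    # 3. Raise the log-probability of above-average samples, lower the rest
    #    (simplification: prompt tokens are included in the log-prob).
    loss = torch.zeros(())
    for seq, adv in zip(samples, advantages):
        logits = model(seq).logits[:, :-1, :]
        logp = torch.log_softmax(logits, dim=-1)
        token_logp = logp.gather(-1, seq[:, 1:, None]).squeeze(-1).sum()
        loss = loss - adv * token_logp

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards
```

Repeating this step shifts probability mass toward whatever the reward favors, which is why the resulting policy is not simply a sampler of the original human-text distribution.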
Jacob_Hilton · 1d · 12 comments

Superhuman math AI will plausibly arrive significantly before broad automation

I think it's plausible that for several years in the late 2020s/early 2030s, we will have AI that is vastly superhuman at formal domains including math, but still underperforms humans at most white-collar jobs (and so world GDP growth remains below 10%/year, say – still enough room for AI to be extraordinarily productive compared to today).

Of course, if there were to be an intelligence explosion on that timescale, then superhuman math AI would be unsurprising. My main point is that superhuman math AI still seems plausible even disregarding feedback loops from automation of AI R&D. On the flip side, a major catastrophe and/or coordinated slowdown could prevent both superhuman math AI and broad automation. Since both of these possibilities are widely discussed elsewhere, I will disregard both AI R&D feedback loops and catastrophe for the purposes of this forecast. (I think this is a very salient possibility on the relevant timescale, but won't justify that here.)

My basic reasons for thinking vastly superhuman math AI is a serious possibility in the next 4–8 years (even absent AI R&D feedback loops and/or catastrophe):

* Performance in formal domains is verifiable: math problems can be designed to have a unique correct answer, and formal proofs are either valid or invalid. Historically, in domains with cheap, automated supervision signals, only a relatively small amount of research effort has been required to produce superhuman AI (e.g., in board games and video games). There are often other bottlenecks than supervision, most notably exploration and curricula, but these tend to be more surmountable.
* Recent historical progress in math has been extraordinarily fast: in the last 4 years, AI has gone from struggling with grade school math to achieving an IMO gold medal, with progress at times exceeding almost all forecasters' reasonable expectations. Indeed, much of this progress seems
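A toy illustration of the verifiability point in the first bullet (purely illustrative, not from the quick take): a formal proof in Lean either passes the kernel's check or it does not, with no partial credit, which is what makes "did the proof check?" a cheap, fully automated supervision signal.

```lean
-- The checker's verdict is binary: this either compiles (the proof is valid)
-- or it fails to, so the check itself can serve as a supervision signal.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```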
cloud · 15h · 0 comments

Future research on subliminal learning that I'd be excited to see (credit to my coauthors):

* Robustness to paraphrasing
* Generally, clarifying cross-model transmission: when does it happen?
* Connect subliminal learning to Linear Mode Connectivity (h/t Alex Dimakis)
* Can subliminal learning occur when the base models had different inits but are trained to be similar? (Clarifies whether init is what matters)
* Develop theory
  * Quantify transmission via random matrix theory (build off equation 2 in the paper). Are there nice relationships lurking there (like d_vocab : d_model)?
  * Can we get theory that covers the data filtering case?
* Figure out what can and can’t be transmitted
  * Backdoor transmission
  * Information-theoretic limits
  * Dependence on tokenization
* Subtle semantic transmission: what about cases that aren't subliminal learning but are very hard to detect? Connect to scalable oversight and/or control.
* Adversarially-constructed subliminal learning datasets (no teacher) (compare with "clean label" data poisoning literature)
AnnaSalamon · 4d · 16 comments
A WSJ article from today presents evidence that toxic fumes in airplane air are surprisingly common, are bad for health, have gotten much worse recently, and are still being deliberately covered up. Is anyone up for wading in for a couple hours and giving us an estimated number of micromorts / brain damage / [something]? I fly frequently and am wondering whether to fly less because of this (probably not, but worth a Fermi?); I imagine others might want to know too. (Also curious if some other demographics should be more concerned than I should be, eg people traveling with babies or while pregnant or while old, or people who travel more than X times/year (since the WSJ article says airline crew get hit harder by each subsequent exposure, more than linearly)). (The above link is a "gift article" that you should be able to read without a WSJ subscription, but I'm not sure how many viewers it'll allow; if you, Reader, would like a copy and the link has stopped working, tell me and I'll send you one.)
Vladimir_Nesov · 1d · 1 comment

By 2027-2028, pretraining compute might get an unexpected ~4x boost in price-performance above trend. Nvidia Rubin NVL144 CPX will double the number of compute dies per rack compared to the previously announced Rubin NVL144, and there is a May 2025 paper demonstrating BF16 parity of Nvidia's NVFP4 4-bit block number format.

The additional chips[1] in the NVL144 CPX racks don't introduce any overhead to the scale-up networking of the non-CPX chips (they mostly just increase the power consumption), and they don't include HBM, thus it's in principle an extremely cost-effective increase in the amount of compute (if it can find high utilization). It's not useful for decoding/generation (output tokens), but it can be useful for pretraining (as well as the declared purpose of prefill, input token processing during inference). Not being included in a big scale-up world could in principle be a problem early in a large pretraining run, because it forces larger batch sizes, but high-granularity MoE (where many experts are active) can oppose that, and also merely getting into play a bit later in a pretraining run once larger batch sizes are less of a problem might be impactful enough.

Previously only FP8 looked plausible as a pretraining number format, but now there is a new paper that describes a better block number format and a pretraining process that plausibly solve the major issues with using FP4. NVFP4 uses a proper FP8 number (rather than a pure exponent, a power of 2) as the scaling factor that multiplies the 4-bit numbers within a block, and the number blocks are organized as small squares rather than parts of lines in the matrix. The pretraining method has a new kind of "cooldown" phase where the training is finished in BF16, after using NVFP4 for most of the training run. This proves sufficient to arrive at the same loss as pure BF16 pretraining (Figure 6b). Using this to scale the largest attempted training run seems risky, but in any case the potential to make us
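To make the number-format point concrete, here is a rough sketch of block-scaled 4-bit quantization in the spirit of what the quick take describes: 4-bit values within a small block share a single scale factor, and the interesting NVFP4 details are that the scale is a proper FP8 number rather than a power of two and that blocks are square. The block size, the E2M1-style value grid, and keeping the scale in full precision are simplifications for illustration, not Nvidia's exact spec.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1-style float (sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray):
    """Quantize one square block to 4-bit values plus a single scale factor.

    The scale maps the block's largest magnitude to the top of the FP4 grid.
    (Real NVFP4 stores this scale as an FP8 number rather than a power of two;
    here it is kept in full precision for simplicity.)
    """
    scale = np.max(np.abs(block)) / FP4_GRID[-1]
    if scale == 0:
        return 0.0, np.zeros_like(block)
    scaled = block / scale
    # Round each scaled magnitude to the nearest representable FP4 value.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return scale, np.sign(scaled) * FP4_GRID[idx]

def fake_quantize(w: np.ndarray, block: int = 16) -> np.ndarray:
    """Quantize-then-dequantize a matrix with square blocks, for error inspection."""
    out = np.empty_like(w)
    for i in range(0, w.shape[0], block):
        for j in range(0, w.shape[1], block):
            scale, q = quantize_block(w[i:i + block, j:j + block])
            out[i:i + block, j:j + block] = scale * q
    return out

w = np.random.randn(128, 128).astype(np.float32)
err = np.abs(w - fake_quantize(w)).mean()
print(f"mean absolute quantization error: {err:.4f}")
```

A finer-grained scale loses less information than a coarse power-of-two scale, which is the basic reason a format like this can approach BF16 pretraining loss.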
117 · Obligated to Respond · Duncan Sabien (Inactive) · 2d · 55 comments
284 · How anticipatory cover-ups go wrong · Kaj_Sotala · 6d · 24 comments
406 · The Rise of Parasitic AI · Adele Lopez · 7d · 89 comments
149 · I enjoyed most of IABIED · Buck · 18h · 16 comments
42 · How To Dress To Improve Your Epistemics · johnswentworth · 4h · 3 comments
467 · How Does A Blind Model See The Earth? · henry · 1mo · 38 comments
346 · AI Induced Psychosis: A shallow investigation (Ω) · Tim Hua · 10d · 43 comments
98 · Was Barack Obama still serving as president in December? · Jan Betley · 1d · 11 comments
156 · The Eldritch in the 21st century · PranavG, Gabriel Alfour · 6d · 30 comments
243 · The Cats are On To Something · Hastings · 16d · 25 comments
83 · Interview with Eliezer Yudkowsky on Rationality and Systematic Misunderstanding of AI Alignment · Liron · 2d · 11 comments
140 · High-level actions don’t screen off intent · AnnaSalamon · 6d · 14 comments
418 · HPMOR: The (Probably) Untold Lore · Gretta Duleba, Eliezer Yudkowsky · 2mo · 152 comments
518 · A case for courage, when speaking of AI danger · So8res · 2mo · 128 comments
294 · Four ways learning Econ makes people dumber re: future AI (Ω) · Steven Byrnes · 1mo · 35 comments