LESSWRONG

Quick Takes

Raj Thimmiah's Shortform
Raj Thimmiah37m110

For an event Saul Munn and I are hosting:

Memoria is a one-day festival/unconference for spaced repetition, incremental reading, and memory systems. It’s hosted at Lighthaven in Berkeley, CA, on September 21st, from 10am through the afternoon/evening.

Michael Nielsen, Andy Matuschak, Soren Bjornstad, Martin Schneider, and about 70–90 others will be there — if you use & tinker with memory systems like Anki, SuperMemo, RemNote, MathAcademy, etc., then maybe you should come!

Tickets are $8... (read more)

Reply
Felix Moses's Shortform
Felix Moses11h110

What would the millennium problems of AI Safety be?

Reply
Felix Moses1h10

By that I mean: problems it would be worth one million dollars to have solved. That also implies they would have to be solvable in some empirical fashion.

Reply
Thomas Kwa's Shortform
Thomas Kwa10h2812

US Government dysfunction and runaway political polarization bingo card. I don't expect any particular one of these to happen, but it seems plausible that at least one will.

  • A sanctuary city conducts armed patrols to oppose ICE raids, or the National Guard refuses a direct order from the president en masse
  • Internal migration is de facto restricted for US citizens or green card holders
  • For debt ceiling reasons, the US significantly defaults on its debt, stops Social Security payments, grounds flights, or issues a trillion-dollar coin
  • US declares
... (read more)
Reply
Showing 3 of 6 replies
3Nathaniel6h
Issuing a trillion-dollar coin doesn't seem nearly as bad as any of the others in its bullet. Isn't it just an accounting gimmick roughly equivalent to raising the debt ceiling by $1 trillion?
Thomas Kwa3h42

The idea is they're printing money, not just borrowing it, which in the extreme would cause hyperinflation (and is equivalent to default since debt is in nominal dollars). It probably seems less bad though.
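As a toy illustration of why debt being fixed in nominal dollars makes high inflation act like a partial default (a rough sketch, numbers purely illustrative):

```python
# Toy sketch: debt is fixed in nominal dollars, so unexpected inflation from
# money-printing erodes its real value, which is economically similar to a
# partial default. All numbers are illustrative.

nominal_debt = 100.0                         # arbitrary units
for inflation in (0.02, 0.10, 0.50, 1.00):   # annual inflation rate
    real_value = nominal_debt / (1 + inflation)
    haircut = 1 - real_value / nominal_debt
    print(f"inflation {inflation:.0%}: creditors lose about {haircut:.0%} in real terms")
```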

Reply
1Karl Krueger6h
My understanding is that CNN doesn't have or need "licenses", because it is a cable news network. Broadcast licenses from the FCC are for broadcast stations.
Vivek Hebbar's Shortform
Vivek Hebbar5h*Ω7102

I think it’s possible that an AI will decide not to sandbag (e.g. on alignment research tasks), even if all of the following are true:

  1. Goal-guarding is easy
  2. The AI is a schemer (see here for my model of how that works)
  3. Sandbagging would benefit the AI’s long-term goals
  4. The deployer has taken no countermeasures whatsoever

The reason is as follows:

  • Even a perfect training-gamer will have context-specific heuristics which sometimes override explicit reasoning about how to get reward (as I argued here).
  • On the training distribution, that override will happen at the “
... (read more)
Reply
Garrett Baker5h42

This seems less likely the harder the problem is, and therefore the more the AI needs to use its general intelligence or agency to pursue it, which are often the sorts of tasks we’re most scared about the AI doing surprisingly well on.

I agree this argument suggests we will have a good understanding of the simpler capabilities the model has, like what facts about biology it knows, which may end up being useful anyway.

Reply
jdp's Shortform
jdp3d2-11

I can see why they had to ban Said Achmiz before this book dropped.

Reply2
Showing 3 of 12 replies
2the gears to ascension2d
I am dismayed but not surprised, given the authors. I'd love to see the version edited by JDP's mind(s) and their tools. I'm almost certain it would be out of anyone's price range, but what would it cost to buy JDP+AI hours sufficient to produce an edited version? I also have been trying to communicate it better, from the perspective of someone who actually put in the hours watching the arxiv feed. I suspect you'd do it better than I would. But, some ingredients I'd hope to see you ingest[ed already] for use:
https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring
https://www.lesswrong.com/posts/gebzzEwn2TaA6rGkc/deep-learning-systems-are-not-less-interpretable-than-logic
https://www.lesswrong.com/posts/Rrt7uPJ8r3sYuLrXo/selection-has-a-quality-ceiling
probably some other wentworth stuff. I thought I had more to link but it's not quite coming to mind. oh right, this one!
https://www.lesswrong.com/posts/evYne4Xx7L9J96BHW/video-and-transcript-of-talk-on-can-goodness-compete
jdp5h60

I have now written a review of the book, which touches on some of what you're asking about. https://www.lesswrong.com/posts/mztwygscvCKDLYGk8/jdp-reviews-iabied

Reply2
2Malo2d
Obviously just one example, but Schneier has generally been quite skeptical, and he blurbed the book.
Daniel Kokotajlo's Shortform
Daniel Kokotajlo3d*Ω12261

I read somewhere recently that there's a fiber-optic FPV kamikaze drone with a 40km range. By contrast, typical such drones have 10km, maybe 20km ranges.

Either way, it seems clear that EW-resistant drones with 20km+ range are on the horizon. Millions per year will be produced by Ukraine and Russia and maybe some other countries. And I wouldn't be surprised if ranges go up to more than 40km soon.

I wonder if this will cause major problems for Israel. Gaza and Lebanon and Syria are within 20km of some decent-sized Israeli cities. Iron Dome wouldn't work agains... (read more)

Reply
Showing 3 of 34 replies
1AviS15h
It certainly seems like Israel is acting in a manner consistent with taking this concern seriously, seeming intent on ending Hamas's presence in Gaza, holding a buffer zone in Syria, and weakening Hezbollah.
2Daniel Kokotajlo16h
Could be AI, human pilots, or a combination of both; the basic math doesn't change. (Even if every drone needs a human pilot, it would totally be feasible to concentrate hundreds or thousands of fiber optic drones on a position.) The time when drones don't need human operators for most of the time in combat may be sooner than you think. Consider how Waymos operate autonomously but can call in a human operator to take over if they get stuck. I imagine something similar could happen for drones, where e.g. a group of N drones fly in a flock/swarm from point A to point B fully autonomously, and when they are approaching the target a human operator looking through their cameras paints the targets (yes, that's a soldier; yes, that's a tank; yes, you are clear to engage) and the AI does the rest. IIRC even more autonomy than that is already being trialed; I think I heard about a prototype that goes into some sort of 'autonomous seek and destroy mode' where it just roams around completely disconnected from its human operators and attacks any targets it recognizes. I agree that lots of other things about war will change due to AI, but the importance of drones, I think, is not going to go down thanks to AI.
RHollerith5h20

the importance of drones, I think, is not going to go down thanks to AI.

I agree. What I was trying to say, though, is that my guess is that for drones to stay as effective as they currently are for 5 years would require AI capable enough to transform so many aspects of society that, for us in 2025, trying to project out that far becomes futile.

Reply
don't_wanna_be_stupid_any_more's Shortform
don't_wanna_be_stupid_any_more10h2-7

Can someone please explain to me why "If Anyone Builds It, Everyone Dies" is not a free eBook/blog post?

Like seriously, if someone told me there is a detailed case for a possible imminent existential risk, with possible solutions included, but I had to pay to see it, I would have dismissed it as another fearmongering doomsday grift.

If you are really sincere about an extinction-level risk, why hide your arguments behind a paywall? Why not make it free so as many people as possible can see it?

The very fact that this book has a price tag on it, in an age where publishing an eBook is practically free, puts the authors' motives in question.

Reply
Showing 3 of 5 replies
3habryka6h
I don't think this is true. AI 2027 got much more attention than basically anything on the NYT bestseller list, and seems reasonably-well described as a "free eBook/blog post".  (IMO I think publishing a book is reasonable, but I think trying to write an online compendium to AI risk would have been a better move and been more successful. Nobody actually reads books, and books perform very badly in the modern social-media dominated media landscape)
Ben Pace5h20

One idea I came up with today is that the ideal book would also have an online website where you can read it all, conditional on having bought the book. Essentially a paywall that is also a measurable book sale. And otherwise you can just get highlighted extracts from each chapter.

Reply
3Eli Tyre8h
Yes. It's approximately the whole point. The authors have already produced massive amounts of free online content raising the alarm about AI risk. Those materials have had substantial impact, persuading the type of person who tends to read, and be interested in, long blog posts of that kind. But that is a limited audience. The point of publishing a proper book is precisely to reach a larger audience, and to shift the Overton window of which views are known to be respectable.
Thane Ruthenis's Shortform
Thane Ruthenis2d571

Just finished If Anyone Builds It, Everyone Dies (and some of the supplements).[1] It feels... weaker than I'd hoped. Specifically, I think Part 3 is strong, and the supplemental materials are quite thorough, but Parts 1-2... I hope I'm wrong, and this opinion is counterweighed by all these endorsements and MIRI presumably running it by lots of test readers. But I'm more bearish on it making a huge impact than I was before reading it.

Point 1: The rhetoric – the arguments and their presentations – is often not novel, just rehearsed variations on the ar... (read more)

Reply211
Showing 3 of 35 replies
Darren McKee6h30

Also, I just posted my review: IABIED Review - An Unfortunate Miss — LessWrong

Reply
3Darren McKee18h
Good to know and I appreciate you sharing that exchange.  You are correct that such a thing is not in there... because (if you're curious) I thought, strategically, it was better to argue for what is desirable (safe AI innovation) than to argue for a negative (stop it all). Of course, if one makes the requirements for safe AI innovation strong enough, it may result in a slowing or restricting of developments. 
2Mateusz Bagiński12h
On the one hand, yeah, it might. On the other (IMO bigger) hand, the fewer people talk about the thing explicitly, the less likely it is to be included in the Overton window and the less likely it is to seem like a reasonable/socially acceptable goal to aim for directly. I don't think the case for safe nuclear/biotechnology would be less persuasive if paired with "let's just get rid of nuclear weapons/bioweapons/gain-of-function research".
yams's Shortform
yams3mo10

Thinking about qualia, trying to avoid getting trapped in the hard problem of consciousness along the way.

Tempted to model qualia as a region with the capacity to populate itself with coarse heuristics for difficult-to-compute features of nodes in a search process, which happens to ship with a bunch of computational inconveniences (that are most of what we mean to refer to when we reference qualia).

This aids in generality, but trades off against locally optimal processes, as a kind of 'tax' on all cognition.

This is a literal shower thought and I've read no... (read more)

Reply1
yams6h20

Following up to say that the thing that maps most closely to what I was thinking about (or satisfied my curiosity) is GWT (global workspace theory).

GWT is usually intended to approach the hard problem, but the principal critique of it is that it isn't doing that at all (I ~agree). Unfortunately, I had dozens of frustrating conversations with people telling me 'don't spend any time thinking about consciousness; it's a dead end; you're talking about the hard problem; that triggers me; STOP' before someone actually pointed me in the right direction here, or seemed open to the question at all.

Reply
1Isopropylpod3mo
Is qualia (its existence or not, how and why it happens) not the exact thing the hard problem is about? If you're ignoring the hard problem or dismissing it, you also doubt the existence of qualia.
1yams3mo
I guess I should have said 'without getting caught in the nearby attractors associated with most conversations about the hard problem of consciousness'. There's obviously a lot there, and my guess is >95 percent of it wouldn't feel to me like it has much meaningful surface area with what I'm curious about.
yams's Shortform
yams7h20

Reading so many reviews/responses to IABIED, I wish more people had registered how they expected to feel about the book, or how they think a book on x-risk ought to look, prior to the book's release. 

Finalizing any Real Actual Object requires making tradeoffs. I think it's pretty easy to critique the book on a level of abstraction that respects what it is Trying To Be in only the broadest possible terms, rather than acknowledging various sub-goals (e.g. providing an updated version of Nate + Eliezer's now very old 'canonical' arguments), modulations o... (read more)

Reply
steve2152's Shortform
Steven Byrnes10h*306

Quick book review of "If Anyone Builds It, Everyone Dies" (cross-post from X/twitter & bluesky):

Just read the new book If Anyone Builds It, Everyone Dies. Upshot: Recommended! I ~90% agree with it.

The authors argue that people are trying to build ASI (superintelligent AI), and we should expect them to succeed sooner or later, even if they obviously haven’t succeeded YET. I agree. (I lean “later” more than the authors, but that’s a minor disagreement.)

Ultra-fast minds that can do superhuman-quality thinking at 10,000 times the speed, that do not age and

... (read more)
Reply
testingthewaters7h30

I would start by saying that I mostly agree with you here. On this point specifically, however,

AI capabilities would rebrand as AI safety

I mean, 3 of the leading AI labs (DeepMind, OpenAI, Anthropic) were founded explicitly under or attached to the banner of AI safety. OpenAI and Anthropic were even founded as "the safer alternatives" to DeepMind and OpenAI! You also don't have to go back very far to find AI safety funders and community voices promoting those labs as places to work to advance AI safety (whereas today you'd be hard-pressed to find someo... (read more)

Reply1
Noah Birnbaum's Shortform
Noah Birnbaum12h90

“Albania has introduced its first artificial intelligence “minister”, who addressed parliament on Thursday in a debut speech.” lol, what???

Not sure how much this really matters vs is just a PR thing, but it’s maybe something people on here should know about. 

Reply
Guive8h30

I'm no expert on Albanian politics, but I think it's pretty obvious this is just a gimmick with minimal broader significance. 

Reply
Nina Panickssery's Shortform
Nina Panickssery9h40

Could HGH supplementation in children improve IQ?

I think there's some weak evidence that the answer is yes. In some studies where HGH is given for other reasons (a variety of developmental disorders, as well as cases where the child is unusually small or short), an IQ increase or other improved cognitive outcomes are observed. The fact that this occurs in a wide variety of situations suggests it could be a general effect that applies to healthy children as well.

Examples of studies (caveat: produced with the help of ChatGPT, I'm including null results also). Left colum... (read more)

Reply
Kabir Kumar9h10

has it been tested on adults a lot?

Reply
Shortform
Cleo Nardo17h*502

Prosaic AI Safety research, in pre-crunch time.

Some people share a cluster of ideas that I think is broadly correct. I want to write down these ideas explicitly so people can push back.

  1. The experiments we are running today are kinda 'bullshit'[1] because the thing we actually care about doesn't exist yet, i.e. ASL-4, or AI powerful enough that they could cause catastrophe if we were careless about deployment.
  2. The experiments in pre-crunch-time use pretty bad proxies.
  3. 90% of the "actual" work will occur in early-crunch-time, which is the duration be
... (read more)
Reply
Showing 3 of 8 replies
Lucas Teixeira9h30

I'm curious if you have a sense of:
 

1. What the target goal of early-crunch time research should be (i.e. control safety case for the specific model one has at the present moment, trustworthy case for this specific model, trustworthy safety case for the specific model and deference case for future models, trustworthy safety case for all future models, etc...)

2. The rough shape(s) of that case (i.e. white-box evaluations, control guardrails, convergence guarantees, etc...)

3. What kinds of evidence you expect to accumulate given access to these early po... (read more)

Reply
2jacquesthibs10h
For those who haven't seen, coming from the same place as OP, I describe my thoughts in Automating AI Safety: What we can do today. Specifically in the side notes:

Should we just wait for research systems/models to get better?

[...] Moreover, once end-to-end automation is possible, it will still take time to integrate those capabilities into real projects, so we should be building the necessary infrastructure and experience now. As Ryan Greenblatt has said, “Further, it seems likely we’ll run into integration delays and difficulties speeding up security and safety work in particular[…]. Quite optimistically, we might have a year with 3× AIs and a year with 10× AIs and we might lose half the benefit due to integration delays, safety taxes, and difficulties accelerating safety work. This would yield 6 additional effective years[…].” Building automated AI safety R&D ecosystems early ensures we're ready when more capable systems arrive.

Research automation timelines should inform research plans

It’s worth reflecting on scheduling AI safety research based on when we expect sub-areas of safety research will be automatable. For example, it may be worth putting off R&D-heavy projects until we can get AI agents to automate our detailed plans for such projects. If you predict that it will take you 6 months to 1 year to do an R&D-heavy project, you might get more research mileage by writing a project proposal for this project and then focusing on other directions that are tractable now. Oftentimes it’s probably better to complete 10 small projects in 6 months and then one big project in an additional 2 months, rather than completing one big project in 7 months. This isn’t to say that R&D-heavy projects are not worth pursuing—big projects that are harder to automate may still be worth prioritizing if you expect them to substantially advance downstream projects (such as ControlArena from UK AISI). But research automation will rapidly transform what is ‘low-hanging fruit’.
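One way to reconstruct the arithmetic in the Greenblatt quote (a rough sketch, assuming one year at 3× and one year at 10× with half the benefit lost, as quoted):

```python
# One way to reconstruct the quoted estimate: effective research-years from AI
# speedups, discounted for integration delays and safety taxes.

phases = [(1.0, 3.0), (1.0, 10.0)]   # (calendar years, research speedup)
calendar_years = sum(years for years, _ in phases)
effective_years = sum(years * speedup for years, speedup in phases)
additional = effective_years - calendar_years   # 13 - 2 = 11 extra effective years
realized = additional * 0.5                     # "lose half the benefit"
print(f"{realized:.1f} additional effective years")  # ~5.5, roughly the quoted 6
```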
4jacquesthibs11h
I do, which is why I've always placed much more emphasis on figuring out how to do automated AI safety research as safely as we can, rather than trying to come up with some techniques that seem useful at the current scale but will ultimately be a weak proxy (but are good for gaining reputation in and out of the community, cause it looks legit). That said, I think one of the best things we can hope for is that these techniques at least help us to safely get useful alignment research in the lead up to where it all breaks and that it allows us to figure out better techniques that do scale for the next generation while also having a good safety-usefulness tradeoff.
Peter Wildeford's Shortform
Peter Wildeford10h20

People here might appreciate my book review: If We Build AI Superintelligence, Do We All Die?

I'd be curious for more takes from smart LessWrong readers.

Reply
reallyeli's Shortform
reallyeli1mo3-3

LLMs are trained on a human-generated text corpus. Imagine an LLM agent deciding whether or not to be a communist. Seems likely (though not certain) it would be strongly influenced by the existing human literature on communism, i.e. all the text humans have produced about communism arguing its pros/cons and empirical consequences.

Now replace 'communism' with 'plans to take over.' Humans have also produced a literature on this topic. Shouldn't we expect that literature to strongly influence LLM-based decisions on whether to take over?

This is an argument I'm... (read more)

Reply
Guive10h10

See also: "the void", "Self-Fulfilling Misalignment Data Might Be Poisoning Our AI Models"

Reply
leogao's Shortform
leogao3d76

crazy how x dot com is literally more addictive than actual amphetamines 

Reply
Lucas Teixeira11h10

Is this vibes or was there some kind of study done?

Reply
Felix Moses's Shortform
Felix Moses4d*60

Community Universal Basic Income

Epistemic status: strolling in Venice on a hot September evening tipsy with my mom

One of the advantages of church is that it encourages all members of a society to be part of social groups. This has a wide range of positive results, from mental health to disaster preparedness, reduced wealth inequality, increased volunteering, and more.

The downside is the whole God part. There are many communities that form without the God aspect, but they tend to be less diverse, harder to get into, and more focused around specific... (read more)

Reply
Showing 3 of 9 replies
Felix Moses11h10

I probably should have made it clear: this is not a replacement for capitalism. As the title suggests, this is an alternative to UBI. I think thinking of better ways to do UBI becomes more and more important as AI gets better and better. Already, this would be more efficient along economies-of-scale lines than traditional UBI, since it goes from a single person to a community.

As for getting up to the cap: here's what I was thinking. Once you get to around ~150 or so, roughly Dunbar’s number, it's time to start splitting; the extra is just to make it so it... (read more)

Reply
2Viliam2d
By the way, I suspect that we are reinventing some "anarcho-something-ism" here... Thinking about the people at the bottom is a difficult trade-off: it would be better if they didn't stay abandoned, but for everyone else it is better to stay far away from them.
Traditional solution: ignore them, or kill them if they become annoying
Religious solution: promise Heaven to those who volunteer to spend time with them
Socialist solution: put them in a mental institution and don't talk about the topic anymore
Woke solution: leave them on the streets, don't talk about the topic and attack those who do
...sorry for politics, but I tried to list all the solutions I am aware of.
1Felix Moses11h
I was thinking more of a riff on UBI, not a fundamental reordering of society. Iceland has a system where you can declare your religious institution and money goes to it; people have used that to fund community groups that aren't even religious. So this would be a combo of UBI and Sóknargjald (“congregation fee”). The main addition here is to make it fully non-religious and cap it at a size where you can actually know everyone else involved.
As for the what-to-do-with-people-at-the-bottom question: Houston has done a better job than most with its housing-first policies. The best programs to me are ones that aim to prevent people from becoming homeless in the first place; once your mind has been ruined by a couple of years of living outside and drugs, it seems almost impossible to functionally reintegrate people into society. Even in homeless circles, clearly forming communities and pooling resources does work. For a good example of this I think of Camp Resolution in Sacramento. This, of course, does come with problems, so I understand why cops choose to break up these encampments, but I think it does real damage to the communities they destroy. If they could somehow get enough money (say through a program like the one I'm proposing) to at least do group apartments or something like that, I think it would go some way to allowing the more functional homeless people to reintegrate into society. I've worked a decent amount with the homeless, and there are a group who are certainly beyond help, but there are also a lot of people who I think could still be real productive members of society.
leogao's Shortform
leogao1d5717

a thing i've noticed rat/autistic people do (including myself): one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we've modeled all the uncertainty (or at least, more than other people who haven't). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated. one telltale hint that i'm doing this myself is if i catch myself saying something because i want to demo my rigor and prove that i've... (read more)

Reply221
Showing 3 of 5 replies
Vladimir_Nesov12h40

This might be more about miscalibration in perceived relevance of technical exercises inspired by some question. A directly mostly irrelevant exercise that juggles details can be useful, worth doing and even sharing, but mostly for improving model-building intuition and developing good framings in the long term rather than for answering the question that inspired it, especially at a technical level.

So an obvious mistake would be to treat such an exercise as evidence that the person doing/sharing it considers it directly relevant for answering the question ... (read more)

Reply
6Viliam18h
a related thing that I will mention here so that I don't have to write a separate post about it: although updating on evidence is a good thing, it is bad to think "I have updated on evidence, therefore I am now more right than others". maybe you just had to update more than others because you started from an especially stupid prior, so the fact that you updated more than others doesn't mean that you are now closer to the truth. as a silly example, imagine a group of people believing that 2+2=4, and an unlucky guy who believes that 2+2=7. after being exposed to lots of evidence, the latter updates to believing that 2+2=5, because 7 is obviously too much. now it is tempting for the unlucky guy to conclude "I did a lot of thinking about math, and I have changed my mind as a result. those other guys, they haven't changed their minds at all, they are just stuck with their priors. they should update too, and then we can all arrive to the correct conclusion that 2+2=5".
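A minimal numeric restatement of that example (my own sketch): updating a lot is not the same as ending up close to the truth.

```python
# The 2+2 example in numbers: the size of an update says nothing about how
# close you end up to the truth.
truth = 4
people = {"the group (4 -> 4)": (4, 4), "the unlucky guy (7 -> 5)": (7, 5)}
for name, (prior, posterior) in people.items():
    print(f"{name}: updated by {abs(posterior - prior)}, final error {abs(posterior - truth)}")
```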
1CstineSublime1d
I don't follow. If I know I don't "handle" spicy food well, so I avoid eating it, then I'm not acting as if I'm less susceptible to spicy food just because I've acknowledged it. Or are you talking about the proverbial example of someone who drives after getting tipsy, but believes that because they're more "careful" they're safe enough? As for brainworms - I'm not familiar with that term but can guess it's some kind of faddish toxic behaviour (I'm struggling to think of a concrete example, perhaps the use of bromides and platitudes in conversation like "keep your chin up" in lieu of tailored comfort and discourse?) - but what might be an example of a rat brainworm and an analogous normie brainworm?
eggsyntax's Shortform
eggsyntax13h62

Draft thought, posting for feedback:

Many people (e.g. e/acc) believe that although a single very strong future AI might result in bad outcomes, a multi-agent system with many strong AIs will turn out well for humanity. To others, including myself, this seems clearly false.

Why do people believe this? Here's my thought:

  • Conditional on existing for an extended period of time, complex multi-agent systems have reached some sort of equilibrium (in the general sense, not the thermodynamic sense; it may be a dynamic equilibrium like classic predator-prey dynamics).
  • Th
... (read more)
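For concreteness on the "dynamic equilibrium like classic predator-prey dynamics" mentioned in the bullets above, a minimal Lotka-Volterra sketch (my own illustration, parameters arbitrary):

```python
# Minimal Lotka-Volterra predator-prey simulation: populations never settle to
# a fixed point but keep cycling around one, i.e. a "dynamic equilibrium".
alpha, beta, delta, gamma = 1.0, 0.1, 0.075, 1.5   # arbitrary illustrative rates
prey, predators = 10.0, 5.0
dt = 0.01
for step in range(5001):
    if step % 1000 == 0:
        print(f"t={step * dt:5.1f}  prey={prey:7.2f}  predators={predators:6.2f}")
    dprey = (alpha * prey - beta * prey * predators) * dt
    dpred = (delta * prey * predators - gamma * predators) * dt
    prey, predators = prey + dprey, predators + dpred
```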
Reply
Seth Herd12h40

I'm looking forward to seeing your post, because I think this deserves more careful thought.

I think that's right, and that there are some more tricky assumptions and disanalogies underlying that basic error.

Before jumping in, let me say that I think multipolar scenarios are pretty obviously more dangerous to a first approximation. There may be more carefully thought-out routes to equilibria that might work and are worth exploring. But just giving everyone an AGI and hoping it works out would probably be very bad.

Here's where I think the mistake usual... (read more)

Reply