Some thoughts on this post:
You need adaptability because on the timeframe that you might build a company or start a startup or start a charity, you can expect the rest of the world to remain fixed. But on the timeframe that you want to have a major political movement, on the timeframe that you want to reorient the U.S. government's approach to AI, a lot of stuff is coming at you. The whole world is, in some sense, weighing in on a lot of the interests that have historically been EA's interests.
I'll flag that for AI safety specifically, the world hasn't yet weighed in that much, and can be treated as mostly fixed for the purposes of analysis (with caveats). But yes, AI safety in general does need to prepare for the real possibility that the world will weigh in a lot more on AI safety, and there is a non-trivial number of worlds where AI safety becomes a lot more mainstream.
I don't think we should plan on this happening, but I definitely agree that the world may weigh in way more on AI safety than before, especially just before an AI explosion.
On environmentalism's fuckups:
So what does it look like to fuck up the third wave? The next couple of slides are deliberately a little provocative. You should take them 80% of how strongly I say them, and in general, maybe you should take a lot of the stuff I say 80% of how seriously I say it, because I'm very good at projecting confidence.
But I claim that one of the examples where operating at scale is just totally gone to shit is the environmentalist movement. I would somewhat controversially claim that by blocking nuclear power, environmentalism caused climate change. Via the Environmental Protection Act, environmentalism caused the biggest obstacle to clean energy deployment across America. Via opposition to geoengineering, it's one of the biggest obstacles to actually fixing climate change. The lack of growth of new housing in Western countries is one of the biggest problems that's holding back Western GDP growth and the types of innovation that you really want in order to protect the environment.
I can just keep going down here. I think the overpopulation movement really had dramatically bad consequences on a lot of the developing world. The blocking of golden rice itself was just an absolute catastrophe.
The point here is not to rag on environmentalism. The point is: here's a thing that sounds vaguely good and kind of fuzzy and everyone thinks it's pretty reasonable. There are all these intuitions that seem nice. And when you operate at scale and you're not being careful, you don't have the types of virtues or skills that I laid out in the last slide, you just really fuck a lot of stuff up. (I put recycling on there because I hate recycling. Honestly, it's more a symbol than anything else.)
I want to emphasize that there is a bunch of good stuff. I think environmentalism channeled a lot of money towards the development of solar. That was great. But if you look at the scale of how badly you can screw these things up when you're taking a mindset that is not adapted to operating at the scale of a global economy or global geopolitics, it's just staggering, really. I think a lot of these things here are just absolute moral catastrophes that we haven't really reckoned with.
Feel free to dispute this in office hours, for example, but take it seriously. Maybe I want to walk back these claims 20% or something, but I do want to point at the phenomenon.
I definitely don't think environmentalists caused climate change, and that's despite thinking that the nuclear restrictions were very dumb. The main reason is that oil companies were already causing climate change (albeit far more restrained at the time) when they pumped oil, and the same is true of gas and coal companies.
I do think there's a problem with environmentalism not accepting solutions that don't fit a nature aesthetic, but that's a different problem, mostly separate from causing climate change.
Also, most of the solution here would have been to be more consequentialist and more willing to accept expected utility maximization.
There are arguments to be made that expected utility maximization is overrated on LW, but it's severely underrated by basically everyone else, and basically everyone would be helped by adopting more of a utility maximization mindset.
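For concreteness, here's a minimal statement of the standard expected-utility decision rule being referred to (the hard part in practice is estimating the probabilities and utilities, not the formula itself):

$$a^* = \arg\max_{a \in A} \sum_{s} P(s \mid a)\, U(a, s)$$

i.e., pick the action whose probability-weighted average utility over possible outcomes is highest.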
And then I want to say: okay, what if that happens for us? I can kind of see that future. I can kind of picture a future where AI safety causes a bunch of harms that are analogous to this type of thing:
My general view on how AI safety could go wrong is that it fails through one of two paths. One is becoming somewhat like climate-change/environmentalist partisans, systematically overestimating the severity of plausible harms even when the harms do exist, with the other side then trying to dismiss the harms entirely, causing a polarization cascade. The other worry is that AI safety people might not realize when the danger from AI misalignment has passed, and desperately try to stay relevant.
I have a number of takes on the bounty questions, but I'll wait until you actually post them.
So do I have answers? Kind of. These are just tentative answers and I'll do a bit more object-level thinking later. This is the more meta element of the talk. There's one response: oh, boy, we better feel really guilty whenever we fuck up and apply a lot of pressure to make sure everyone's optimizing really hard for exactly the good things and exclude anyone who plausibly is gonna skew things in the wrong direction.
And I think this isn't quite right. Maybe that works at a startup. Maybe it works at a charity. I don't think it works in Wave 3, because of the arguments I gave before. Wave 3 needs virtue ethics. You just don't want people who are guilt-ridden and feeling a strong sense of duty and heroic responsibility to be in charge of very sensitive stuff. There's just a lot of ways that that can go badly. So I want to avoid that trap.
I basically agree with this, though I'd perhaps avoid the virtue-ethics framing. One of the main things I'd generally like to see is more LWers treating stuff like saving the world with the attitude you'd have in a job, whether at a startup or at a government body like the U.S. Senate or House of Representatives, rather than viewing it as their heroic responsibility.
In this respect, I think Eliezer was dangerously wrong to promote a norm of heroism/heroic responsibility.
Libertarians have always talked about “there's too much regulation”, but I think it's underrated how this is not a fact about history, this is a thing we are living through—that we are living through the explosion of bureaucracy eating the world. The world does not have defense mechanisms against these kinds of bureaucratic creep. Bureaucracy is optimized for minimizing how much you can blame any individual person. So there's never a point at which the bureaucracy is able to take a stand or do the sensible policy or push back even against stuff that's blatantly illegal, like a bunch of the DEI stuff at American universities. It’s just really hard to draw a line and be like “hey, we shouldn't do this illegal thing”. Within a bureaucracy, nobody does it. And you have multitudes of examples.
My controversial take here is that most of the responsibility can be assigned first and foremost to the voters, and secondly to the broad inability to actually govern through normal legislative methods once in power.
Okay. That was all super high level. I think it's a good framework in general. But why is it directly relevant to people in this room? My story here is: the more powerful AI gets, the more everyone just becomes an AI safety person. We've kind of seen this already: AI has just been advancing and over time, you see people falling into the AI safety camp with metronomic predictability. It starts with the most prescient and farsighted people in the field. You have Hinton, you have Bengio and then Ilya and so on. And it's just ticking its way through the whole field of ML and then the whole U.S. government, and so on. By the time AIs have the intelligence of a median human and the agency of a median human, it's just really hard to not be an AI safety person. So then I think the problem that we're going to face is maybe half of the AI safety people are fucking it up and we don't know which half.
I think this is plausible, but not very likely to happen, and I do think it's still plausible that we end up in a world where AI safety doesn't become mainstream by default.
This is especially likely to occur if a software singularity/FOOM/software intelligence explosion is at all plausible, and in those cases we cannot rely on our institutions automatically keeping up.
Link below:
I do think it's worthwhile for people to focus on worlds where AI safety does become a mainstream political topic, but we shouldn't bank on AI safety going mainstream in our technical plans to make AI safe.
My takes on what we should do, in reply to you:
I also have a recent post on why history and philosophy of science is a really useful framework for thinking about these big picture questions and what would it look like to make progress on a lot of these very difficult issues, compared with being bayesian. I’m not a fan of bayesianism—to a weird extent it feels like a lot of the mistakes that the community has made have fallen out of bad epistemology. I'm biased towards thinking this because I'm a philosopher but it does seem like if you had a better decision theory and you weren't maximizing expected utility then you might not screw FTX up quite as badly, for instance.
Suffice it to say that I'm broadly unconvinced by your criticism of Bayesianism from a philosophical perspective, for roughly the reasons @johnswentworth identified below:
https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian#AGxg2r4HQoupkdCWR
On mechanism design:
You can think of the US Constitution as trying to bridge that gap. You want to prevent anarchy, but you also want to prevent concentration of power. And so you have this series of checks and balances. One of the ways we should be thinking about AI governance is as: how do we put in regulations that also have very strong checks and balances? And the bigger of a deal you think AI is going to be, the more like a constitution—the more robust—these types of checks and balances need to be. It can't just be that there's an agency that gets to veto or not veto AI deployments. It needs to be much more principled and much closer to something that both sides can trust.
On the principled mechanism design thing, the way I want to frame this is: right now we think about governance of AI in terms of taking governance tools and applying them to AI. I think this is not principled enough. Instead the thing we want to be thinking about is governance with AI— what would it look like if you had a set of rigorous principles that governed the way in which AI was deployed throughout governments, that were able to provide checks and balances. Able to have safeguards but also able to make governments way more efficient and way better at leveraging the power of AI to, for example, provide a neutral independent opinion like when there's a political conflict.
One particular wrinkle to add here is that the institutions/countries of the future will have to be value-aligned to their citizenry in a way that is genuinely unprecedented for basically any institution. If they are not value-aligned, we just get the alignment problem again: the people in power have very large incentives to simply get rid of everyone else, given arbitrary selfish values (and I don't buy the hypothesis that consumption/human wants are fundamentally limited).
The biggest story of the 21st century is how AI is making alignment way, way more necessary than in the past.
Some final points:
There’s something in the EA community around this that I’m a little worried about. I’ve got this post on how to have more cooperative AI safety strategies, which kind of gets into this. But I think a lot of it comes down to just having a rich conception of what it means to do good in this world. Can we not just “do good” in the sense of finding a target and running as hard as we can toward it, but instead think about ourselves as being on a team in some sense with the rest of humanity—who will be increasingly grappling with a lot of the issues I’ve laid out here?
What is the role of our community in helping the rest of humanity to grapple with this? I almost think of us as first responders. First responders are really important — but also, if they try to do the whole job themselves, they’re gonna totally mess it up. And I do feel the moral weight of a lot of the examples I laid out earlier—of what it looks like to really mess this up. There’s a lot of potential here. The ideas in this community—the ability to mobilize talent, the ability to get to the heart of things—it’s incredible. I love it. And I have this sense—not of obligation, exactly—but just…yeah, this is serious stuff. I think we can do it. I want us to take that seriously. I want us to make the future go much better. So I’m really excited about that. Thank you.
On the one hand, I partially agree that a willingness to make plans that depend on others cooperating has definitely been lacking, and I definitely agree that some ability to cooperate is necessary.
On the other hand, I broadly do not buy the idea that we are on a team with the rest of humanity. More importantly, I do think we need to prepare for worlds in which uncooperative/fighty actions, like restraining open source or potentially centralizing AI development, are necessary to ensure human survival, which means that EA should be prepared to win power struggles over AI if it comes to that.
The one big regret I have in retrospect on AI governance is that the field tried to ride the wave too early, before AI was salient to the general public, which meant polarization partially happened.
Veaulans is right here:
https://x.com/veaulans/status/1890245459861729432
In hindsight, the pause letter should have been released in spring 2026. Pausing might be necessary, but it won't happen without an overabundance of novelty/weirdness in the life of the guy in line with you at the DMV. When *that guy* is scared is when you have your chance
lc has argued that the measured tasks are unintentionally biased towards ones where long-term memory/context length doesn't matter:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
I like your explanation of why normal reliability engineering is not enough, but I'll flag that security against attackers is probably easier than LW in general portrays. I think computer security as a culture is prone to heavily overestimating the difficulty of security, because of incentive issues, because of not remembering the times when nothing happened, and because side-channels are arguably much more limited than people think (precisely because they rely on very specific physical setups rather than attacks on the algorithm itself).
A non-trivial portion of my optimism about surviving AGI comes from the view that security, while difficult, is not unreasonably difficult, and that partial successes matter from a security standpoint.
Link below:
I have 2 cruxes here:
In particular, I do not buy that humans and chimpanzees are nearly as similar as Heinrich describes, and a big reason is that the study showing this pitted heavily optimized, hand-selected top chimpanzees against reasonably average humans, which is not a good way to compare performance if you want the results to generalize.
I don't think they're wildly different, and I'd usually put chimps' effective FLOPs at 1-2 OOMs lower than humans', but I wouldn't go nearly as far as Heinrich on the similarities.
I do think culture actually matters, but nowhere near as much as Heinrich wants it to matter.
I agree evolution has probably optimized human learning, but I don't think it's so heavily optimized that we can use it to give a tighter upper bound than 13 OOMs. The reason is that I do not believe humans are in equilibrium, which means there are probably optimizations left to discover, so I do think the 13 OOMs number is plausible (with high uncertainty).
Comment below:
https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/#mmS5LcrNuX2hBbQQE
I'll flag that while I personally never believed the idea that orcas are on average >6 SDs smarter than humans, and never considered it that plausible, I also don't think orcas could benefit that much from +6 SDs even if applied universally. The reason is that they live in water, which severely limits the available technology options and makes it really, really hard to form the societies needed to generate the kind of explosion that happened after the Industrial Revolution, or even the Agricultural Revolution.
And there is a deep local-optimum issue: their body plan is about as unsuited to using tools as possible, and changing this requires technology they almost certainly can't invent, because the things you would need to build that tech are impossible to get at the pressure and salinity of seawater. So it is pretty much impossible for orcas to get that much better off from large increases in intelligence.
Thus, orca societies have a pretty hard ceiling on what they can achieve, at least once you rule out technologies they cannot invent.
My take is that the big algorithmic difference that explains a lot of weird LLM deficits, and plausibly explains the post's findings, is that current neural networks do not learn at run-time: their weights are frozen. This is a central part of why humans outperform LLMs at longer tasks, because humans have the ability to learn at run-time, as do a lot of other animals.
Unfortunately, this ability gradually declines starting in your 20s, but the existence of non-trivial run-time learning is still a big part of why humans are more successful at longer tasks than AIs currently are.
And thus, if OpenAI or Anthropic had found the secret to lifelong learning, that would explain the hype (though I personally place very low probability on them having succeeded at this for anything that isn't math or coding/software).
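To make the frozen-weights point concrete, here's a minimal toy sketch in PyTorch (the model and names are purely illustrative, not anyone's actual setup) contrasting ordinary frozen-weight inference with a crude test-time update step, the kind of run-time learning that current deployed LLMs lack:

```python
import torch
import torch.nn as nn

# Toy stand-in for a much larger network; illustrative only.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

def frozen_inference(x: torch.Tensor) -> torch.Tensor:
    """How current LLMs are deployed: weights stay fixed at run-time."""
    with torch.no_grad():
        return model(x)

def test_time_update(x: torch.Tensor, y: torch.Tensor,
                     lr: float = 1e-3, steps: int = 5) -> None:
    """A crude stand-in for run-time learning: adjust the weights on data
    encountered during the task, so later parts of a long task can benefit
    from what was seen earlier."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Hypothetical usage: observe some task-specific data mid-episode,
# update on it, then keep acting with the adapted weights.
x_seen, y_seen = torch.randn(16, 8), torch.randn(16, 1)
test_time_update(x_seen, y_seen)
prediction = frozen_inference(torch.randn(1, 8))
```

Real run-time learning would of course need to be far more sample-efficient and robust than this toy gradient step, but the structural difference is the point: in the first function nothing persists across the task, while in the second the network itself changes as the task unfolds.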
Gwern explains below:
Re other theories: I don't think all other theories in existence have infinitely many adjustable parameters. If he's referring to the fact that lots of theories have adjustable parameters that range over the real numbers, which are infinitely complicated in general, then that's a different issue, and string theory may have it as well.
Re string theory's alleged vacuousness: I think the core thing string theory predicts, that other quantum gravity models don't, is that at large scales you recover general relativity and the Standard Model, whereas no other theory has yet figured out how to properly include both the empirical effects of gravity and quantum mechanics in the parameter regimes where they are known to work. So string theory predicts more simply by predicting everything ordinary quantum mechanics predicts while being able to incorporate gravity without ruining the other predictions, whereas other models of quantum gravity tend to ruin empirical predictions (like general relativity approximately holding) pretty quickly.
Links to long comments that I want to pin, but which are too long to be pinned:
https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD