I appreciate the straightforward and honest nature of this communication strategy, in the sense of "telling it like it is" and not hiding behind obscure or vague language. In that same spirit, I'll provide my brief, yet similarly straightforward reaction to this announcement:
Eliezer's response to claims about unfalsifiability, namely that "predicting endpoints is easier than predicting intermediate points", seems like a cop-out to me, since it would reverse the usual pattern in forecasting and prediction without good reason.
It's pretty standard? Like, we can make reasonable predictions about the climate in 2100 even if we can't predict the weather two months ahead.
To be blunt, it's not just that Eliezer lacks a positive track record in predicting the nature of AI progress, which might be forgivable if we thought he had really good intuitions about this domain. Empiricism isn't everything; theoretical arguments are important too and shouldn't be dismissed. But-
Eliezer thought AGI would be developed from a recursively self-improving seed AI coded up by a small group, "brain in a box in a basement" style. He dismissed and mocked connectionist approaches to building AI. His writings repeatedly downplayed the importance of compute, and he has straw-manned writers like Moravec who did a better job at predicting when AGI would be developed than he did.
Old MIRI intuition pumps about why alignment should be difficult, like the "Outcome Pump" and "Sorcerer's Apprentice", are now forgotten; it was a surprise that it would be easy to create helpful genies like LLMs that basically just do what we want. The remaining arguments for the difficulty of alignment are esoteric considerations about inductive biases, counting arguments, etc. So yes, let's actually look at these arguments and not just dismiss them, but let's not pretend that MIRI has a good track record.
I think the core concerns remain, and more importantly, other rather doom-y scenarios have opened up involving AI systems more similar to the ones we have now, not just the straight-up singleton ASI foom. The problem here is IMO not "this specific doom scenario will become a thing" but "we don't have anything resembling a GOOD vision of the future with this tech that we are nevertheless developing at breakneck pace". Yet the number of possible dystopian or apocalyptic scenarios is enormous. Part of this is "what if we lose control of the AIs" (singleton or multipolar), part of it is "what if we fail to structure our society around having AIs" (loss of control, mass wireheading, and a lot of other scenarios I'm not sure how to name). The only positive vision the "optimists" on this have to offer is "don't worry, it'll be fine, this clearly revolutionary and never-seen-before technology that puts in question our very role in the world will play out the same way every invention ever did". And that's not terribly convincing.
True knowledge about later times doesn't generally let you make arbitrary predictions about intermediate times. But it does usually imply that you can make some theory-specific predictions about intermediate times.
Thus, vis-a-vis your examples: Predictions about the climate in 2100 don't involve predicting tomorrow's weather. But they do almost always involve predictions about the climate in 2040 and 2070, and they'd be really sus if they didn't.
Similarly:
So I think that -- entirely apart from specific claims about whether MIRI does this -- it's pretty reasonable to expect them to be able to make some theory-specific predictions about the before-end-times, although it's unreasonable to expect them to make arbitrary theory-specific predictions.
I agree this is usually the case, but I think it's not always true, and I don't think it's necessarily true here. E.g., people as early as Da Vinci guessed that we'd be able to fly long before we had planes (or even any flying apparatus which worked), because birds can fly, and so we should be able to as well (at least, this was Da Vinci's and the Wright brothers' reasoning). That endpoint was not dependent on details (early flying designs had wings like a bird, a design which we did not keep :p), but was closer to a laws-of-physics claim (if birds can do it, there isn't anything fundamentally holding us back from doing it either).
Superintelligence holds a similar place in my mind: intelligence is physically possible, because we exhibit it, and it seems quite arbitrary to assume that we’ve maxed it out. But also, intelligence is obviously powerful, and reality is obviously more manipulable than we currently have the means to manipulate it. E.g., we know that we should be capable of developing advanced nanotech, since cells can, and that space travel/terraforming/etc. is possible.
These two things together—“we can likely create something much smarter than ourselves” and...
There's a pretty big difference between statements like "superintelligence is physically possible", "superintelligence could be dangerous" and statements like "doom is >80% likely in the 21st century unless we globally pause". I agree with (and am not objecting to) the former claims, but I don't agree with the latter claim.
I also agree that it's sometimes true that endpoints are easier to predict than intermediate points. I haven't seen Eliezer give a reasonable defense of this thesis as it applies to his doom model. If all he means here is that superintelligence is possible, it will one day be developed, and we should be cautious when developing it, then I don't disagree. But I think he's saying a lot more than that.
I think it's more similar to saying that the climate in 2040 is less predictable than the climate in 2100, or saying that the weather 3 days from now is less predictable than the weather 10 days from now, neither of which is true. By contrast, the weather vs. climate distinction is more a difference between predicting point estimates and predicting averages.
the climate in 2040 is less predictable than the climate in 2100
It's certainly not a simple question. Say the Gulf Stream is projected to collapse somewhere between now and 2095, with a median date of 2050. Then, slightly abusing the meaning of confidence intervals, we can say that in 2100 we won't have the Gulf Stream with probability >95%, while in 2040 the Gulf Stream will still be here with probability ~60%, which is literally less predictable.
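To make the arithmetic concrete, here is a minimal sketch under an assumed normal distribution for the collapse date; the numbers are hypothetical, chosen only to match the projection quoted above, and are not the output of any climate model:

```python
# Minimal sketch: treat the collapse date as roughly normal with median 2050
# and 95th percentile 2095 (illustrative assumptions, not a climate model).
from scipy.stats import norm

median, p95 = 2050, 2095
sigma = (p95 - median) / norm.ppf(0.95)  # ~27 years

def p_collapsed_by(year: float) -> float:
    """Probability the collapse has already happened by `year`."""
    return norm.cdf(year, loc=median, scale=sigma)

print(f"P(no Gulf Stream in 2100)       ~ {p_collapsed_by(2100):.2f}")      # ~0.97
print(f"P(Gulf Stream still there 2040) ~ {1 - p_collapsed_by(2040):.2f}")  # ~0.64
```

Under these assumptions the 2100 state is pinned down more tightly (~97%) than the 2040 state (~64%), which is the point being made.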
Chemists would give an example of chemical reactions, where final thermodynamically stable states are easy to predict, while unstable intermediate states are very hard to even observe.
Very dumb example: if you are observing a radioactive atom with a half-life of one minute, you can't predict when the atom is going to decay, but you can be very certain that it will have decayed after an hour.
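For concreteness, this is just the standard half-life calculation (nothing beyond the example above): after an hour, i.e. 60 half-lives, the survival probability is

$$P(\text{not yet decayed after } 60\ \text{min}) = \left(\tfrac{1}{2}\right)^{60} \approx 8.7 \times 10^{-19},$$

so the final state is almost certain even though the timing of any individual decay is not.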
And why don't you accept the classic MIRI example that even if it's impossible for a human to predict the moves of Stockfish 16, you can be certain that Stockfish will win?
Chemists would give an example of chemical reactions, where final thermodynamically stable states are easy to predict, while unstable intermediate states are very hard to even observe.
I agree there are examples where the end state is easier to predict than the intermediate states. Here, it's because we have strong empirical and theoretical reasons to think that chemicals will settle into some equilibrium after a reaction. With AGI, I have yet to see a compelling argument for why we should expect a specific easy-to-predict equilibrium state after it's developed, one that somehow depends very little on how the technology is developed.
It's also important to note that, even if we know that there will be an equilibrium state after AGI, more evidence is generally needed to establish that the end equilibrium state will specifically be one in which all humans die.
And why don't you accept the classic MIRI example that even if it's impossible for a human to predict the moves of Stockfish 16, you can be certain that Stockfish will win?
I don't accept this argument as a good reason to think doom is highly predictable partly because I think the argument is dramatically underspecified without...
I think you are abusing/misusing the concept of falsifiability here. Ditto for empiricism. You aren't the only one to do this, I've seen it happen a lot over the years and it's very frustrating. I unfortunately am busy right now but would love to give a fuller response someday, especially if you are genuinely interested to hear what I have to say (which I doubt, given your attitude towards MIRI).
I unfortunately am busy right now but would love to give a fuller response someday, especially if you are genuinely interested to hear what I have to say (which I doubt, given your attitude towards MIRI).
I'm a bit surprised you suspect I wouldn't be interested in hearing what you have to say?
I think the amount of time I've spent engaging with MIRI perspectives over the years provides strong evidence that I'm interested in hearing opposing perspectives on this issue. I'd guess I've engaged with MIRI perspectives vastly more than almost everyone on Earth who explicitly disagrees with them as strongly as I do (although obviously some people like Paul Christiano and other AI safety researchers have engaged with them even more than me).
(I might not reply to you, but that's definitely not because I wouldn't be interested in what you have to say. I read virtually every comment-reply to me carefully, even if I don't end up replying.)
Here's a new approach: Your list of points 1 - 7. Would you also make those claims about me? (i.e. replace references to MIRI with references to Daniel Kokotajlo.)
You've made detailed predictions about what you expect in the next several years, on numerous occasions, and made several good-faith attempts to elucidate your models of AI concretely. There are many ways we disagree, and many ways I could characterize your views, but "unfalsifiable" is not a label I would tend to use for your opinions on AI. I do not mentally lump you together with MIRI in any strong sense.
OK, glad to hear. And thank you. :) Well, you'll be interested to know that I think of my views on AGI as being similar to MIRI's, just less extreme in various dimensions. For example I don't think literally killing everyone is the most likely outcome, but I think it's a very plausible outcome. I also don't expect the 'sharp left turn' to be particularly sharp, such that I don't think it's a particularly useful concept. I also think I've learned a lot from engaging with MIRI and while I have plenty of criticisms of them (e.g. I think some of them are arrogant and perhaps even dogmatic) I think they have been more epistemically virtuous than the average participant in the AGI risk conversation, even the average 'serious' or 'elite' participant.
I want to publicly endorse and express appreciation for Matthew's apparent good faith.
Every time I've ever seen him disagreeing about AI stuff on the internet (a clear majority of the times I've encountered anything he's written), he's always been polite, reasonable, thoughtful, and extremely patient. Obviously conversations sometimes entail people talking past each other, but I've seen him carefully try to avoid miscommunication, and (to my ability to judge) strawmanning.
Thank you, Matthew. Keep it up. : )
Followup: Matthew and I ended up talking about it in person. tl;dr of my position is that
Falsifiability is a symmetric two-place relation; one cannot say "X is unfalsifiable," except as shorthand for saying "X and Y make the same predictions," and thus Y is equally unfalsifiable. When someone is going around saying "X is unfalsifiable, therefore not-X," that's often a misuse of the concept--what they should say instead is "On priors / for other reasons (e.g. deference) I prefer not-X to X; and since both theories make the same predictions, I expect to continue thinking this instead of updating, since there won't be anything to update on."
What is the point of falsifiability-talk then? Well, first of all, it's quite important to track when two theories make the same predictions, or the same-predictions-till-time-T. It's an important part of the bigger project of extracting predictions from theories so they can be tested. It's exciting progress when you discover that two theories make different predictions, and nail it down well enough to bet on. Secondly, it's quite important to track when people are making this harder rather than easier -- e.g. fortunetellers and pundits will of...
"If your model of reality has the power to make these sweeping claims with high confidence, then you should almost certainly be able to use your model of reality to make novel predictions about the state of the world prior to AI doom that would help others determine if your model is correct."
This is partially derivable from Bayes' rule. In order for you to gain confidence in a theory, you need to make observations which are more likely in worlds where the theory is correct. Since MIRI seems to have grown even more confident in their models, they must have observed something which is more likely under their models than under the alternatives. Therefore, to obey Conservation of Expected Evidence, the world could have come out a different way which would have decreased their confidence. So it was falsifiable this whole time. However, in my experience, MIRI-sympathetic folk deny this for some reason.
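To spell out the standard identity being invoked here (this is just the law of total probability, not an addition to the argument):

$$P(H) = P(E)\,P(H \mid E) + P(\neg E)\,P(H \mid \neg E),$$

so if observing $E$ raises the probability of $H$, i.e. $P(H \mid E) > P(H)$, then necessarily $P(H \mid \neg E) < P(H)$: there was some possible observation that would have counted against the theory.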
It's simply not possible, as a matter of Bayesian reasoning, to lawfully update (today) based on empirical evidence (like LLMs succeeding) in order to change your probability of a hypothesis that "doesn't make" any empirical predictions (today).
...The fact that MIRI has yet to produce (to my knowledge) a
In summer 2022, Quintin Pope was explaining the results of the ROME paper to Eliezer. Eliezer impatiently interrupted him and said "so they found that facts were stored in the attention layers, so what?". Of course, this was exactly wrong --- Bau et al. found the circuits in mid-network MLPs. Yet, there was no visible moment of "oops" for Eliezer.
I think I am missing context here. Why is that distinction between facts localized in attention layers and in MLP layers so earth-shaking that Eliezer should have been shocked and awed by a quick guess during conversation being wrong, and so revealing an anecdote that you feel it is the capstone of your comment, crystallizing everything wrong about Eliezer into a story?
^ Aggressive strawman which ignores the main point of my comment. I didn't say "earth-shaking" or "crystallizing everything wrong about Eliezer" or that the situation merited "shock and awe." Additionally, the anecdote was unrelated to the other section of my comment, so I didn't "feel" it was a "capstone."
I would have hoped, with all of the attention on this exchange, that someone would reply "hey, TurnTrout didn't actually say that stuff." You know, local validity and all that. I'm really not going to miss this site.
Anyways, gwern, it's pretty simple. The community edifies this guy and promotes his writing as a way to get better at careful reasoning. However, my actual experience is that Eliezer goes around doing things like e.g. impatiently interrupting people and being instantly wrong about it (importantly, in the realm of AI, as was the original context). This makes me think that Eliezer isn't deploying careful reasoning to begin with.
^ Aggressive strawman which ignores the main point of my comment. I didn't say "earth-shaking" or "crystallizing everything wrong about Eliezer" or that the situation merited "shock and awe."
I, uh, didn't say you "say" either of those: I was sarcastically describing your comment about an anecdote that scarcely even seemed to illustrate what it was supposed to, much less was so important as to be worth recounting years later as a high profile story (surely you can come up with something better than that after all this time?), and did not put my description in quotes meant to imply literal quotation, like you just did right there. If we're going to talk about strawmen...
someone would reply "hey, TurnTrout didn't actually say that stuff."
No one would say that or correct me for falsifying quotes, because I didn't say you said that stuff. They might (and some do) disagree with my sarcastic description, but they certainly weren't going to say 'gwern, TurnTrout never actually used the phrase "shocked and awed" or the word "crystallizing", how could you just make stuff up like that???' ...Because I didn't. So it seems unfair to judge LW and talk about how you are "not going to miss ...
Disagree. Epistemics is a group project, and impatiently interrupting people can make both you and your interlocutor less likely to combine your information into correct conclusions. It is also evidence that you're incurious internally, which makes you worse at reasoning, though I don't want to speculate on Eliezer's internal experience in particular.
One day a mathematician doesn’t know a thing. The next day they do. In between they made no observations with their senses of the world.
It’s possible to make progress through theoretical reasoning. It’s not my preferred approach to the problem (I work on a heavily empirical team at a heavily empirical lab) but it’s not an invalid approach.
I agree, and I was thinking explicitly of that when I wrote "empirical" evidence and predictions in my original comment.
I personally have updated a fair amount over time on
We can back out predictions of my personal models from this, such as "we will continue to not have a clear theory of alignment" or "there will continue to be consensus views that aren't supported by reasoning that's solid enough that it ought to produce that consensus if everyone is being reasonable".
I thought the first paragraph and the bolded bit of your comment seemed insightful. I don't see why what you're saying is wrong – it seems right to me (but I'm not sure).
I basically agree with your overall comment, but I'd like to push back in one spot:
If your model of reality has the power to make these sweeping claims with high confidence
From my understanding, Nate Soares at least claims that his internal case for >80% doom is disjunctive and doesn't route entirely through 1, 2, 3, and 4.
I don't really know exactly what the disjuncts are, so this doesn't really help and I overall agree that MIRI does make "sweeping claims with high confidence".
I think your summary is a good enough quick summary of my beliefs. The minutia that I object to is how confident and specific lots of parts of your summary are. I think many of the claims in the summary can be adjusted or completely changed and still lead to bad outcomes. But it's hard to add lots of uncertainty and options to a quick summary, especially one you disagree with, so that's fair enough.
(As a side note, that paper you linked isn't intended to represent anyone else's views, other than myself and Peter, and we are relatively inexperienced. I'm also no longer working at MIRI).
I'm confused about why your <20% isn't sufficient for you to want to shut down AI research. Is it because the benefits outweigh the risk, or because we'll gain evidence about potential danger and can shut down later if necessary?
I'm also confused about why being able to generate practical insights about the nature of AI or AI progress is something that you think should necessarily follow from a model that predicts doom. I believe something close enough to (1) from your summary, but I don't have much idea (above general knowledge) of how the first company to build such an agent will do so, or when they will work out how to do it. One doesn't imply the other.
I'm confused about why your <20% isn't sufficient for you to want to shut down AI research. Is it because the benefits outweigh the risk, or because we'll gain evidence about potential danger and can shut down later if necessary?
I think the expected benefits outweigh the risks, given that I care about the existing generation of humans (to a large, though not overwhelming, degree). The expected benefits here likely include (in my opinion) a large reduction in global mortality, a very large increase in the quality of life, a huge expansion in material well-being, and more generally a larger and more vibrant world earlier in time. Without AGI, I think most existing people would probably die and get replaced by the next generation of humans, in a relatively much poorer world (compared to the alternative).
I also think the absolute level of risk from AI barely decreases if we globally pause. My best guess is that pausing would mainly just delay adoption without significantly impacting safety. Under my model of AI, the primary risks are long-term, and will happen substantially after humans have already gradually "handed control" over to the AIs and retired their labor on a large scale. ...
Would most existing people accept a gamble with a 20% chance of death in the next 5 years and an 80% chance of life extension and radically better technology? I concede that many would, but I think it's far from universal, and I wouldn't be too surprised if half of people or more think this isn't for them.
I personally wouldn't want to take that gamble (strangely enough I've been quite happy lately and my life has been feeling meaningful, so the idea of dying in the next 5 years sucks).
(Also, I want to flag that I strongly disagree with your optimism.)
A thing I am confused about: what is the medium-to-long-term actual policy outcome you're aiming for? And what is the hopeful outcome which that policy unlocks?
You say "implement international AI compute governance frameworks and controls sufficient for halting the development of any dangerous AI development activity, and streamlined functional processes for doing so". The picture that brings to my mind is something like:
A prototypical "AI pause" policy in this vein would be something like "no new training runs larger than the previous largest run".
Now, the obvious-to-me shortcoming of that approach is that algorithmic improvement is moving at least as fast as scaling, a fact which I doubt Eliezer or Nate have overlooked. Insofar as that algorithmic improvement is itself compute-dependent, it's mostly dependent on small test runs rather than big training runs, so a pause-style policy would slow down the algorithmic component of AI progress basically not-at-all. So whatever your timelines look like, even a full pause on training runs larger than the current reco...
I don't speak for Nate or Eliezer in this reply; where I speak about Eliezer I am of course describing my model of him, which may be flawed.
Three somewhat disjoint answers:
These next changes implemented in the US, Europe and East Asia would probably buy us many decades:
Close all the AI labs and return their assets to their shareholders;
Require all "experts" (e.g., researchers, instructors) in AI to leave their jobs; give them money to compensate them for their temporary loss of earnings power;
Make it illegal to communicate technical knowledge about machine learning or AI; this includes publishing papers, engaging in informal conversations, tutoring, talking about it in a classroom; even distributing already-published titles on the subject gets banned.
Of course it is impractical to completely stop these activities (especially the distribution of already-published titles), but we do not have to completely stop them; we need only sufficiently reduce the rate at which the AI community worldwide produces algorithmic improvements. Here we are helped by the fact that figuring out how to create an AI capable of killing us all is probably still a very hard research problem.
What is most dangerous about the current situation is the tens of thousands of researchers world-wide with tens of billions in funding who feel perfectly free to communicate and collaborate...
I know how awful this sounds to many of the people reading this, including the person I am replying to...
I actually find this kind of thinking quite useful. I mean, the particular policies proposed are probably pareto-suboptimal, but there's a sound method in which we first ask "what policies would buy a lot more time?", allowing for pretty bad policies as a first pass, and then think through how to achieve the same subgoals in more palatable ways.
We understand that we may be discounted or uninvited in the short term, but meanwhile our reputation as straight shooters with a clear and uncomplicated agenda remains intact.
I don't have any substantive comments, but I do want to express a great deal of joy about this approach.
I am really happy to see people choosing to engage with the policy, communications, and technical governance space with this attitude.
You want to shut down AI to give more time... for what? Let's call the process you want to give more time to X. You want X to go faster than AI. It seems the relevant quantity is the ratio between the speed of X and the speed of AI. If X could be clarified, it would make it more clear how efficient it is to increase this ratio by speeding up X versus by slowing down AI. I don't see in this post any idea of what X is, or any feasibility estimate of how easy it is to speed up X versus slowing down AI.
One thing we can hope for, if we get a little more time rather than a lot more time, is that we might get various forms of human cognitive enhancement working, and these smarter humans can make more rapid progress on AI alignment.
Glad there is a specific idea there. What are the main approaches for this? There's Neuralink and there's gene editing, among other things. It seems MIRI may have access to technical talent that could speed up some of these projects.
Thank you for this update—I appreciate the clear reasoning. I also personally feel that the AI policy community is overinvested in the "say things that will get you points" strategy and underinvested in the "say true things that help people actually understand the problem" strategy. Specifically, I feel like many US policymakers have heard "be scared of AI because of bioweapons" but have not heard clear arguments about risks from autonomous systems, misalignment, AI takeover, etc.
A few questions:
What are the artifacts you're most excited about, and what's your rough prediction about when they will be ready?
Due to bugs in human psychology, we are more likely to succeed in our big projects if we don't yet state publicly what we're going to do by when. Sorry. I did provide some hints in the main post (website, book, online reference).
how do you plan to assess the success/failure of your projects? Are there any concrete metrics you're hoping to achieve? What does a "really good outcome" for MIRI's comms team look like by the end of the year,
The only concrete metric that really matters is "do we survive", but you are probably interested in some intermediate performance indicators. :-P
The main things I am looking for within 2024 are not as SMART-goal shaped as you are probably asking for. What I'd like to see is that we've developed enough trust in our most recent new hires that they are freely able to write on behalf of MIRI without getting important things wrong, such that we're no longer bottlenecked on a few key people within MIRI; that we're producing high-quality content at a much faster clip; that we have the capacity to handle many more of the press inquiries we r...
thank you for continuing to stretch the overton window! note that, luckily, the “off-switch” is now inside the window (though just barely so, and i hear that big tech is actively - and very myopically - lobbying against on-chip governance). i just got back from a UN AIAB meeting and our interim report does include the sentence “Develop and collectively maintain an emergency response capacity, off-switches and other stabilization measures” (while rest of the report assumes that AI will not be a big deal any time soon).
Have you considered emphasizing this part of your position:
"We want to shut down AGI research including governments, military, and spies in all countries".
I think this is an important point that is missed in current regulation, which focuses on slowing down only the private sector. It's hard to achieve because policymakers often favor their own institutions, but it's absolutely needed, so it needs to be said early and often. This will actually win you points with the many people who are cynical of the institutions, who are not just libertarians, but a growing portion of the public.
I don't think anyone is saying this, but it fits your honest and confronting communication strategy.
IDK if there's political support that would be helpful and that could be affected by people saying things to their representatives. But if so, then it would be helpful to have a short, clear, on-point letter that people can adapt to send to their representatives. Things I'd want to see in such a letter:
Or something.
Here are some event ideas/goals that could support the strategy:
Note these are general ideas, not informed by the specifics of MIRI's capabilities and interests.
(Our organization, Horizon Events, and myself personally are interested in helping MIRI with event goals - feel free to reach out via email o@horizonomega.org.)
Why does MIRI believe that an "AI Pause" would contribute anything of substance to the goal of protecting the human race? It seems to me that an AI pause would:
In any case, I think you are going to have an extremely difficult time in your messaging. I think this strategy will not succeed and will most likely, like most other AI safety efforts, actively harm your efforts.
Every movement thinks the...
There's a dramatic difference between this message and the standard fanatic message: a big chunk of it is both true, and intuitively so.
The idea that genuine smarter-than-humans-in-every-way AGI is dangerous is quite intuitive. How many people would say that, if we were visited by a more capable alien species, it would be totally safe for us?
The reason people don't intuitively see AI as dangerous is that they imagine it won't become fully agentic and genuinely outclass humans in all relevant ways. Convincing them otherwise is a complex argument, but continued progress will make that argument for us (unless it's all underground, which is a real risk as you say).
Now, that's not the part of their message that MIRI tends to emphasize. I think they had better, and I think they probably will.
That message actually benefits from not mixing it with any of the complex risks from sub-sapient tool AI that you mention. Doing what you suggest and using existing fears has dramatic downsides (although it still might be wise on a careful analysis - I haven't seen one that's convincing).
I agree with you that technical alignment of LLM-based AGI is quite achievable. I think we have plans for it tha...
We think audiences are numb to politics as usual. They know when they’re being manipulated. We have opted out of the political theater, the kayfabe, with all its posing and posturing. We are direct and blunt and honest, and we come across as exactly what we are.
This is IMO a great point, and true in general. I think "the meta" is sort of shifting and it's the guys who try too hard to come off as diplomatic who are often behind the curve. This has good and bad sides (sometimes it means that political extremism wins out over common sense simply because it's screechy and transgressive), but overall I think you got the pulse right on it.
What leads MIRI to believe that this policy of being very outspoken will work better than the expert-recommended policy of being careful what you say?
(Not saying it won't work, but this post doesn't seem to say why you think it will).
building misaligned smarter-than-human systems will kill everyone, including their children [...] if they come to understand this central truth.
I'd like to once again reiterate that the arguments for misaligned AIs killing literally all humans (if they succeed in takeover) are quite weak and probably literally all humans dying conditional on AI takeover is unlikely (<50% likely).
(To be clear, I think there is a substantial chance of at least 1 billion people dying and that AI takeover is very bad from a longtermist perspective.)
This is due to:
This is discussed in more detail here and here. (There is also some discussion here.)
(This content is copied from here and there is some discussion there.)
Further, as far as I can tell, central thought leaders of MIRI (Eliezer, Nate Soares) don't actually believe that misaligned AI takeover will lead to the deaths of literally all humans:
...I sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to me to validly incorporate mos
The more complex message sounds like a great way to make the public communication more complex and off-putting.
The difference between killing everyone and killing almost everyone while keeping a few alive for arcane purposes does not matter to most people, nor should it.
I agree that the arguments for misaligned AGI killing absolutely everyone aren't solid, but the arguments against that seem at least as shaky. So rounding it to "might quite possibly kill everyone" seems fair and succinct.
From the other thread where this comment originated: the argument that AGI won't kill everyone because people wouldn't kill everyone seems very bad, even when applied to human-imitating LLM-based AGI. People are nice because evolution meticulously made us nice. And even humans have killed an awful lot of people, with no sign they'd stop before killing everyone if it seemed useful for their goals.
Why not "AIs might violently takeover the world"?
Seems accurate to the concern while also avoiding any issues here.
That phrase sounds like the Terminator movies to me; it sounds like plucky humans could still band together to overthrow their robot overlords. I want to convey a total loss of control.
In documents where we have more room to unpack concepts I can imagine getting into some of the more exotic scenarios like aliens buying brain scans, but mostly I don't expect our audiences to find that scenario reassuring in any way, and going into any detail about it doesn't feel like a useful way to spend weirdness points.
Some of the other things you suggest, like future systems keeping humans physically alive, do not seem plausible to me. Whatever they're trying to do, there's almost certainly a better way to do it than by keeping Matrix-like human body farms running.
going into any detail about it doesn't feel like a useful way to spend weirdness points.
That may be a reasonable consequentialist decision given your goals, but it's in tension with your claim in the post to be disregarding the advice of people telling you to "hoard status and credibility points, and [not] spend any on being weird."
Whatever they're trying to do, there's almost certainly a better way to do it than by keeping Matrix-like human body farms running.
You've completely ignored the arguments from Paul Christiano that Ryan linked to at the top of the thread. (In case you missed it: 1 2.)
The claim under consideration is not that "keeping Matrix-like human body farms running" arises as an instrumental subgoal of "[w]hatever [AIs are] trying to do." (If you didn't have time to read the linked arguments, you could have just said that instead of inventing an obvious strawman.)
Rather, the claim is that it's plausible that the AI we build (or some agency that has decision-theoretic bargaining power with it) cares about humans enough to spend some tiny fraction of the cosmic endowment on our welfare. (Compare to how humans care enough about nature preservation and animal welf...
I think it makes sense to state the more direct threat-model of literal extinction; though I am also a little confused by the citing of weirdness points… I would’ve said that it makes the whole conversation more complex in a way that (I believe) everyone would reliably end up thinking was not a productive use of time.
(Expanding on this a little: I think that literal extinction is a likely default outcome, and most people who are newly coming to this topic would want to know that this is even in the hypothesis-space and find that to be key information. I think if I said “also maybe they later simulate us in weird configurations like pets for a day every billion years while experiencing insane things” they would not respond “ah, never mind then, this subject is no longer a very big issue”, they would be more like “I would’ve preferred that you had factored this element out of our discussion so far, we spent a lot of time on it yet it still seems to me like the extinction event being on the table is the primary thing that I want to debate”.)
Passing the onion test is better than not passing it, but I think the relevant standard is having intent to inform. There's a difference between trying to share relevant information in the hopes that the audience will integrate it with their own knowledge and use it to make better decisions, and selectively sharing information in the hopes of persuading the audience to make the decision you want them to make.
An evidence-filtering clever arguer can pass the onion test (by not omitting information that the audience would be surprised to learn was omitted) and pass the test of not technically lying (by not making false statements) while failing to make a rational argument in which the stated reasons are the real reasons.
Some of the other things you suggest, like future systems keeping humans physically alive, do not seem plausible to me.
I agree with Gretta here, and I think this is a crux. If MIRI folks thought it were likely that AI will leave a few humans biologically alive (as opposed to information-theoretically revivable), I don't think we'd be comfortable saying "AI is going to kill everyone". (I encourage other MIRI folks to chime in if they disagree with me about the counterfactual.)
I also personally have maybe half my probability mass on "the AI just doesn't store any human brain-states long-term", and I have less than 1% probability on "conditional on the AI storing human brain-states for future trade, the AI does in fact encounter aliens that want to trade and this trade results in a flourishing human civilization".
Here's another way to frame why this matters.
When you make a claim like "misaligned AIs kill literally everyone", then reasonable people will be like "but will they?" and you should be in a position where you can defend this claim. But actually, MIRI doesn't really want to defend this claim against the best objections (or at least they haven't seriously done so yet AFAICT).
Further, the more MIRI does this sort of move, the more that reasonable potential allies will have to distance themselves.
When you make a claim like "misaligned AIs kill literally everyone", then reasonable people will be like "but will they?" and you should be in a position where you can defend this claim.
I think most reasonable people will round off "some humans may be kept as brain scans that may have arbitrary cruelties done to them" to be equivalent to "everyone will be killed (or worse)" and not care about this particular point, seeing it as nitpicking that would not make the scenario any less horrible even if it was true.
I am not that confident about this. Or like, I don't know, I do notice my psychological relationship to "all the stars explode" and "earth explodes" is very different, and I am not good enough at morality to be confident about dismissing that difference.
Further, as far as I can tell, central thought leaders of MIRI (Eliezer, Nate Soares) don't actually believe that misaligned AI takeover will lead to the deaths of literally all humans:
This is confusing to me; those quotes are compatible with Eliezer and Nate believing that it's very likely that misaligned AI takeover leads to the deaths of literally all humans.
Perhaps you're making some point about how if they think it's at all plausible that it doesn't lead to everyone dying, they shouldn't say "building misaligned smarter-than-human systems will kill everyone". But that doesn't seem quite right to me: if someone believed event X will happen with 99.99% probability and they wanted to be succinct, I don't think it's very unreasonable to say "X will happen" instead of "X is very likely to happen" (as long as when it comes up at all, they're honest with their estimates).
they don't seem to have made a case for the AI killing literally everyone which addresses the decision theory counterargument effectively.
I think that's the crux here. I don't think the decision theory counterargument alone would move me from 99% to 75% - there are quite a few other reasons my probability is lower than that, but not purely on the merits of the argument in focus here. I would be surprised if that weren't the case for many others as well, and very surprised if they didn't put >75% probability on AI killing literally everyone.
I guess my position comes down to: There are many places where I and presumably you disagree with Nate and Eliezer's view and think their credences are quite different from ours, and I'm confused by the framing of this particular one as something like "this seems like a piece missing from your comms strategy". Unless you have better reasons than I for thinking they don't put >75% probability on this - which is definitely plausible and may have happened in IRL conversations I wasn't a part of, in which case I'm wrong.
I'm confused by the framing of this particular one as something like "this seems like a piece missing from your comms strategy". Unless you have better reasons than I for thinking they don't put >75% probability on this - which is definitely plausible and may have happened in IRL conversations I wasn't a part of, in which case I'm wrong.
Based partially on my in-person interactions with Nate and partially on some amalgamated sense from Nate and Eliezer's comments on the topic, I don't think they seem very committed to the view "the AI will kill literally everyone".
Beyond this, I think Nate's posts on the topic (here, here, and here) don't seriously engage with the core arguments (listed in my comment) while simultaneously making a bunch of unimportant arguments that totally bury the lede.[1] See also my review of one of these posts here and Paul's comment here making basically the same point.
I think it seems unfortunate to:
Two things:
FWIW I still stand behind the arguments that I made in that old thread with Paul. I do think the game-theoretical considerations for AI maybe allowing some humans to survive are stronger, but they also feel loopy and like they depend on how good of a job we do on alignment, so I usually like to bracket them in conversations like this (though I agree it's relevant for the prediction of whether AI will kill literally everyone).
One of the main bottlenecks on explaining the full gravity of the AI situation to people is that they're already worn out from hearing about climate change, which for decades has been widely depicted as an existential risk with the full persuasive force of the environmentalism movement.
Fixing this rather awful choke point could plausibly be one of the most impactful things here. The "Global Risk Prioritization" concept is probably helpful for that but I don't know how accessible it is. Heninger's series analyzing the environmentalist movement was fantastic, but the fact that it came out recently instead of ten years ago tells me that the "climate fatigue" problem might be understudied, and evaluation of climate fatigue's difficulty/hopelessness might yield unexpectedly hopeful results.
Does MIRI have a statement on recent OpenAI events? I'm pretty excited about frank reflections on current events as helping people to orient.
Rob Bensinger has tweeted about it some.
Overall we continue to be pretty weak on the "wave" side, having people comment publicly on current events / take part in discourse, and the people we hired recently are less interested in that and more interested in producing the durable content. We'll need to work on it.
stable, durable, proactive content – called “rock” content
FWIW this is conventionally called evergreen content.
Because it's relevant to my professional interest -- who do you think is really, really world-class today at making "rock" and "wave" content?
We are not investing in grass-roots advocacy, protests, demonstrations, and so on.
I like this, I'd be really interested to ask you, given that you're taking a first principles no bullshit approach to outreach, what do you think of protest in general?
Every protest I've witnessed seemed to be designed to annoy and alienate its witnesses, making it as clear as possible that there was no way to talk to these people, that their minds were on rails. I think most people recognize that as cult shit and are alienated by that.
A leftist friend once argued that protes...
I am not an expert, however I'd like to make a suggestion regarding the strategy. The issue I see with this approach is that policymakers have a very bad track record of listening to actual technical people (see environmental regulations).
Generally speaking, they will only listen when it is convenient to them (some immediate material benefit is on the table), or if there is very large popular support, in which case they will take action in the way that lets them put in the least effort they can get away with.
There is, however, one case where technical p...
Cool, so MIRI is focusing on passive public support, while PauseAI and others focus on active public support.
Now, can an org focus on lobbying for pausing/stopping (or red lines for kill switches), then?
What We’re Not Doing ... We are not investing in grass-roots advocacy, protests, demonstrations, and so on. We don’t think it plays to our strengths, and we are encouraged that others are making progress in this area.
Not speaking for the movement, but as a regular on Pause AI this makes sense to me. Perhaps we can interact more, though, and in particular I'd imagine we might collaborate on testing the effectiveness of content in changing minds.
...Execution ... The main thing holding us back from realizing this vision is staffing. ... We hope to hire more writ
I understand why MIRI has Yudkowsky, Bourgon, and Soares as "spokespeople" but I don't think they're good choices for all types of communications. You should look at popular science communicators such as Neil deGrasse Tyson or Malcolm Gladwell or popular TED talk presenters to see what kind of spokespeople appeal to regular people. I think it would be good to have someone more like that, but, you know...smarter and not wrong as often.
When I look at popular media, the person whose concerns about AI risks are cited most often is probably Geoffrey Hinton.
I am not convinced MIRI has given enough evidence to support the idea that unregulated AI will kill everyone and their children. Most of their projects are either secret or old papers. The only papers which have been produced after 2019 are random irrelevant math papers. Most of the rest of their papers are not even technical in nature and contain a lot of unverified claims. They have not even produced one paper since the breakthrough in LLM technology in 2022. Even among the papers which do indicate risk, there is no consensus among scientific peers...
just some actual consensus among established researchers to sift mathematical facts from conjecture.
"Scientific consensus" is a much much higher bar than peer review. Almost no topic of relevance has a scientific consensus (for example, there exists basically no trustworthy scientific for urban planning decisions, or the effects of minimum wage law, or pandemic prevention strategies, or cyber security risks, or intelligence enhancement). Many scientific peers think there is an extinction risk.
I think demanding scientific consensus is an unreasonably high bar that would approximately never be met in almost any policy discussion.
I am not convinced MIRI has given enough evidence to support the idea that unregulated AI will kill everyone and their children.
The way you're expressing this feels like an unnecessarily strong bar.
I think advocacy for an AI pause already seems pretty sensible to me if we accept the following premises:
Edited to add the following:
There's also a sense in which whether to pause...
1.1. The adoption of such laws is a long way off.
Usually, it is a centuries-long path: Court decisions -> Actual enforcement of decisions -> Substantive law -> Procedures -> Codes -> Declaration then Conventions -> Codes.
Humanity does not have this much time; it is worth focusing on real results that people can actually see. It might be necessary to build some simulations to understand which behavior is irresponsible.
Where is the line between creating a concept of what is socially dangerous and what ...
I have never heard of the rock/wave communication strategy and can't seem to google it.
these are pretty standard communications tactics in the modern era.
Is this just unusual naming? Anybody have links?
With regard to the "Message and Tone" section, I mostly agree with the specific claims. But I think there is danger in taking it too far. I strongly recommend this post: https://www.lesswrong.com/posts/D2GrrrrfipHWPJSHh/book-review-how-minds-change
I'm concerned that the AI safety debate is becoming more and more polarized, sort of like US politics in general. I think many Americans are being very authentic and undiplomatic with each other when they argue online, in a way that doesn't effectively advance their policy objectives. Given how easily other i...
As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy.
The Objective: Shut it Down[1]
Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-human AI systems from destroying humanity. Persuading governments worldwide to take sufficiently drastic action will not be easy, but we believe this is the most viable path.
Policymakers deal mostly in compromise: they form coalitions by giving a little here to gain a little somewhere else. We are concerned that most legislation intended to keep humanity alive will go through the usual political processes and be ground down into ineffective compromises.
The only way we think we will get strong enough legislation is if policymakers actually get it, if they actually come to understand that building misaligned smarter-than-human systems will kill everyone, including their children. They will pass strong enough laws and enforce them if and only if they come to understand this central truth.
Meanwhile, the clock is ticking. AI labs continue to invest in developing and training more powerful systems. We do not seem to be close to getting the sweeping legislation we need. So while we lay the groundwork for helping humanity to wake up, we also have a less dramatic request. We ask that governments and AI labs install the “off-switch”[2] so that if, on some future day, they decide to shut it all down, they will be able to do so.
We want humanity to wake up and take AI x-risk seriously. We do not want to shift the Overton window, we want to shatter it.
Theory of Change
Now I’ll get into the details of how we’ll go about achieving our objective, and why we believe this is the way to do it. The facets I’ll consider are:
Audience
The main audience we want to reach is policymakers – the people in a position to enact the sweeping regulation and policy we want – and their staff.
However, narrowly targeting policymakers is expensive and probably insufficient. Some of them lack the background to be able to verify or even reason deeply about our claims. We must also reach at least some of the people policymakers turn to for advice. We are hopeful about reaching a subset of policy advisors who have the skill of thinking clearly and carefully about risk, particularly those with experience in national security. While we would love to reach the broader class of bureaucratically-legible “AI experts,” we don’t expect to convince a supermajority of that class, nor do we think this is a requirement.
We also need to reach the general public. Policymakers, especially elected ones, want to please their constituents, and the more the general public calls for regulation, the more likely that regulation becomes. Even if the specific measures we want are not universally popular, we think it helps a lot to have them in play, in the Overton window.
Most of the content we produce for these three audiences will be fairly basic, 101-level material. However, we don’t want to abandon our efforts to reach deeply technical people as well. They are our biggest advocates, most deeply persuaded, most likely to convince others, and least likely to be swayed by charismatic campaigns in the opposite direction. And more importantly, discussions with very technical audiences are important for putting ourselves on trial. We want to be held to a high standard and only technical audiences can do that.
Message and Tone
Since I joined MIRI as the Communications Manager a year ago, several people have told me we should be more diplomatic and less bold. The way you accomplish political goals, they said, is to play the game. You can’t be too out there, you have to stay well within the Overton window, you have to be pragmatic. You need to hoard status and credibility points, and you shouldn’t spend any on being weird.
While I believe those people were kind and had good intentions, we’re not following their advice. Many other organizations are taking that approach. We’re doing something different. We are simply telling the truth as we know it.
We do this for three reasons.
These people who offer me advice often assume that we are rubes, country bumpkins coming to the big city for the first time, simply unaware of how the game is played, needing basic media training and tutoring. They may be surprised to learn that we arrived at our message and tone thoughtfully, having considered all the options. We communicate the way we do intentionally because we think it has the best chance of real success. We understand that we may be discounted or uninvited in the short term, but meanwhile our reputation as straight shooters with a clear and uncomplicated agenda remains intact. We also acknowledge that we are relatively new to the world of communications and policy, we’re not perfect, and it is very likely that we are making some mistakes or miscalculations; we’ll continue to pay attention and update our strategy as we learn.
Channels
So far, we’ve experimented with op-eds, podcasts, and interviews with newspapers, magazines, and radio journalists. It’s hard to measure the effectiveness of these various channels, so we’re taking a wide-spectrum approach. We’re continuing to pursue all of these, and we’d like to expand into books, videos, and possibly film.
We also think in terms of two kinds of content: stable, durable, proactive content – called “rock” content – and live, reactive content that is responsive to current events – called “wave” content. Rock content includes our website, blog articles, books, and any artifact we make that we expect to remain useful for multiple years. Wave content, by contrast, is ephemeral, it follows the 24-hour news cycle, and lives mostly in social media and news.
We envision a cycle in which someone unfamiliar with AI x-risk might hear about us for the first time on a talk show or on social media – wave content – become interested in our message, and look us up to learn more. They might find our website or a book we wrote – rock content – and become more informed and concerned. Then they might choose to follow us on social media or subscribe to our newsletter – wave content again – so they regularly see reminders of our message in their feeds, and so on.
These are pretty standard communications tactics in the modern era. However, mapping out this cycle allows us to identify where we may be losing people, where we need to get stronger, where we need to build out more infrastructure or capacity.
Artifacts
What we find, when we map out that cycle, is that we have a lot of work to do almost everywhere, but that we should probably start with our rock content. That’s the foundation, the bedrock, the place where investment pays off the most over time.
And as such, we are currently exploring several communications projects in this area, including:
We have a lot more ideas than that, but we’re still deciding which ones we’ll invest in.
What We’re Not Doing
Focus helps with execution; it is also important to say what the comms team is not going to invest in.
We are not investing in grass-roots advocacy, protests, demonstrations, and so on. We don’t think it plays to our strengths, and we are encouraged that others are making progress in this area. Some of us as individuals do participate in protests.
We are not currently focused on building demos of frightening AI system capabilities. Again, this work does not play to our current strengths, and we see others working on this important area. We think the capabilities that concern us the most can’t really be shown in a demo; by the time they can, it will be too late. However, we appreciate and support the efforts of others to demonstrate intermediate or precursor capabilities.
We are not particularly investing in increasing Eliezer’s personal influence, fame, or reach; quite the opposite. We already find ourselves bottlenecked on his time, energy, and endurance. His profile will probably continue to grow as the public pays more and more attention to AI; a rising tide lifts all boats. However, we would like to diversify the public face of MIRI and potentially invest heavily in a spokesperson who is not Eliezer, if we can identify the right candidate.
Execution
The main thing holding us back from realizing this vision is staffing. The communications team is small, and there simply aren’t enough hours in the week to make progress on everything. As such, we’ve been hiring, and we intend to hire more.
We hope to hire more writers and we may promote someone into a Managing Editor position. We are exploring the idea of hiring or partnering with additional spokespeople, as well as hiring an additional generalist to run projects and someone to specialize in social media and multimedia.
Hiring for these roles is hard because we are looking for people who have top-tier communications skills, know how to restrict themselves to valid arguments, and are aligned with MIRI’s perspective. It’s much easier to find candidates with one or two of those qualities than to find people in the intersection. For these first few key hires we felt it was important to check all the boxes. We hope that once the team is bigger, it may be possible to hire people who write compelling, valid prose and train them on MIRI’s perspective. Our current sense is that it’s easier to explain AI x-risk to a competent, valid writer than it is to explain great writing to someone who already shares our perspective.
How to Help
The best way you can help is to normalize the subject of AI x-risk. We think many people who have been “in the know” about AI x-risk have largely kept silent about it over the years, or only talked to other insiders. If this describes you, we’re asking you to reconsider this policy, and try again (or for the first time) to talk to your friends and family about this topic. Find out what their questions are, where they get stuck, and try to help them through those stuck places.
As MIRI produces more 101-level content on this topic, share that content with your network. Tell us how it performs. Tell us if it actually helps, or where it falls short. Let us know what you wish we would produce next. (We're especially interested in stories of what actually happened, not just considerations of what might happen, when people encounter our content.)
Going beyond networking, please vote with AI x-risk considerations in mind.
If you are one of those people who has great communication skills and also really understands x-risk, come and work for us! Or share our job listings with people you know who might fit.
Subscribe to our newsletter. There’s a subscription form on our Get Involved page.
And finally, later this year we’ll be fundraising for the first time in five years, and we always appreciate your donations.
Thank you for reading and we look forward to your feedback.
We remain committed to the idea that failing to build smarter-than-human systems someday would be tragic and would squander a great deal of potential. We want humanity to build those systems, but only once we know how to do so safely.
By “off-switch” we mean that we would like labs and governments to plan ahead, to implement international AI compute governance frameworks and controls sufficient for halting the development of any dangerous AI development activity, and streamlined functional processes for doing so.