The level of concern and seriousness I see from ML researchers discussing AGI on any social media platform or in any mainstream venue seems wildly out of step with "half of us think there's a 10+% chance of our work resulting in an existential catastrophe".
In fairness, this is not quite half of the researchers; it's half of those who agreed to take the survey.
I expect that worried researchers are more likely to agree to participate in the survey.
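This worry can be made concrete with a toy simulation. A minimal sketch, with all numbers hypothetical, showing how a modest difference in response rates inflates the observed fraction of worried researchers:

```python
import random

random.seed(0)

# Toy population of 10,000 researchers (all numbers hypothetical):
# 25% are "worried" (would report a 10+% chance of catastrophe).
population = ["worried"] * 2_500 + ["unworried"] * 7_500

# Suppose worried researchers agree to take the survey at a higher
# rate (60%) than unworried ones (30%) -- the hypothesized response bias.
respondents = [
    p for p in population
    if random.random() < (0.60 if p == "worried" else 0.30)
]

true_rate = population.count("worried") / len(population)
observed_rate = respondents.count("worried") / len(respondents)

print(f"true fraction worried:     {true_rate:.0%}")
print(f"observed fraction worried: {observed_rate:.0%}")
```

Under these made-up response rates, a 25% true rate shows up as roughly 40% among respondents, which is why "48% of survey respondents" and "48% of researchers" are very different claims.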
I recall that they tried to advertise / describe the survey in a way that would minimize response bias—like, they didn’t say “COME TAKE OUR SURVEY ABOUT AI DOOM”. That said, I am nevertheless still very concerned about response bias, and I strongly agree that the OP’s wording “48% of researchers” is a mistake that should be corrected.
I figured this would be obvious enough, and both surveys discuss this issue; but phrasing things in a way that encourages keeping selection bias in mind does seem like a good idea to me. I've tweaked the phrasing to say "In a survey, X".
I like this model, much of which I would encapsulate in the tendency to extrapolate from past evidence. It resonates with the image I have of the people who are reluctant to take existential risks seriously, and it is also more fertile for actionable advice than the simple explanation of "because they haven't sat down to think deeply about it". That latter explanation might hold some truth, but tackling it is unlikely to prompt more action toward reducing existential risk unless people are also made aware of the possible failure modes in their thinking (and how to fix them), and of the ways AGI is fundamentally different enough that extrapolating from past evidence is unhelpful.
I advocate shattering the Overton window and spreading arguments on the fundamental distinctions between AGI and our natural notions of intelligence, and these 4 points offer good, reasonable directions for addressing that. But the difficulty also lies in getting those arguments across to people outside specific or high-end communities like LW; in building a bridge between the ideas created at LessWrong, and the people who need to learn about them but are unlikely to come across LessWrong.
But at the decision-making level, you should be “conservative” in a very different sense, by not gambling the future on your technology being low-impact.
What's the technical (like, with numbers) explanation for "why?"? And to what degree? It's a common objection that being conservative to the extent of "what if AI invents nanotechnology" is like worrying that your bridge will accelerate your traffic a million times.
This is why I said in the post:
Some people do have confident beliefs that imply "things will go well"; I disagree there, but I expect some amount of disagreement like that.
... and focused on the many people who don't have a confident objection to nanotech.
I and others have given lots of clear arguments for why relatively early AGI systems will plausibly be vastly smarter than humans. Eric Drexler has given lots of clear arguments for why nanotechnology is probably fairly easy to build.
None of this constitutes a proof that early AGI systems will be able to solve the inverse protein folding problem, etc., but it should at least raise the scenario to consideration and cause it to be taken seriously, for people who don't have specific reasons to dismiss the scenario.
I'll emphasize again this point I made in the OP:
Note that I'm not arguing "an AGI-mediated extinction event is such a big deal that we should make it a top priority even if it's very unlikely".
And this one:
My own view is that extreme disaster scenarios are very likely, not just a tail risk to hedge against. I actually expect AGI systems to achieve Drexler-style nanotechnology within anywhere from a few months to a few years of reaching human-level-or-better ability to do science and engineering work. At this point, I'm looking for any hope of us surviving at all, not holding out hope for a "conservative" scheme (sane as that would be).
So I'm not actually calling for much "conservatism" here. "Conservative" would be hedging against 1-in-a-thousand risks (or more remote tail risks of the sort that we routinely take into account when designing bridges or automobiles). I'm calling for people to take seriously their own probabilities insofar as they assign middling-ish probabilities to scenarios (e.g., 1-in-10 rather than 1-in-1000).
Another example would be that in 2018, Paul Christiano said he assigned around 30% probability to hard takeoff. But when I have conversations with others who seem to be taking Paul's views and running with them, I neither generally see them seriously engaging with hard takeoff as though they think it has a medium-ish probability, nor do I see them say anything about why they disagree with 2018-Paul about the plausibility of hard takeoff.
I don't think it's weird that there's disagreement here, but I do think it's weird how people are eliding the distinction between "these sci-fi scenarios aren't that implausible, but they aren't my mainline prediction" and "these sci-fi scenarios are laughably unlikely and can be dismissed". I feel like I rarely see pushback that's concrete and explicit enough even to distinguish those two possibilities. (Which probably contributes to cascades of over-updating among people who reasonably expect more stuff to be said about nanotech if it's not obviously a silly sci-fi scenario.)
To be clear, I very much agree with being careful about technologies that have a 10% chance of causing existential catastrophe. But I don't see how the part of the OP about conservatism connects to that. I think it's more likely that being conservative about impact would generate probabilities much less than 10%. And if someone says their probability is 10%, maybe it's a case of people only having enough resolution for three kinds of probabilities, and they just mean "less than 50%". Or they are already trying not to be very certain and are explicitly widening their confidence intervals (maybe after getting a probability from someone more confident), but they actually believe in being conservative more than they believe their stated probability. So then the question becomes why the probability is at least 10%: why is being conservative in that direction wrong in general, or what are your clear arguments, and how are we supposed to weigh them against "it's hard to have a large impact"?
I think it's more likely that being conservative about impact would generate probabilities much less than 10%.
I don't know what you mean by "conservative about impact". The OP distinguishes three things:
It separately distinguishes these two things:
It sounds like you're saying "being rigorous and circumspect in your predictions will tend to yield probabilities much less than 10%"? I don't know why you think that, and I obviously disagree, as do 91+% of the survey respondents in https://www.lesswrong.com/posts/QvwSr5LsxyDeaPK5s/existential-risk-from-ai-survey-results. See e.g. AGI Ruin for a discussion of why the risk looks super high to me.
I don’t know what you mean by “conservative about impact”
I mean predicting modest impact, for the reasons a futurist maybe should predict modest impacts (like "existential catastrophes have never happened before" or "novel technologies always plateau", or a whole cluster of similar heuristics in opposition to "building a safety buffer").
It sounds like you’re saying “being rigorous and circumspect in your predictions will tend to yield probabilities much less than 10%”?
Not necessarily "rigorous"; I'm not saying such thinking is definitely correct. I just can't visualize a thought process that arrives at 50% before correction, then applies a conservative adjustment because it's all crazy, still gets 10%, and proceeds to "then it's fine". So if survey respondents have higher probabilities and no complicated plan, then I don't actually believe the opposite-of-engineering-conservatism mindset applies to them. Yes, maybe you mostly said things about not being the decision-maker, but then what's the point of the quote about bridges?
I'm not sure that a technical explanation is called for; "conservative" just means different things in different contexts. But how about this?
Thank you. Your explanation fits the "futurist/decision-maker" distinction, but I just don't feel that calling the decision-maker behavior "conservative" is appropriate. If your probability is already 10%, then treating it like 10% without adjustments is not worst-case thinking. It's certainly not the (only) kind of conservatism that Eliezer's quote talks about.
There is another perspective that can be called “conservative”, which observes that futurists’ predictions are commonly overdramatic and accordingly says that they should be moderated for the sake of accuracy.
This is the perspective I'm mostly interested in. And this is where I would like to see numbers that balance caution about being overdramatic against having a safety margin.
Those are not the same at all.
We have tons of data on how traffic develops over time for bridges, and besides, they are engineered to withstand being packed completely with vehicles (bumper to bumper).
And even if we didn't, we still know what vehicles look like and can do worst-case calculations that look nothing like sci-fi scenarios (heavy trucks bumper to bumper in all lanes).
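A back-of-the-envelope sketch of that kind of worst-case calculation, with all figures illustrative rather than taken from any real bridge standard:

```python
# Worst-case live load: heavy trucks bumper to bumper in every lane.
# All figures are illustrative, not from any real design code.

span_m = 300            # hypothetical bridge span
lanes = 4
truck_length_m = 18     # a tractor-trailer, roughly
truck_mass_kg = 36_000  # a fully loaded heavy truck, roughly

trucks_per_lane = span_m // truck_length_m
total_mass_kg = trucks_per_lane * lanes * truck_mass_kg

# The worst-case live load is bounded and computable in advance --
# nothing like an open-ended "what if traffic grows a million times".
print(f"trucks on span: {trucks_per_lane * lanes}")
print(f"worst-case live load: {total_mass_kg / 1000:.0f} tonnes")
```

The point is that the worst case here is closed-form and bounded by physics we already understand, which is exactly what we don't have for AGI.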
On the other hand:
What are we building? Ask 10 people and get 10 different answers.
What does the architecture look like? We haven't built it yet, and nobody knows (with certainty).
Name one thing it can do: <sci-fi-sounding thing goes here>, or ask 10 people and get 10 very different answers (number 5 will shock you).
I'll give you my personal take on those three:
I've previously written that good ideas and conversations about AGI seem to have propagated through ML weirdly slowly.
A different weird phenomenon I observe is that the field's relative inaction about AGI seems less based on a confident set of beliefs about AGI definitely going well (or definitely being far off), and more based on an implicit sense like "the default is everything going well, and we don't need to change anything until there's overwhelming evidence to the contrary".
Some people do have confident beliefs that imply "things will go well"; I disagree there, but I expect some amount of disagreement like that.
But that doesn't seem to be the crux for most people in ML.
In a sane world, it doesn't seem like "well, maybe AI will get stuck at human-ish levels for decades" or "well, maybe superintelligence couldn't invent any wild new tech" ought to be cruxes for "Should we pause AI development?" or "Is alignment research the world's top priority?"
Note that I'm not arguing "an AGI-mediated extinction event is such a big deal that we should make it a top priority even if it's very unlikely". There are enough other powerful technologies on the horizon, and enough other risks for civilizational collapse or value lock-in, that I don't in fact think AGI x-risk should get major attention if it's very unlikely.
But the most common view within ML seems to be less "it's super unlikely for reasons X Y Z", and more of an "I haven't thought about it much" and/or "I see some reasons to be very worried, but also some reasons things might be fine, so I end up with medium-ish levels of worry".
In a mid-2022 survey, 48% of researchers who had recently published in NeurIPS or ICML gave double-digit probabilities to advanced AI's long-term effect being “extremely bad (e.g., human extinction)”. A similar number gave double-digit probabilities to "human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species".
In an early 2021 survey, 91% of researchers working on "long-term AI topics" at CHAI, DeepMind, MIRI, OpenAI, Open Philanthropy, and what would become Anthropic gave double-digit probabilities to "the overall value of the future will be drastically less than it could have been, as a result of AI systems not doing/optimizing what the people deploying them wanted/intended".
The level of concern and seriousness I see from ML researchers discussing AGI on any social media platform or in any mainstream venue seems wildly out of step with "half of us think there's a 10+% chance of our work resulting in an existential catastrophe".
I think the following four factors help partly (though not completely) explain what's going on. If I'm right, then I think there's some hope that the field can explicitly talk about these things and consciously course-correct.
1. "Conservative" predictions, versus conservative decision-making
If you're building toward a technology as novel and powerful as "automating every cognitive ability a human can do", then it may sound "conservative" to predict modest impacts. But at the decision-making level, you should be "conservative" in a very different sense, by not gambling the future on your technology being low-impact.
The first long-form discussion of AI alignment, Eliezer Yudkowsky's Creating Friendly AI 1.0, made this point in 2001:
People who think their role is only to be a "conservative predictor", and not a "conservative decision-maker", will skew the scholarly conversation toward taking more extreme risks, because acknowledging extreme things sounds too out-there to them.
I personally wouldn't even call the predictions here "conservative", since this conflates "sounds normal" with "robust to uncertainty". All consistent object-level views about AI and technological progress have at least one "wild" implication (as noted in Holden Karnofsky's The Most Important Century), so views that sound normal here generally have to use misdirection and vagueness to obscure the wild part.
The availability heuristic and absurdity bias cause us to neglect big changes until it's too late.
My own view is that extreme disaster scenarios are very likely, not just a tail risk to hedge against. I actually expect AGI systems to achieve Drexler-style nanotechnology within anywhere from a few months to a few years of reaching human-level-or-better ability to do science and engineering work. At this point, I'm looking for any hope of us surviving at all, not holding out hope for a "conservative" scheme (sane as that would be).
But the point stands that if you have more "medium-sized" probabilities on those capabilities being available (as opposed to very high or very low ones), then a sane response to AGI should explicitly grapple with that, not pretend the probability is negligible because it's scary.
I do think debates between the "risk is extremely high" camp and the "risk is medium-sized" camp are important. But the importance mostly stems from "this suggests we have different background models, and should try to draw those out so they can be discussed explicitly", not "we should only take action about extreme risks once we're 95+% sure of them".
2. Waiting for a fire alarm, versus intervening proactively
There's No Fire Alarm for Artificial General Intelligence (written in 2017) makes a few different claims:
Quoting Yudkowsky:
Claims 1 and 2 still seem correct to me. We can hope that 3 is maybe false, and that we're now seeing a shift in the field toward taking AGI seriously, even if this wasn't foreseeable in 2017 and doesn't come with a lot of clarity about timelines.
For now, however, it still seems to me that the basic dynamics described in the Fire Alarm post are inhibiting action. Things are murky now, and I think there's a common implicit expectation that they'll be less murky later, and that we can safely put off thinking about the problem until some unspecified future date.
The bystander effect still seems powerful here. People don't want to be the first in a given social context to express alarm, so they default to looking vaguely calm while waiting for someone else to speak up or spring into action first. But everyone else is doing the same thing, so no one ends up acting at all.
This is a case where unilaterally acting at all (in sane and actually-helpful ways), speaking up, blurting your actual thoughts, etc. can be particularly powerful and important.
In some cases it may only take one person shattering the Overton window in order to open the floodgates for other people who were quietly worried. And even where that's not true, I expect better results from people hashing out their disagreements in argument than from people timidly waiting for the right moment.
3. Anchoring to what's familiar, versus trying to account for potential novelties in AGI
The level and nature of the risk from AGI turns on the physical properties of AGI. "AlphaGo wasn't dangerous" is evidence for "AGI won't be dangerous" only insofar as you think AGI is similar to AlphaGo in the relevant ways.
But for some reason, a lot of people who wouldn't go out on a limb and claim that AlphaGo and AGI are actually particularly similar in the ways that matter do treat AGI like "just a normal ML system". Their policy suggests confidence that AGI is in the same reference class as systems like AlphaGo or DALL-E in all the ways that matter, even though they wouldn't ever actually state that as a belief.
The whole conversation is baked through with a tacit assumption that the difficulty, danger, and importance of AGI alignment needs to be "just more of the same", even though AGI itself is a very new sort of beast.
But "get a smarter-than-human AI system to produce good outcomes" is not in fact similar to a problem we've faced before! I think it's a solvable problem in principle, but the difficulty level does not need to be calibrated to business-as-usual efforts.
Quoting Beyond the Reach of God:
4. Modeling existential risks in far mode, versus near mode
Quoting Yudkowsky again, in "Cognitive Biases Potentially Affecting Judgment of Global Risks":
In terms of construal level theory, personal tragedies are "near", while human extinction is "far". We think of far-mode things in more abstract and detached terms, more like morality tales or symbols than like messy, concrete, mechanistic processes.
Rationally, we ought to take larger disasters proportionally more seriously than equally probable small-scale risks. In practice, we don't seem to do that at all.
"Well, maybe we aren't all going to die; it's not a sure thing!" is a lot weaker than the bar we usually require for doing anything.
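The proportionality point can be made concrete with a toy expected-value comparison (all numbers hypothetical):

```python
# Toy expected-cost comparison: a small disaster and a large one
# that are equally probable. All numbers are hypothetical.

p = 0.01                      # same probability for both scenarios
deaths_small = 10_000         # a regional disaster
deaths_large = 8_000_000_000  # human extinction: roughly everyone

ev_small = p * deaths_small
ev_large = p * deaths_large

# The expected losses differ by the same factor as the disaster sizes
# (here, 800,000x), so rational prioritization should scale in
# proportion -- even before counting lost future generations.
print(f"expected deaths (small): {ev_small:,.0f}")
print(f"expected deaths (large): {ev_large:,.0f}")
print(f"ratio: {ev_large / ev_small:,.0f}x")
```

In practice our intuitive concern doesn't come anywhere close to scaling by that factor, which is the scope-insensitivity failure described above.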
The above is my attempt at a partial explanation of what's going on.
What do you think of this picture? Do you have a different model of what's going on? And if this is what's going on, what should we do about it?