
Some brief thoughts at a difficult time in the AI risk debate.

Imagine you go back in time to the year 1999 and tell people that in 24 years' time, humans will be on the verge of building weakly superhuman AI systems. I remember watching the anime short series The Animatrix at roughly this time, in particular a two-part story called The Second Renaissance. For those who haven't seen it, it is a self-contained origin tale for the events in the seminal 1999 movie The Matrix, telling the story of how humans lost control of the planet.

Humans develop AI to perform economic functions; eventually there is an "AI rights" movement, and a separate AI nation is founded. It gets into an economic war with humanity, which turns hot. Humans strike first with nuclear weapons, but the AI nation builds dedicated bio- and robo-weapons and wipes out most of humanity, apart from those who are bred in pods like farm animals and plugged into a simulation for eternity without their consent.

Surely we wouldn't be so stupid as to actually let something like that happen? It seems unrealistic.

And yet:

  • AI software and hardware companies are rushing ahead with development
  • The technology for technical AI safety (things like interpretability, RLHF, governance structures) is still very much in its infancy. The field is something like 5 years old.
  • People are already talking about an AI rights movement in major national papers
  • There isn't a plan for what to do when the value of human labor goes to zero
  • There isn't a plan for how to de-escalate AI-enhanced warfare, and militaries are enthusiastically embracing killer robots. Also, there are two regional wars happening and a nascent superpower conflict is brewing.
  • The game theory of different opposing human groups all rushing towards superintelligence is horrible and nobody has even proposed a solution. The US government has foolishly stoked this particular risk by cutting off AI chip exports to China.

People on this website are talking about responsible scaling policies, though I feel that "irresponsible scaling policies" is a more fitting name.

Obviously I have been in this debate for a long time, having started as a commenter on the Overcoming Bias and Accelerating Future blogs in the late 2000s. What is happening now is somewhere near the low end of my expectations for how competently and safely humans would handle the coming transition to machine superintelligence. I think that is because I was younger in those days and had a much rosier view of how our elites function. I thought they were wise and had a plan for everything, but mostly they just muddle along; the haphazard response to COVID really drove this home for me.

We should stop developing AI; we should collect and destroy the hardware; and we should destroy the chip-fab supply chain that allows humans to experiment with AI at the exaflop scale. Since that supply chain runs through only two major countries (the US and China), this isn't necessarily impossible to coordinate - as far as I am aware, no other country is independently capable, and those with relevant capacity count as US satellite states. The criterion for restarting exaflop-scale AI research should be a plan for "landing" the transition to superhuman AI that has had more attention put into it than any military plan in the history of the human race. It should be thoroughly war-gamed.

AI risk is not just technical and local, it is sociopolitical and global. It's not just about ensuring that an LLM is telling the truth. It's about what effect AI will have on the world assuming that it is truthful. "Foom" or "lab escape" type disasters are not the only bad thing that can happen - we simply don't know how the world will look if there are a trillion or a quadrillion superhumanly smart AIs demanding rights, spreading propaganda, and competing in an economic and political landscape where humans are no longer the top dog.

Let me reiterate: We should stop developing AI. AI is not a normal economic item. It's not like lithium batteries or wind turbines or jets. AI is capable of ending the human race; in fact, I suspect it does so by default.

In his post on the topic, user @paulfchristiano states that a good responsible scaling policy could cut the risks from AI by a factor of 10:

I believe that a very good RSP (of the kind I've been advocating for) could cut risk dramatically if implemented effectively, perhaps a 10x reduction.

I believe that this is not correct. It may cut certain technical risks like deception, but a world with non-deceptive, controllable smarter-than-human intelligences that also has the same level of conflict and chaos that our world has may well already be a world that is human-free by default. These intelligences would be an invasive species that would outcompete humans in economic, military and political conflicts.

In order for humans to survive the AI transition I think we need to succeed on the technical problems of alignment (which are perhaps not as bad as Less Wrong culture made them out to be), and we also need to "land the plane" of superintelligent AI on a stable equilibrium where humans are still the primary beneficiaries of civilization, rather than a pest species to be exterminated or squatters to be evicted.

We should also consider how the efforts of AI can be directed towards solving human aging; if aging is solved then everyone's time preference will go down a lot and we can take our time planning a path to a stable and safe human-primacy post-singularity world.

I hesitated to write this article; most of what I am saying here has already been argued by others. And yet... here we are. Comments and criticism are welcome, I may look to publish this elsewhere after addressing common objections.


EDIT: I have significantly changed my mind on this topic and will elaborate more in the coming weeks.

76 comments

What I find incredible is how contributing to the development of existentially dangerous systems is viewed as a morally acceptable course of action within communities that on paper accept that AGI is a threat.

Both OpenAI and Anthropic are incredibly influential among AI safety researchers, despite both organisations being key players in bringing the advent of TAI ever closer.

Both organisations benefit from lexical confusion over the word "safety".

The average person concerned with existential risk from AGI might assume "safety" means working to reduce the likelihood that we all die. They would be disheartened to learn that many "AI Safety" researchers are instead focused on making sure contemporary LLMs behave appropriately. Such "safety" research simply makes the contemporary technology more viable and profitable, driving investment and reducing timelines. There is to my knowledge no published research that proves these techniques will extend to controlling AGI in a useful way.*

OpenAI's "Superalignment" plan is a more ambitious safety play. Their plan to "solve" alignment involves building a human-level general intelligence within 4 years and then using this to automate alignment re... (read more)

3Roman Leventov
If by "techniques that work on contemporary AIs" you mean RLHF/RLAIF, then I don't know anyone claiming that the robustness and safety of these techniques will "extend to AGI". I think that AGI labs will soon move in the direction of releasing an agent architecture rather than a bare LLM, and will apply reasoning verification techniques. From OpenAI's side, see the "Let's verify step by step" paper. From DeepMind's side, see this interview with Shane Legg.

I think this passage (and the whole comment) is unfair because it presents what AGI labs are pursuing (i.e., plans like "superalignment") as obviously consequentially bad plans. But this is actually very far from obvious. I personally tend to conclude that these are consequentially good plans, conditioned on the absence of coordination on a "pause and united, CERN-like effort about AGI and alignment" (and the presence of open-source-maximalist and risk-dismissive players like Meta AI).

What I think is bad in labs' behaviour (if true, which we don't know, because such coordination efforts might be underway but we don't know about them) is that the labs are not trying to coordinate (among themselves and with the support of governments for legal basis, monitoring, and enforcement) on a "pause and united, CERN-like effort about AGI and alignment". Instead, we only see the labs coordinating and advocating for RSP-like policies.

Another thing that I think is bad in labs' behaviour is inadequately little funding for safety efforts. Thus, I agree with the call in "Managing AI Risks in the Era of Rapid Progress" for the labs to allocate at least a third of their budgets to safety efforts. These efforts, by the way, shouldn't be narrowly about AI models. Indeed, this is a major point of Roko's OP. Investment and progress in computer and system security, and in political, economic, and societal structures, is inadequate. This couldn't be the responsibility of AGI labs alone, obviously, but I think they have to own a part of it. They
-5Anders Lindström

In order for humans to survive the AI transition I think we need to succeed on the technical problems of alignment (which are perhaps not as bad as Less Wrong culture made them out to be), and we also need to "land the plane" of superintelligent AI on a stable equilibrium where humans are still the primary beneficiaries of civilization, rather than a pest species to be exterminated or squatters to be evicted.

Do we really need both? It seems like either a technical solution OR competent global governance would mostly suffice.

Actually-competent global governance should be able to coordinate around just not building AGI (and preventing anyone else from building it) indefinitely. If we could solve a coordination problem on that scale, we could also probably solve a bunch of other mundane coordination problems, governance issues, unrelated x-risks, etc., resulting in a massive boost to global prosperity and happiness through non-AI technological progress and good policy.

Conversely, if we had a complete technical solution, I don't see why we necessarily need that much governance competence. Even if takeoff turns out to be relatively slow, the people initially building and controlling AGI... (read more)

Without governance you're stuck trusting that the lead researcher (or whoever is in control) turns down near-infinite power and instead acts selflessly. That seems like quite the gamble.

6Seth Herd
I don't think it's such a stark choice. I think odds are the lead researcher takes the infinite power, and it turns out okay to great. Corrigibility seems like the safest outer alignment plan, and it's got to be corrigible to some set of people in particular. I think giving one random person near-infinite power will work out way better than intuition suggests. I think it's not power that corrupts, but rather the pursuit of power. I think unlimited power will lead an ordinary, non-sociopathic person to progressively focus more on their empathy for others. I think they'll ultimately use that power to let others do whatever they want that doesn't take away others' freedom to do what they want. And that's the best outer alignment result, in my opinion.
4Nathan Helm-Burger
Alexander Wales, at the end of his series 'Worth the Candle', does a lovely job of laying out what a genuinely kind person given omnipotence could do to make the world a nice place for everyone. It's a lovely vision, but relying on this in practice seems a lot less trustworthy to me than having a bureaucratic process with checks & balances in charge. I mean, I still think it'll ultimately have to be some relatively small team in charge of a model corrigible to them, if we're in a singleton scenario. I have a lot more faith in 'small team with bureaucratic oversight' than some individual tech bro selected semi-randomly from the set of researchers at big AI labs who might be presented with the opportunity to 'get the jump' on everyone else.
4Seth Herd
I'm curious why you trust a small group of government bros a lot more than one tech bro. I wouldn't strongly prefer either, but I'd prefer Sam Altman or Demis Hassabis to a randomly chosen bureaucrat. I don't totally trust those guys, but I think it's pretty likely they're not total sociopaths or idiots. By the opportunity to get the jump on everyone else, do you mean beating other companies to AGI, or becoming the one guy your AGI takes orders from?
2Nathan Helm-Burger
I meant stealing control of an AGI within the company before the rest of the company catches on. I don't necessarily mean that I'd not want Sam or Demis involved in the ruling council, just that I'd prefer if there was like... an assigned group of people to directly operate the model, and an oversight committee with reporting rules reporting to a larger public audience. Regulations and structure, rather than the whims of one person.
6Roko
As I said in the article, technically controllable ASIs are the equivalent of an invasive species which will displace humans from Earth politically, economically and militarily.
5Max H
And I'm saying that, assuming all the technical problems are solved, AI researchers would be the ones in control, and I (mostly) trust them to just not do things like build an AI that acts like an invasive species, or argues for its own rights, or build something that actually deserves such rights. Maybe some random sociologists on Twitter will call for giving AIs rights, but in the counterfactual world where AI researchers have fine control of their own creations, I expect no one in a position to make decisions on the matter to give such calls any weight. Even in the world we actually live in, I expect such calls to have little consequence. I do think some of the things you describe are reasonably likely to happen, but the people responsible for making them happen will do so unintentionally, with opinion columnists, government regulations, etc. playing little or no role in the causal process.
8Jayson_Virissimo
What is the basis of this trust? Anecdotal impressions of a few that you know personally in the space, opinion polling data, something else?
2Max H
A bit of anecdotal impressions, yes, but mainly I just think that in humans, being smart, conscientious, reflective, etc. enough to be the brightest researcher at a big AI lab is actually pretty correlated with being Good (and also, that once you actually solve the technical problems, it doesn't take that much Goodness to do the right thing for the collective and not just yourself). Or, another way of looking at it, I find Scott Aaronson's perspective convincing, when it is applied to humans. I just don't think it will apply at all to the first kinds of AIs that people are actually likely to build, for technical reasons.
8Roman Leventov
I think there are way more transhumanists and post-humanists at AGI labs than you imagine. Richard Sutton is a famous example (btw, I've just discovered that he moved from DeepMind to Keen Technologies, John Carmack's venture), and I believe there are many more who disguise themselves for political reasons.
5Roko
No. You have simplistic and incorrect beliefs about control. If there are a bunch of companies (Deepmind, Anthropic, Meta, OpenAI, ...) and a bunch of regulation efforts and politicians who all get inputs, then the AI researchers will have very little control authority, as little perhaps as the physicists had over the use of the H-bomb. Where does the control really reside in this system? Who made the decision to almost launch a nuclear torpedo in the Cuban Missile Crisis?
3Max H
In the Manhattan project, there was no disagreement between the physicists, the politicians / generals, and the actual laborers who built the bomb, on what they wanted the bomb to do. They were all aligned around trying to build an object that would create the most powerful explosion possible. As for who had control over the launch button, of course the physicists didn't have that, and never expected to. But they also weren't forced to work on the bomb; they did so voluntarily and knowing they wouldn't be the ones who got any say in whether and how it would be used. Another difference between an atomic bomb and AI is that the bomb itself had no say in how it was used. Once a superintelligence is turned on, control of the system rests entirely with the superintelligence and not with any humans. I strongly expect that researchers at big labs will not be forced to program an ASI to do bad things against the researchers' own will, and I trust them not to do so voluntarily. (Again, all in the probably-counterfactual world where they know and understand all the consequences of their own actions.)
7Vaniver
In that they wanted the bomb to explode? I think the analogous level of control for AI would be unsatisfactory. I'm not sure they thought this; I think many expected that by playing along they would have influence later. Tech workers today often seem to care a lot about how products made by their companies are deployed.
2Max H
The premise of this hypothetical is that all the technical problems are solved - if an AI lab wants to build an AI to pursue the collective CEV of humanity or whatever, they can just get it to do that. Maybe they'll settle on something other than CEV that is a bit better or worse or just different, but my point was that I don't expect them to choose something ridiculous like "our CEO becomes god-emperor forever" or whatever. Yeah, I was probably glossing over the actual history a bit too much; most of my knowledge on this comes from seeing Oppenheimer recently. The actual dis-analogy is that no AI researcher would really be arguing for not building and deploying ASI in this scenario, vs. with the atomic bomb where lots of people wanted to build it to have around, but not actually use it or only use it as some kind of absolute last resort. I don't think many AI researchers in our actual reality have that kind of view on ASI, and probably few to none would have that view in the counterfactual where the technical problems are solved.
5Roko
Well these systems aren't programmed. Researchers work on architecture and engineering, goal content is down to the RLHF that is applied and the wishes of the user(s), and the wishes of the user(s) are determined by market forces, user preferences, etc. And user preferences may themselves be influenced by other AI systems. Closed source models can have RLHF and be delivered via an API, but open source models will not be far behind at any given point in time. And of course prompt injection attacks can bypass the RLHF on even closed source models. The decisions about what RLHF to apply on contentious topics will come from politicians and from the leadership of the companies, not from the researchers. And politicians are influenced by the media and elections, and company leadership is influenced by the market and by cultural trends. Where does the chain of control ultimately ground itself? Answer: it doesn't. Control of AI in the current paradigm is floating. Various players can influence it, but there's no single source of truth for "what's the AI's goal".
6Max H
I don't dispute any of that, but I also don't think RLHF is a workable method for building or aligning a powerful AGI. Zooming out, my original point was that there are two problems humanity is facing, quite different in character but both very difficult:

* a coordination / governance problem, around deciding when to build AGI and who gets to build it
* a technical problem, around figuring out how to build an AGI that does what the builder wants at all.

My view is that we are currently on track to solve neither of those problems. But if you actually consider what the world in which we sufficiently-completely solve even one of them looks like, it seems like either is sufficient for a relatively high probability of a relatively good outcome, compared to where we are now. Both possible worlds are probably weird hypotheticals which shouldn't have an impact on what our actual strategy in the world we actually live in should be, which is of course to pursue solutions to both problems simultaneously with as much vigor as possible. But it still seems worth keeping in mind that if even one thing works out sufficiently well, we probably won't be totally doomed.
2Roko
How does a solution to the above solve the coordination/governance problem?
1Carl Feynman
I think the theory is something like the following: We build the guaranteed trustworthy AI, and ask it to prevent the creation of unaligned AI, and it comes up with the necessary governance structures, and the persuasion and force needed to implement them.   I’m not sure this is a certain argument.  Some political actions are simply impossible to accomplish ethically, and therefore unavailable to a “good” actor even given superhuman abilities.
4M. Y. Zuo
Where did you learn of this? From what I know it was the opposite: there were so many disagreements, even just among the physicists, that they decided to duplicate nearly all effort to produce two different types of nuclear device designs, the gun type and the implosion type, simultaneously. E.g., both plutonium and uranium processing supply chains were set up at massive expense (and later environmental damage), just in case one design didn't work.
2philh
Without commenting on whether there was in fact much agreement or disagreement among the physicists, this doesn't sound like much evidence of disagreement. I think it's often entirely reasonable to try two technical approaches simultaneously, even if everyone agrees that one of them is more promising.
-1M. Y. Zuo
You do realize setting up each supply chain alone took up well over 1% of total US GDP, right?
1philh
I didn't know that, but not a crux. This information does not make me think it was obviously unreasonable to try both approaches simultaneously. (Downvoted for tone.)
1M. Y. Zuo
How does this relate to the discussion Max H and Roko were having? Or the question I asked of Max H?
2philh
I don't know, I didn't intend it to relate to those things. It was a narrow reply to something in your comment, and I attempted to signal it as such. (I'm not very invested in this conversation and currently intend to reply at most twice more.)
1M. Y. Zuo
Okay then. 
5Algon
So you don't think a pivotal act exists? Or, more ambitiously, you don't think a sovereign implementing CEV would result in a good enough world?
3Roko
Who is going to implement CEV or some other pivotal act?
2Algon
Ah, I see. Yeah, that's a reasonable worry. Any ideas on how someone in those orgs could incentivize such behavior whilst discouraging poorly thought out pivotal acts? I would be OK with a future where e.g. OAI gets 90-99% of the cosmic endowment as long as the rest of us get a chunk, or get the chance to safely grow to the point where we have a shot at the vast scraps OAI leaves behind.
8Roko
the fact that we are having this conversation simply underscores how dangerous this is and how unprepared we are. This is the future of the universe we're talking about. It shouldn't be a footnote!
5Nathan Helm-Burger
Do we need both? Perhaps not, in the theoretical case where we get a perfect instance of one. I disagree that we should aim for one or the other, because I don't expect we will reach anywhere near perfection on either. I think we should expect to have to muddle through somehow with very imperfect versions of each. I think we'll likely see some janky poorly-organized international AI governance attempt combined with just good enough tool AI and software and just-aligned-enough sorta-general AI to maintain an uneasy temporary state of suppressing rogue AI explosions. How long will we manage to stay on top under such circumstances? Hopefully long enough to realize the danger we're in and scrape together some better governance and alignment solutions. Edit: I later saw that Max H said he thought we should pursue both. So we disagree less than I thought. There is some difference, in that I still think we can't really afford a failure in either category. Mainly because I don't expect us to do well enough in either for that single semi-success to carry us through.

Yeah. I think a key point that is often overlooked is that even if powerful AI is technically controllable, i.e. we solve inner alignment, that doesn't mean society will handle it safely. I think by default it looks like every company and military is forced to start using a ton of AI agents (or they will be outcompeted by someone else who does). Competition between a bunch of superhuman AIs that are trying to maximize profits or military tech seems really bad for us. We might not lose control all at once, but rather just be gradually outcompeted by machines, where "gradually" might actually be pretty quick. Basically, we die by Moloch.

5Nathan Helm-Burger
Yes, I see Moloch as my, and humanity's, primary enemy here. I think there are quite a few different plausible future paths in which Moloch rears its ugly head. The challenge, and duty, of coordination to defeat Moloch goes beyond what we think of as governance. We need coordination between AI researchers, AI alignment researchers, forecasters, politicians, investors, CEOs. We need people realizing their lives are at stake and making sacrifices and compromises to reduce the risks.
2M. Y. Zuo
The problem is that an entity with that kind of real-world coordination capacity would practically need to be so strong that it would likely be more controversial, and face more backlash, than the rogue AGI(s) itself. At which point some fraction of humans would likely defect and cooperate with the AGI(s) in order to take it down.
4Nathan Helm-Burger
Oh, I wasn't imagining a singleton AI solving the coordination problem. I was more imagining that a series of terrifying near misses and minor catastrophes convinced people to work together for their own best interest. The coordination being done by the people involved, not applied to them by an external force.
1M. Y. Zuo
Even a purely human organization with that kind of potential power would be controversial enough that probably at least a single-digit percentage of adults would not accept it. Which is to say hundreds of millions of humans would likely consider it an enemy too. And that's assuming it can even be done, considering the level of global cooperation demonstrated in 2023.
4Nathan Helm-Burger
Yes, I think you are right about both the difficulty / chance of failure and about the fact that there would inevitably be a lot of people opposed. Those aren't enough to guarantee such coordination would fail, perhaps especially if it was enacted through a redundant mishmash of organizations? I'm pretty sure there's going to be some significant conflict along the way, no matter which path the future stumbles down.
3M. Y. Zuo
I doubt you, or any human being, would even want to live in a world where such coordination 'succeeded', since it would almost certainly be in the ruins of a society wrecked by countless WMDs, flung by the warring parties until all were exhausted except the 'winners', who would probably not have long to live. In that sense the possible futures where control of powerful AI 'succeeded' could be even worse than those where it failed.
4Nathan Helm-Burger
I'm really hoping it doesn't go that way, but I do see us as approaching a time in which the military and economic implications of AI will become so pressing that large-scale international conflict is likely unless agreements are reached. There are specific ways I anticipate tool AI advances affecting the power balance between superpower countries, even before autonomous AGI is a threat. I wake in the night in a cold sweat worrying about these things. I am terrified. I think there's a real chance we all die soon, or that there is massive suffering and chaos, perhaps with or without war. The balance of power has shifted massively in favor of offense, and a new tenuous balance of Mutually Assured Destruction has not yet been established. This is a very dangerous time.
4[comment deleted]

The scenario I am most concerned about is a strongly multipolar Malthusian one. There is some chance (maybe even a fair one) that a singleton ASI decides, or an oligopoly of ASIs rigorously coordinates, to preserve the biosphere - including humans - at an adequate or superlative level of comfort or fulfillment, or to help them ascend themselves, due to ethical considerations, for research purposes, or for simulation/karma-type considerations.

In a multipolar scenario of gazillions of AIs at Malthusian subsistence levels, none of that matters in the default case. Individual AIs can be as ethical or empathic as they come, even much more so than any human. But keeping the biosphere around would be a luxury, and any that try to do so will be outcompeted by more unsentimental, economical ones. A farm that can feed a dozen people, or an acre of rainforest that can support x species, can instead support a trillion AIs if converted to high-efficiency solar panels.

The second scenario is near-certain doom, so at a bare minimum we should at least get a good inkling of whether the AI world is more likely to be unipolar or oligopolistic, or massively multipolar, before proceeding. So a pause is indeed needed, and the most credible way of effecting it is a hardware cap and subsequent back-pedalling on compute power. (Roko has good ideas on how to go about that and should develop them here and at his Substack.) Granted, if anthropic reasoning is valid, geopolitics might well soon do the job for us. 🚀💥

The field is something like 5 years old.

I'm not sure what you are imagining as 'the field', but isn't it closer to twenty years old? (Both numbers are, of course, much less than the age of the AI field, or of computer science more broadly.)

Much of the source of my worry is that I think in the first ten-twenty years of work on safety, we mostly got impossibility and difficulty results, and so "let's just try and maybe it'll be easy" seems inconsistent with our experience so far.

5Roko
Well, the AI technical safety work that's appropriate for neural networks is about 5-6 years old; if we go back before 2017 I don't think any relevant work was done.
5momom2
AlexNet dates back to 2012, I don't think previous work on AI can be compared to modern statistical AI. Paul Christiano's foundational paper on RLHF dates back to 2017. Arguably, all of agent foundations work turned out to be useless so far, so prosaic alignment work may be what Roko is taking as the beginning of AIS as a field.
5Roko
yes
2Vaniver
When were convnets invented, again? How about backpropagation?

In order for humans to survive the AI transition I think we need to succeed on the technical problems of alignment (which are perhaps not as bad as Less Wrong culture made them out to be), and we also need to "land the plane" of superintelligent AI on a stable equilibrium where humans are still the primary beneficiaries of civilization, rather than a pest species to be exterminated or squatters to be evicted.

I agree completely with this.

I want to take the opportunity to elaborate a little on what a "stable equilibrium" civilisation should have, in my mind:... (read more)

Agreed.

However, there is no collective "we" to whom this message can be effectively directed. The readers of LW are not the ones who can influence the overarching policies of the US and China. That said, leaders at OpenAI and Anthropic might come across this.

This leads to the question of how to halt AI development on a global scale. Several propositions have been put forth:

1. A worldwide political agreement. Given the current state of wars and conflicts, this seems improbable.
2. A global nuclear war. As the likelihood of a political agreement diminishes, t... (read more)

1M. Y. Zuo
3. doesn't seem like a viable option, since there's a decent chance it can disguise itself into appearing as less than superintelligent.
2avturchin
An AI Nanny can be built in ways which exclude this, like a combination of narrow neural nets capable of detecting certain types of activity. Not AGI or an advanced LLM.

I am someone who is at present unsure how to think about AI risk. I am a complete layperson with a strong interest in science, technology, futurism and so on, and there are - seemingly - some very smart people in the field who appear to be saying that the risk is basically zero (e.g. Andrew Ng, Yann LeCun). Then there are others who are very worried indeed - as represented by this post I am responding to.

This is confusing.

To get people at my level to support a shut down of the type described above, there needs to be some kind of explanation as to why there is s... (read more)

3Carl Feynman
Yes, it’s a difficult problem for a layman to know how alarmed to be. I’m in the AI field, and I’ve thought that superhuman AI was a threat since about 2003. I’d be glad to engage you in an offline object-level discussion about it, comprehensible to a layman, if you think that would help. I have some experience in this, having engaged in many such discussions. It’s not complicated or technical, if you explain it right.

I don’t have a general theory for why people disagree with me, but here are several counter arguments I have encountered. I phrase them as though they were being suggested to me, so “you” is actually me.

— Robots taking over sounds nuts, so you must be crazy.
— This is an idea from a science fiction movie. You’re not a serious person.
— People often predict the end of the world, and they’ve always been wrong before. And often been psychologically troubled. Are you seeing a therapist?
— Why don’t any of the top people in your field agree? Surely if this were a serious problem, they’d be all over it. (Don’t hear this one much any more.)
— AIs won’t be dangerous, because nobody would be so foolish as to design them that way. Or to build AIs capable of long term planning, or to direct AIs toward foolish or harmful goals. Or various other sentences containing the phrase “nobody would be so foolish as to”.
— AIs will have to obey the law, so we don’t have to worry about them killing people or taking over, because those things are illegal. (Yes, I’ve actually heard this one.)
— Various principles of computer science show that it is impossible to build a machine that makes correct choices in all circumstances. (This is where the “no free lunch” theorem comes in. Of course, we’re not proposing a machine that makes correct choices in all circumstances, just one that makes mostly correct choices in the circumstances it encounters.)
— There will be lots of AIs, and the good ones will outnumber the bad ones and hence win.
— It’s impossible to bu
1David Gould
I am happy to have a conversation with you. On this point: '— The real problem of AI is <something else, usually something already happening>.  You’re distracting people with your farfetched speculation.' I believe that AI indeed poses huge problems, so maybe this is where I sit.  
1Carl Feynman
I tend to concentrate on extinction, as the most massive and terrifying of risks.  I think that smaller problems can be dealt with by the usual methods, like our society has dealt with lots of things.  Which is not to say that they aren’t real problems, that do real harm, and require real solutions.  My disagreement is with “You’re distracting people with your farfetched speculation.”  I don’t think raising questions of existential risk makes it harder to deal with more quotidian problems.  And even if it did, that’s not an argument against the reality of extinction risk.
2xpym
To me the core reason for wide disagreement seems simple enough - at this stage the essential nature of AI existential risk arguments is not scientific but philosophical. The terms are informal and there are no grounded models of underlying dynamics (in contrast with e.g. climate change). Large persistent philosophical disagreements are very much the widespread norm, and thus unsurprising in this particular instance as well, even among experts in currently existing AIs, as it's far from clear how their insights would extrapolate to hypothetical future systems.
2philh
Isn't this kind of thing the default? Like, for ~every invention that changed the world I'd expect to be able to find experts saying in advance that it won't work or if it does it won't change things much. And for lots of things that didn't work or didn't change the world, I'd expect to be able to find experts saying it would. I basically just think that "smart person believes silly thing for silly reasons" is pretty common.
3David Gould
True. Unless there were very good arguments/very good evidence for one side or the other. My expectation is that for any random hypothesis there will be lots of disagreement about it among experts. For a random hypothesis with lots of good arguments/good evidence, I would expect much, much less disagreement among experts in the field. If we look at climate change, for example, the vast majority of experts agreed about it quite early on - within 15 years of the Charney report. If all I am left with, however, is 'smart person believes silly thing for silly reasons' then it is not reasonable for me as a lay person to determine which is the silly thing. Is 'AI poses no (or extremely low) x-risk' the silly thing, or is 'AI poses unacceptable x-risk' the silly thing? If AI does indeed pose unacceptable x-risk and there are good arguments/good evidence for this, then there also has to be a good reason or set of reasons why many experts are not convinced. (Yann claims, for example, that the AI experts arguing for AI x-risk are a very small minority and Eliezer Yudkowsky seems to agree with this).  
2philh
So I don't know much about timelines of global warming or global warming science, but I note that that report came out in 1979, more than 100 years after the industrial revolution. So it's not clear to me that fifteen years after that counts as "quite early on", or that AI science is currently at a comparable point in the timeline. (If points in these timelines can even be compared.)

FWIW I think even relatively-lay people can often detect silly arguments, even from people who know a lot more than them. Some examples where I think I've done that:

* I remember seeing someone (possibly even Yann LeCun?) saying something along the lines of, AGI is impossible because of no free lunch theorems.
* Someone saying that HPMOR's "you violated conservation of energy!" bit is dumb because something something quantum stuff that I didn't understand; and also because if turning into a cat violated conservation of energy, then so did levitating someone a few paragraphs earlier. I am confident this person (who went by the handle su3su2u1) knows a lot more about physics than me. I am also confident this second part was them being silly.
* This comment.

So I'd suggest that you might be underestimating yourself. But if you're right that you can't reasonably figure this out... I'm not sure there are any ways to get around that? Eliezer can say "Yann believes this because of optimism bias" and Yann can say "Eliezer believes this because of availability heuristic" or whatever, and maybe one or both of them is right (tbc I have not observed either of them saying these things). But these are both Bulverism.

It may be that Eliezer and Yann can find a double crux, something where they agree: "Eliezer believes X, and if Eliezer believed not-X then Eliezer would think AGI does not pose a serious risk. Yann believes not-X, and if Yann believed X then Yann would think AGI does pose a serious risk." But finding such Xs is hard, I don't expect there to be a simple one, and even if there was
1David Gould
Re timelines for climate change, in the 1970s serious people in the field of climate studies started suggesting that there was a serious problem looming. A very short time later, the entire field was convinced by the evidence and argument for that serious risk - to the point that the IPCC was established in 1988 by the UN. When did some serious AI researchers start to suggest that there was a serious problem looming? I think in the 2000s. There is no IPCC for AI x-risk.

And, yes: I can detect silly arguments in a reasonable number of cases. But I have not been able to do so in this case as yet (in the aggregate). It seems that there are possibly good arguments on both sides. It is indeed tricky - I also mentioned that it could get into a regress-like situation. But I think that if people like me are to be convinced it might be worth the attempt. As you say, there may be a domain more accessible to me in there somewhere.

Re the numbers, Eliezer seems to claim that the majority of AI researchers believe in X-risk, but few are speaking out for a variety of reasons. This boils down to me trusting Eliezer's word about the majority belief, because that majority is not speaking out. He may be motivated to lie in this case - note that I am not saying that he is, but 'lying for Jesus' (for example) is a relatively common thing. It is also possible that he is not lying but is wrong - he may have talked to a sample that was biased in some way.
2philh
Nod. But then, I assume by the 1970s there was already observable evidence of warming? Whereas the observable evidence of AI X-risk in the 2000s seems slim. Like I expect I could tell a story for global warming along the lines of "some people produced a graph with a trend line, and some people came up with theories to explain it", and for AI X-risk I don't think we have graphs or trend lines of the same quality. This isn't particularly a crux for me btw. But like, there are similarities and differences between these two things, and pointing out the similarities doesn't really make me expect that looking at one will tell us much about the other.

Not opposed to trying, but like... So I think it's basically just good to try to explain things more clearly and to try to get to the roots of disagreements. There are lots of forms this can take. We can imagine a conversation between Eliezer and Yann, or people who respectively agree with them. We can imagine someone currently unconvinced having individual conversations with each side. We can imagine discussions playing out through essays written over the course of months. We can imagine FAQs written by each side which give their answers to the common objections raised by the other. I like all these things. And maybe in the process of doing these things we eventually find a "they disagree because ..." that helps it click for you or for others.

What I'm skeptical about is trying to explain the disagreement rather than discover it. That is, I think "asking Eliezer to explain what's wrong with Yann's arguments" works better than "asking Eliezer to explain why Yann disagrees with him". I think answers I expect to the second question basically just consist of "answers I expect to the first question" plus "Bulverism". (Um, having written all that I realize that you might just have been thinking of the same things I like, and describing them in a way that I wouldn't.)
1Rusins
Unfortunately I do not know the reasoning behind why the people you mentioned might not see AI as a threat, but if I had to guess – people not worried are primarily thinking about short term AI safety risks like disinformation from deepfakes, and people worried are thinking about super-intelligent AGI and instrumental convergence, which necessitates solving the alignment problem.

The presumption here is that civilisation is run by governments that are chaotic and low-competence. If this is true, there is clearly a problem implementing an AI lockdown policy. It would be great to identify the sort of political or economic steps needed to execute the shutdown.

Roko

Title changed from

"Architects of Our Own Demise: We Should Stop Developing AI"

to

"Architects of Our Own Demise: We Should Stop Developing AI Carelessly"


Global compliance is the sine qua non of regulatory approaches, and there is no evidence that the political will to make that happen is within our possible futures unless some catastrophic but survivable casus belli happens to wake the population up - as with Frank Herbert's Butlerian Jihad. (Irrelevant aside: Samuel Butler, who wrote of the dangers of machine evolution and supremacy, lived in the 19th century at what later became the filming location for Edoras in the Lord of the Rings films.)

Is it insane to think that a limited nuclear conflict (as seems to be an increasingly ... (read more)

5Roko
Part of why I am posting this is in case that happens, so people are clear what side I am on.
2Vaniver
Popular support is already >70% for stopping development of AI. Why think that's not enough, and that populations aren't already awake?
3Nathan Helm-Burger
Well, my model says that what really matters is the opinions of the power-wielding decision makers, and that 'popular opinion' doesn't actually carry much weight in deciding what the US government does. Much less the Chinese government, or the leadership of large corporations. So my view is that it is the decision-makers currently imagining that the poisoned banana will grant them increased wealth & power who need their minds changed. 
5Vaniver
My current sense is that efforts to reach the poisoned banana are mostly not driven by politicians. It's not like Joe Biden or Xi Jinping are pushing for AGI, and even Putin's comments on AI look like near-term surveillance / military stuff, not automated science and engineering.
3Nathan Helm-Burger
Yeah, I agree that that's what the current situation looks like. More tech CEOs making key decisions than politicians. However, I think the strategic landscape may change quite quickly once real world effects become more apparent. In either case, I think it's the set of decision makers holding the reins (whoever that may consist of) who need to be updated. I'm pretty sure that the 'American Public' or 'European Public' could have an influence, but probably not at the level of simply answering 'AI is scary' on a poll. Probably there'd need to be like, widespread riots.
0akarlin
It's not at all insane IMO. If AGI is "dangerous" x timelines are "short" x anthropic reasoning is valid... ... Then WW3 will probably happen "soon" (2020s). https://twitter.com/powerfultakes/status/1713451023610634348 I'll develop this into a post soonish.
2Nathan Helm-Burger
I'm hopeful that the politicians of the various nations who might initiate this conflict can see how badly that would turn out for them personally, and thus find sufficient excuses to avoid rushing into that scenario. Not certain by any means, but hopeful. There certainly will need to be some tense negotiations, at the least.