This might sound either flippant or incendiary, but I mean it sincerely: wouldn't creating an enforcement regime powerful enough to permanently and reliably guarantee no AGI development require the society implementing that regime to be far more stable over future history than any state has thus far been, and, more importantly, introduce an incredible risk of creating societies that most liberal democracies would find sub-optimal (to put it mildly), which are then locked in even without AGI due to the aforementioned hyper-stability?
This plan seems likely to sacrifice most future value itself, unless the humans in charge of the enforcement regime's power act purely altruistically.
First, I don't propose 'no AGI development'. If companies can create safe and beneficial AGIs (the burden of proof is on them), I see no reason to stop them. On the contrary, I think it might be great! As I wrote in my post, this could e.g. increase economic growth, cure disease, etc. I'm just saying that I think existential risk reduction, as opposed to creating economic value, will not (primarily) originate from alignment, but from regulation.
Second, the regulation that I think has the biggest chance of keeping us existentially safe will need to be implemented with or without aligned AGI. With aligned AGI (barring a pivotal act), there will be an abundance of unsafe actors who could run the AGI without safety measures (possibly by mistake). Therefore, the labs themselves propose regulation to keep almost everyone but themselves from building such AGI. The regulation required to do that is almost exactly the same as the regulation we would need without aligned AGI.
Third, I'm really not as negative as you are about what it would take to implement such regulation. I think we'll keep our democracies, our freedom of expression, our planet, everyone we love, and we'll be able to go anywhere we like. Some industries and researchers will not be able to do some things they would have liked to do because of regulation. But that's not at all uncommon. And of course, we won't have AGI as long as it isn't safe. But I think that's a good thing.
I should have been more precise. I'm talking about the kind of organizational capabilities required to physically ensure that no AI unauthorized by a central authority can be created. Whether aligned AGI exists (and presumably, in this case, is loyal to said authority over other factions of society that may become dissatisfied) doesn't need to factor into the conversation much.
That may well be the price of survival; nonetheless, I felt I needed to point out the very likely price of going down that route. Whether that price is worth paying to reduce x-risk from p(x) to p(x-y) is up to each person reading this. Again, I'm not trying to be flippant; it's an honest question of how we trade off between these two risks. But we should recognize there are multiple risks.
I'm not so much implying you are negative as not sufficiently negative about prospects for liberalism/democracy/non-lock-in in a world where a regulatory apparatus strong enough to do what you propose exists. Most democratic systems are designed, to varying degrees, so as to not concentrate power in one actor or group of actors, hence the concept of checks & balances as well as different branches of government; the governments are engineered to rely as little as possible on the good will & altruism of the people in said positions. When this breaks down because of unforeseen avenues for corruption, we see corruption (à la stock portfolio returns for sitting senators).
The assumption that we cannot rely on societal decision-makers to not immediately use any power given to them in selfish/despotic ways is what people mean when they talk about humility in democratic governance. I can't see how this humility can survive alongside even just the surveillance power that would be required to prevent rebellion over centuries to millennia, much less the global/extraglobal enforcement capabilities such a regulatory regime would need.
Maybe you have an idea for an enforcement mechanism that could prevent unaligned AGI indefinitely that is nonetheless incapable of being utilized for non-AI regulation purposes (say, stifling dissidents or redistributing resources to oneself), but I don't understand what that institutional design would look like.
I think the assumption that safe, aligned AI can't defend against a later introduction of misaligned AI is false, or rather depends on the assumption of profound alignment failures so that the 'aligned AI' really isn't. AI that is aligned enough to do AI research and operate industry and security forces can expand its capabilities to the technological frontier and grow an industrial base claiming unclaimed resources in space. Then any later AI introduced faces an insurmountable balance of capabilities just from the gap in resources, even if it catches up technologically. That would not violate the sovereignty of any state, although it could be seen as a violation of the Outer Space Treaty if not backed by the international community with treaty revision.
Advanced AI-enabled tech and industry can block bioweapons completely through physical barriers, detection, and sterilization. Vast wealth can find, with high probability, any zero-days that could be discovered with tiny wealth, and can produce ultra-secure systems, so cyberattacks do not produce a vulnerable world. Even nuclear weapons lose their MAD element in the face of millions of drones/interceptors/defenses for each attempted attack (and humans can move to a distance in space, back up their minds, etc.).
If it turns out there is something like the ability to create a vacuum collapse that enables one small actor to destroy a much larger AI-empowered civilization, then the vast civilization will find out first, and could safely enforce a ban if a negotiated AI-enforced treaty could not be struck.
If I understand correctly, memes about pivotal acts to stop anyone from making misaligned AI stem from the view that we won't be able to make AI that could be trusted to undergo intelligence explosion and industrial expansion for a long time after AI could enable some other 'pivotal act.' I.e. the necessity for enforcing a ban even after AGI development is essentially entirely about failures of technical alignment.
Furthermore, the biggest barrier to extreme regulatory measures like a ban is doubt (both reasonable and unreasonable) about the magnitude of misalignment risk, so research that studies and demonstrates high risk (if it is present) is perhaps the most leveraged possible tool to change the regulatory/governmental situation.
Thank you for writing this reply. It definitely improved my overview of possible ways to look at this issue.
I guess your position can be summarized as "positive offense/defense balance will emerge soon, and aligned AI can block following unaligned AIs entirely if required", is that roughly correct?
I have a few remarks about your ideas (not really a complete response).
The necessity for enforcing a ban even after AGI development is essentially entirely about failures of technical alignment.
First, in general, I think you're underestimating the human component of alignment. Aligned AI should be aligned to something, namely humans. That means it won't be able to build an industrial base in space until we're ready to make it do that.
Even if we are not harmed by such a base in any way, and even if it would be legal to build it, I expect we may not be ready for it for a long time. It will be dead scary to see something develop that seems more powerful than us, but also deeply alien to us, even if tech companies insist it's 'aligned to our values'. Most people's response will be to rein in its power, not expand it further. Any AI that's aligned to us will need to take those feelings seriously.
Even if experts would agree that increasing the power of the aligned AI is good and necessary, and that expansion in space would be required for that, I think it will take a long time to convince the general public and/or decision makers, if it's at all possible. And in any remotely democratic alignment plan, that's a necessary step.
Second, I think it's uncertain whether a level of AI that's powerful enough to take over the world (and thereby cause existential risk) will also be powerful enough to build a large industrial base in space. If not, your plan might not work.
The biggest barrier to extreme regulatory measures like a ban is doubt (both reasonable and unreasonable) about the magnitude of misalignment risk.
I disagree, from my experience of engaging with the public debate, doubt is mostly about AI capability, not about misalignment. Most people easily believe AI to be misaligned to them, but they have trouble believing it will be powerful enough to take over the world any time soon. I don't think alignment research will do that much here.
I disagree, from my experience of engaging with the public debate, doubt is mostly about AI capability, not about misalignment. Most people easily believe AI to be misaligned to them, but they have trouble believing it will be powerful enough to take over the world any time soon. I don't think alignment research will do that much here.
I would say that the power of AI will continue to visibly, massively expand (although underestimation of further developments will continue to be a big problem), but that will strengthen both the 'fear AI disaster' and 'get AI first' camps. My read is that the former is in a very difficult position now when its policy recommendations conflict with the latter. I see this in the Congressional hearings and the rejection of the pause letter.
Even if experts would agree that increasing the power of the aligned AI is good and necessary, and that expansion in space would be required for that, I think it will take a long time to convince the general public and/or decision makers, if it's at all possible. And in any remotely democratic alignment plan, that's a necessary step.
When that kind of AI is available, it would mean, by the same token, that such expansion could break down MAD in short order, since such explosive growth could confer the power to safely disarm international rivals if not matched or stopped. And AI systems and developers will be able to demonstrate this. So the options would be verifying/trusting deals with geopolitical and ideological rivals to hold back, or doing fast AI/industrial expansion. If dealmaking fails, then all options would look scary and abrupt.
I think the "pivotal act" notion was always borderline insanity. First, it requires the kind of super intelligent, nigh-omnipotent and infallible ASI that only results from a proper aligned FOOM. We don't even know if such a thing is possible (my money is on "no"). Second, just the idea of it is already fundamentally unacceptable.
"What is your lab doing?"
"Building AGI. We're trying to align it to our interests and make sure it performs a pivotal act first thing when it comes online."
"Uhm, a pivotal act? What's that?"
"It means the AGI will do something big and powerful to stop any dangerous AGIs from being created ever again."
"...wait, something? What will it do?"
"We don't know, whatever it thinks is best and has the highest chance of working according to the values we are writing into it. Something. Hopefully we might not even notice what. Maybe we will. Might decide to destroy all computers I guess if it looks bad."
"So you're saying the AGI will do something, which might be something very disruptive, but you don't know what that is? How do you even know that the AGI will not just... be wrong, and make a mess, or let a dangerous AGI rise up anyway?"
"Haha, well, it will surely not be as wrong as we would be. Anyway, mostly, we just hope it works out."
That's not really a viable policy to hold, and not something anyone (especially any politician) would endorse. Just giving the newborn AGI immense power over us, right off the bat, with no guarantee of how it will be used. The pivotal act as an idea IMO belongs in a long list of desperate attempts to reconcile the notion of an AGI existing with the notion of humans still surviving and experiencing a utopia, when that seems just less and less likely.
This is a pretty bad misunderstanding of the pivotal act concept and definitions.
The whole reason why Eliezer talked about pivotal acts is to force people to be clear about how exactly they want to use an early AGI to end the acute risk period. The framing is meant to contrast explicitly with other alignment approaches, like OpenAI's superalignment plan or Paul's ELK proposals, which end up deferring to the AI on what the right way to actually end further AI risk is.
Why is it a misunderstanding, though? Eliezer has said multiple times that he doesn't know what a good pivotal act would be. And we all know that Eliezer does believe in a fast take-off and huge gains for superintelligence, so I wouldn't say it's that weird for him to think that if we do have an aligned AGI, it can quickly reach the point where it can just disseminate nanomachines that sabotage your GPUs if you're trying to make another AGI, or any other such thing.
My point about it being insanity, though, was less about the fact that I don't agree with Eliezer on the credibility of those take-off scenarios (I think, in fact, that the ASI would sadly stay much longer, possibly forever, in the "smart enough to kill us, not smart enough to save us" window), and more about their political feasibility and even morality. It's still an incredibly unilateral act of essentially compulsion on all of humanity; for a very limited purpose, true, but still somehow aggressive (and in practice, I don't think anyone would actually stop at just doing that if they had that sort of power). I'm looking at this in terms of actual half-realistic scenarios in which someone, not Yud himself, actually gets to build an AGI and gets to decide what values to put into it, and other people know they're doing this, and so on and so forth. And those worlds, IMO, don't just let it happen, because right or wrong, most people and organizations don't like the idea of someone making that sort of decision for them.
Why is it a misunderstanding, though?
I mean, because it asserts that the same people who advocate for thinking about pivotal acts, and who popularized the pivotal act notion, would say anything like "We don't know, whatever it thinks is best and has the highest chance of working according to the values we are writing into it.".
This is explicitly not what Nate and Eliezer and some MIRI people are trying to do. The whole point of a minimal pivotal act is to make it so that you don't have to align your AI all the way to the point where you can just have it go off and do whatever is best according to the values you programmed into it. It's so that you have as close as possible to a concrete plan for what you want to do with the AI, planning for the world where you didn't solve the AI Alignment problem fully enough to just defer to the AI.
I think Eliezer has said multiple times that he has a pretty good idea of what a minimal efficient pivotal act would be; he just can't name it out loud because it's way outside the Overton window, so he keeps referring to "something-like-melting-GPUs-but-obviously-not-that"?
Ok so he does admit it's something completely politically unviable because it's probably tyrannical or straight up lesser-evil-but-still-pretty-evil. At which point I'm not even sure if not saying it out loud doesn't make it sound even more ominous. Point stands, "pivotal act" can't possibly be a viable strategy and in fact its ethical soundness altogether is questionable unless it's really just a forced binary choice between that and extinction.
"Outside Overton window" "evil". Like, "let's defer to prediction markets in major policy choices" was pretty out of it most of history and probably even today.
As far as I remember, "melting all GPUs" is not an actual pivotal act because it is not minimal: it's too hard to align an ASI to build nanobots for this and operate in the environment safely. And I think we can conclude that the actual PA should be pretty tame, because, sure, melting all GPUs is scary and major property destruction, but it's nothing close to "establishing a mind-controlling surveillance dictatorship".
Another example of a possible PA is the invention of superhuman intelligence enhancement, but it's still not minimal.
"Outside Overton window"≠ "evil"
True, but would you really be ashamed of saying "let's defer to prediction markets in major policy choices" out loud? That might get you some laughs and wouldn't be taken very seriously but most people wouldn't be outright outraged.
And I think we can conclude that the actual PA should be pretty tame, because, sure, melting all GPUs is scary and major property destruction, but it's nothing close to "establishing a mind-controlling surveillance dictatorship".
True to a point, but it's still something that people would strongly object to - since you can't even prove the counterfactual that without it we'd be all dead. And in addition, there is a more serious aspect to it, which is military hardware that uses GPUs. And technically destroying that in other countries is an act of war or at least sabotage.
Another example of a possible PA is the invention of superhuman intelligence enhancement, but it's still not minimal.
I doubt that would solve anything. Intelligence does not equal wisdom; some fool would still probably just use it to build AGI faster.
If you can prove that it was you who melted all GPUs stealthily using AI-developed nanotech, it should be pretty obvious that the same AI without safety measures could kill everyone.
Scott Alexander once wrote that while it's probably not wise to build an AI organisation around a pivotal act, if you find yourself in a position where you can do it, you should do it, because, assuming you are not a special genius decades ahead in AI development, if you can perform a pivotal act, someone else in AI can kill everyone.
I mean intelligence in a wide sense, including wisdom, security mindset, and self-control. And obviously, if I could build an AI that could provide me with such enhancement, I would enhance myself to solve the full value-alignment problem, not give the enhancement to random unchecked fools.
Yes, but that "I can't let someone else handle this, I'll do it myself behind their backs" generalized attitude is how actually we do get 100% all offed, no pivotal acts whatsoever. It's delusion to think it leaves a measurable, non-infinitesimal window to actually succeeding - it does not. It simply leads to everyone racing and eventually someone who's more reckless and thus faster "winning". Or at best, it leads to a pivotal act by someone who then absolutely goes on to abuse their newfound power because no one can be inherently trusted with that level of control. That's the best of the two worlds, but still bad.
Not quite. If you live in a world where you can let others handle this, you can't be in a position to perform a pivotal act, because others will have successfully coordinated around not giving anyone (including you) the unilateral capability to launch ASI. And otherwise, if you find yourself in the situation "there is a red button to melt all GPUs", it means that others have utterly failed to coordinate, and you should pick the least bad world that remains possible.
Eliezer has said multiple times that he doesn't know what a good pivotal act would be.
I don't think that's true; can you find an example?
Eliezer has not publicly endorsed a specific concrete pivotal act, AFAIK, but that's different.
Ah, fair, I might be mixing the two things. But let's put it this way - if "melt all GPUs" is the pivotal act example Eliezer keeps going back to, and he has a secret one that he knows but doesn't say out loud, is it because it's some kind of infohazard that risks failing if spelled out, or is it because it's so bad he knows it's better if he doesn't say it?
I don't disagree. But I do think people dismissing the pivotal act should come up with an alternative plan that they believe is more likely to work. Because the problem is still there: "how can we make sure that no-one, ever builds an unaligned superintelligence?" My alternative plan is regulation.
Oh, yes, I agree. Honestly I find it all bleak. The kind of regulation needed to prevent this sounds like it may be either insufficient or quite stifling. This is like having to deal with nuclear proliferation, but if the laws of nature allowed everyone to make an atomic pipe bomb by using rocks you can pick up from the ground. So it's either that, or risking it and hoping that somehow it all turns out well - there are possible vulnerabilities in AI risk arguments, but personally I don't find them all that compelling, or something I'd be willing to bet my life on, let alone everyone's. I just think that the AI risk discourse is quite weighed down by the fact that many people just don't want to let go of the hope of seeing the singularity in their lifetimes, which prevents them from going all the way to the logical conclusion that we just shouldn't build AGI, and should find ways to prevent it all around, hard as it is.
This is like having to deal with nuclear proliferation, but if the laws of nature allowed everyone to make an atomic pipe bomb by using rocks you can pick up from the ground.
This is hiding a lot of work, and if it's interpreted as the most extreme statement possible, I think this is at best maybe true, and maybe simply false.
And even if it is true, it's not going to be exploited immediately, and there will be lag time that matters.
Also importantly, LLMs probably aren't going to scale to existential risk quickly unless our world is extremely vulnerable, due to pretty big issues with how they reason, so that adds additional lag time.
A basic disagreement I have with this post and many rationalist worldviews, including your worldview here, dr_s, is that I believe this statement from the post is either simply false, true but with more limitations than rationalists think, or true but taking a lot longer to materialize than people here think - which is important, since we can probably regulate things pretty well as long as the threat isn't too fast in coming:
My personal bet, however, is that offense will unfortunately trump defense.
Ok, so I may have come off too pessimistic there. Realistically I don't think AGI will actually be something you can achieve on your gaming laptop in a few days of training just yet, or any time soon. So maybe my metaphor should have been different, but it's hard to give the right sense of scale. The Manhattan project required quite literally all of the industrial might of the US. This is definitely smaller, though perhaps not do-it-in-my-basement smaller. I do generally agree that there are things we can do - and at the very least they're worth trying! That said, I still think that even the things that work are kind of too restrictive for my tastes, and I'm also worried that, as always happens, they'll lead to politicians overreaching. My ideal world would be one in which big AI labs get stifled on creating AGI specifically, specialised AI is left untouched, open source software for lesser applications is left untouched, and maybe we only monitor large-scale GPU hoarding. But I doubt it'd be that simple. So that's what I find bleak - that we're forced into a choice between risk of extinction and risk of oppression, whereas we wouldn't have to be if people didn't insist on trying to open this specific Pandora's box.
That's definitely progress. I think the best thing AI regulation efforts can do right now is look to the future, and in particular get prepared with draft plans for AI regulation, so that if or when the next crisis hits, we won't be fumbling for solutions and will instead have good AI regulations back in the running.
Agree that those drafts are very important. I also think there will be technical research required in order to find out which regulation would actually be sufficient (I think at present we have no idea). I disagree, however, that waiting for a crisis (warning shot) is a good plan. There might not really be one. If there is one, though, I agree that we should at least be ready.
True that we probably shouldn't wait for a crisis, but one thing that does stand out to me is that the biggest issue wasn't political will, but rather that AI governance was pretty unprepared for this moment (though they improvised surprisingly effectively).
In this comment, I will be assuming that you intended to talk of "pivotal acts" in the standard (distribution of) sense(s) people use the term — if your comment is better described as using a different definition of "pivotal act", including when "pivotal act" is used by the people in the dialogue you present, then my present comment applies less.
I think that this is a significant mischaracterization of what most (? or definitely at least a substantial fraction of) pivotal activists mean by "pivotal act" (in particular, I think this is a significant mischaracterization of what Yudkowsky has in mind). (I think the original post also uses the term "pivotal act" in a somewhat non-standard way in a similar direction, but to a much lesser degree.) Specifically, I think it is false that the primary kinds of plans this fraction of people have in mind when talking about pivotal acts involve creating a superintelligent nigh-omnipotent infallible FOOMed properly aligned ASI. Instead, the kind of person I have in mind is very interested in coming up with pivotal acts that do not use a general superintelligence, often looking for pivotal acts that use a narrow superintelligence (for instance, a narrow nanoengineer) (though this is also often considered very difficult by such people (which is one of the reasons they're often so doomy)). See, for instance, the discussion of pivotal acts in https://www.lesswrong.com/posts/7im8at9PmhbT4JHsW/ngo-and-yudkowsky-on-alignment-difficulty.
I don't think this revolutionises my argument. First, there's a lot of talk about possible example pivotal acts, and they're mostly just not that believable on their own. The typical "melt all GPUs" is obviously incredibly hostile and disruptive, but yes, of course, it's only an example. The problem is that without an actual outline of what a perfect pivotal act is, you can't even hope to do it with "just" a narrow superintelligence, because in that case you need to work out the details yourself, and the details are likely horribly complicated.
But the core, fundamental problem with the "pivotal act" notion is that it tries to turn a political problem into a technological one. "Do not build AGIs" is fundamentally a political problem: it's about restricting human freedom. Now you can either do that voluntarily, by consensus, with some enforcement mechanism for the majority to impose its will on the minority, or you can do that by force, with a minority using overwhelming power to make the majority go along even against their will. That's it. A pivotal act is just a nice name for the latter thing.
The essence of the notion is "we can't get everyone on board quickly enough; therefore, we should just build some kind of superweapon that allows us to stop everyone else from building unsafe AGI as we define it, whether they like it or not". It's not a lethal weapon, and you can argue the utilitarian trade-off from your viewpoint is quite good, but it is undeniably a weapon. And therefore it's just not something that can be politically acceptable, because people don't like to have weapons pointed at them, not even when the person making the weapon assures them it's for their own good.
If "pivotal act" became the main paradigm, the race dynamics would only intensify, because then everyone knows they'll only have one shot, and they won't trust the others to either get it right or actually limit themselves to just the pivotal act once they're the only ones with AI power in the world. And if instead the world came together to agree on a pivotal act... well, that's just regulation, first, as described in this post. And then moving on to develop a kind of special nanobot police to enforce that regulation (which would still be a highly controversial action, and if deployed worldwide, essentially an act of war against any country not subscribing to the AI safety treaty or whatever).
I was just claiming that your description of pivotal acts / of people that support pivotal acts was incorrect in a way that people that think pivotal acts are worth considering would consider very significant and in a way that significantly reduces the power of your argument as applying to what people mean by pivotal acts — I don't see anything in your comment as a response to that claim. I would like it to be a separate discussion whether pivotal acts are a good idea with this in mind.
Now, in this separate discussion: I agree that executing a pivotal act with just a narrow, safe, superintelligence is a difficult problem. That said, all paths to a state of safety from AGI that I can think of seem to contain difficult steps, so I think a more fine-grained analysis of the difficulty of various steps would be needed. I broadly agree with your description of the political character of pivotal acts, but I disagree with what you claim about associated race dynamics — it seems plausible to me that if pivotal acts became the main paradigm, then we'd have a world in which a majority of relevant people are willing to cooperate / do not want to race that much against others in the majority, and it'd mostly be a race between this group and e/acc types. I would also add, though, that the kinds of governance solutions/mechanisms I can think of that are sufficient to (for instance) make it impossible to perform distributed training runs on consumer devices also seem quite authoritarian.
it seems plausible to me that if pivotal acts became the main paradigm, then we'd have a world in which a majority of relevant people are willing to cooperate / do not want to race that much against others in the majority, and it'd mostly be a race between this group and e/acc types
I disagree, I think in many ways the current race already seems motivated by something of the sort - "if I don't get to it first, they will, and they're sure to fuck it up". Though with no apparent planning for pivotal acts in sight (but who knows).
I would also add, though, that the kinds of governance solutions/mechanisms I can think of that are sufficient to (for instance) make it impossible to perform distributed training runs on consumer devices also seem quite authoritarian.
Oh, agreed. It's a choice between shitty options all around.
The issue I have with pivotal act models is that they presume an aligned superintelligence would be capable of bootstrapping its capabilities in such a way that it could perform that act before the creation of the next superintelligence. Soft takeoff seems to be a very popular opinion now, and isn't conducive to this kind of scheme.
Also, if a large org were planning a pivotal act I highly doubt they would do so publicly. I imagine subtly modifying every GPU on the planet, melting them or doing anything pivotal on a planetary scale such that the resulting world has only one or a select few superintelligences (at least until a better solution exists) would be very unpopular with the public and with any government.
I don't think the post explicitly argues against either of these points, and I agree with what you have written. I think these are useful things to bring up in such a discussion, however.
Interesting discussion.
I suspect that an aligned super-intelligence could figure out what to do and persuade people to do it, except if people automatically disregard what it says because they know they’d be persuaded whether it was aligned or not.
So would people be persuaded or would they disregard it? My bet would be that it persuades people to do what’s necessary given sufficient lead time.
For this reason I suspect alignment could constitute a solution without us having to ask it to do anything too drastic.
The problem I see with these arguments is that they all assume the AI is already superintelligent, but that's not a given? Or rather, I think doing this reliably requires far more intelligence than just doing something destructive. Defending requires more smarts than attacking, if it's even possible.
Interesting take! Wouldn't that go under "Types of AI (hardware) regulation may be possible where the state actors implementing the regulation are aided by aligned AIs"?
This is a serious problem, but it is under active investigation at the moment, and the binary of regulation or pivotal act is a false dichotomy. Most approaches that I've heard of rely on some combination of positively transformative AI tech (basically lots of TAI technologies that reduce risks bit by bit, overall adding up to an equivalent of a pivotal act) and regulation to give time for the technologies to be used to strengthen the regulatory regime in various ways or improve the balance of defense over offense, until eventually we transition to a totally secure future: though of course this assumes at least (somewhat) slow takeoff.
You can see these interventions as acting on the conditional probabilities 4) and 5) in our model, by driving down the chance that, assuming a misaligned APS system is deployed, it can cause large-scale disasters (see the sketch below the list):
4) Misaligned APS systems will be capable of causing a large global catastrophe upon deployment,
5) The human response to misaligned APS systems causing such a catastrophe will not be sufficient to prevent it from taking over completely,
6) Having taken over, the misaligned APS system will destroy or severely curtail the potential of humanity.
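To make the multiplicative structure explicit - a minimal sketch on my part, assuming the model chains its numbered premises as conditional probabilities in the usual way, with 1)-3) being the premises upstream of the ones quoted above:
\[
P(\text{doom}) \approx P(1)\cdot P(2\mid 1)\cdot P(3\mid 1,2)\cdot P(4\mid 1,2,3)\cdot P(5\mid 1,\dots,4)\cdot P(6\mid 1,\dots,5)
\]
so halving the conditional probability in 4) or 5) roughly halves the overall risk estimate, even if the earlier factors stay fixed.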
This hasn't been laid out in lots of realistic detail yet, not least because most AI governance people are currently focused on near-term actions like making sure the regulations are actually effective, because that's the most urgent task. But this doesn't reflect a belief that regulations alone are enough to keep us safe indefinitely.
Holden Karnofsky has written on this problem extensively,
Very nice article. You should be aware that many of the institutional/academic proponents of heavy AI regulation are incredibly corrupt and incompetent. See this for instance: https://rwincblog9.wordpress.com/2023/06/15/11/
Thanks for the compliment. I'm not convinced, though, that this single example, assuming it's correct, generalizes.
Summary: Aligning a single powerful AI is not enough: we're only safe if no-one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) which makes sure no unaligned AIs can ever get built, by anyone. However, the labs are currently apparently not aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment.
Epistemic status: I’ve been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most literature on the topic. I have also discussed the topic with a fair number of experts (who in some cases seemed to agree, and in other cases did not seem to agree).
Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post.
This post is mostly about AI x-risk caused by a take-over. It may or may not be valid for other types of AI x-risks. This post is mostly about the ‘end game’ of AI existential risk, not about intermediate states.
AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: “if all you need is an object that doesn't do dangerous things, you could try a sponge”. Even if a limited AI would be a safe AI, it would not reduce AI existential risk. This is because at some point, someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). This is the AI that would kill us, not the safe one.
This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently also in Dan Hendrycks’ paper. This evolutionary part is a fundamental and very important property of AI existential risk and a large part of why this problem is difficult. Yet, many in AI Alignment and industry seem to focus on only aligning a single AI, which I think is insufficient.
Yudkowsky aimed to solve this evolutionary problem (the fact that no-one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only not kill humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This would be the act that would actually save humanity from extinction, by making sure no unsafe superintelligences are created, ever, by anyone (it may be argued that melting all GPUs, and all other future hardware that could run AI, would need to be done indefinitely by the aligned superintelligence, else even a pivotal act may be insufficient).
The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferation and several types of regulation, to make sure powerful AI won't fall into the wrong hands. 'Wrong hands' could here mean anyone who would run it without safety measures, whether on purpose or by mistake. I would call such a solution, specifically any solution that has the capacity to limit any actor's access to advanced AI for any period of time, AI Regulation.
This solution, which appears to have gone mainstream, has important consequences:
Therefore, without a pivotal act, what keeps us safe is regulation. One might still want to align a superintelligence to use its power, but not to prevent existential risk. Using a superintelligence’s power may of course be a valid reason to pursue alignment: it could skyrocket our economy, create abundance, cure disease, increase political power, etc. Although net positivity of these enormous, and enormously complex, transformations may be hard to prove in advance, these could certainly be legitimate reasons to work on alignment. However, those of us interested in preventing existential risk, as opposed to building AI, should - in this scenario - be focusing on regulation, not on alignment. The latter might also be left to industry, as well as the burden of proof that the resulting aligned AIs are indeed safe.
Moving beyond this scenario of AI Regulation, there is one more option to solve the full evolutionary problem of AI existential risk. Some think that aligned superintelligences could successfully and indefinitely protect us from unaligned superintelligences. This option, which I would call a positive offense/defense balance, would be a third way, next to alignment + pivotal act and lasting regulation, to prevent human extinction in the longer term. Most people do not seem to think that this would be realistic, however (with notable exceptions).
These three ways of solving the evolutionary nature of AI existential risk (AI alignment + pivotal act, AI regulation, defense > offense) might not be the complete set of solutions for the evolutionary problem of AI existential risk, and there are intersections between the three. The pivotal act might be seen as a (very restrictive, and illegal) type of winning the offense/defense balance. A pivotal act carried out by a state actor might be seen as an extreme (and again illegal) way of implementing AI regulation. Types of AI (hardware) regulation may be possible where the state actors implementing the regulation are aided by aligned AIs, making their implementation somewhat similar to a pivotal act (that would in this case probably be legal). And certain types of regulation can perhaps make it more likely that we win the offense/defense balance.
I think research should be carried out that aims for a complete set of solutions to the evolutionary problem of AI existential risk. I would expect such research to come up with more options than these three, and/or with more hybrid options in between these three, which may point to new, fruitful ways of reducing AI existential risk.
As long as we assume that only three solutions exist to the evolutionary nature of AI existential risk, it is important to realize that all three seem difficult. It is also hard to quantify the likelihood of each option. Therefore, placing bets on any of these three could be worthwhile.
My personal bet, however, is that offense will unfortunately trump defense, and that the chance that alignment is solved before a superintelligence with takeover capabilities is developed, and that this aligned superintelligence then carries out a successful pivotal act, is smaller than the chance that we will be able to coordinate successfully and implement good enough hardware or data regulation, especially if the current trend of increasing public awareness of AI existential risk continues. This implies that working on regulation of the type that could globally and indefinitely limit access to advanced AI for all actors and for as long as necessary should be the highest existential priority, more so than working on alignment.