tl;dr: I know a bunch of EA/rationality-adjacent people who argue — sometimes jokingly and sometimes seriously — that the only or best way to reduce existential risk is to enable an “aligned” AGI development team to forcibly (even if nonviolently) shut down all other AGI projects, using safe AGI.  I find that the arguments for this conclusion are flawed, and that the conclusion itself causes harm to institutions that espouse it.  Fortunately (according to me), successful AI labs do not seem to espouse this "pivotal act" philosophy.

[This post is also available on the EA Forum.]

How to read this post

Please read Part 1 first if you’re very impact-oriented and want to think about the consequences of various institutional policies more than the arguments that lead to the policies; then Parts 2 and 3.

Please read Part 2 first if you mostly want to evaluate policies based on the arguments behind them; then Parts 1 and 3.

I think all parts of this post are worth reading, but depending on who you are, I think you could be quite put off if you read the wrong part first and start feeling like I’m basing my argument too much on kinds-of-thinking that policy arguments should not be based on.

Part 1: Negative Consequences of Pivotal Act Intentions

Imagine it’s 2022 (it is!), and your plan for reducing existential risk is to build or maintain an institution that aims to find a way for you — or someone else you’ll later identify and ally with — to use AGI to forcibly shut down all other AGI projects in the world.  By “forcibly” I mean methods that violate or threaten to violate private property or public communication norms, such as by using an AGI to engage in…

  • cyber sabotage: hacking into competitors’ computer systems and destroying their data;
  • physical sabotage: deploying tiny robotic systems that locate and destroy AI-critical hardware without (directly) harming any humans;
  • social sabotage: auto-generating mass media campaigns to shut down competitor companies by legal means, or
  • threats: demonstrating powerful cyber or physical or social threats, and bargaining with competitors to shut down “or else”.

Hiring people for your pivotal act project is going to be tricky.  You’re going to need people who are willing to take on, or at least tolerate, a highly adversarial stance toward the rest of the world.  I think this is very likely to have a number of bad consequences for your plan to do good, including the following:

  1. (bad external relations)  People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration.  This will alienate other institutions and make them not want to work with you or be supportive of you.
  2. (bad internal relations)  As your team grows, not everyone will know each other very well.  The “us against the world” attitude will be hard to maintain, because there will be an ever-weakening sense of “us”, especially as people quit and move to other institutions and vice versa.  Sometimes, new hires will express opinions that differ from the dominant institutional narrative, which might pattern-match as “outsidery” or “norm-y” or “too caught up in external politics”, triggering fears within the team that some people might defect on the plan to forcibly shut down other projects.  This will cause your team to get along poorly internally, and make it hard to manage people.
  3. (risky behavior) In the fortunate-according-to-you event that your team manages to someday wield a powerful technology, there will be a sense of pressure to use it to “finally make a difference” or other argument that boils down to acting quickly before competitors would have a chance to shut you down or at least defend themselves.  This will make it hard to stop your team from doing rash things that would actually increase existential risk.

Overall, building an AGI development team with the intention to carry out a “pivotal act” of the form “forcibly shut down all other A(G)I projects” is probably going to be a rough time, I predict.

Does this mean no institution in the world can have the job of preparing to shut down runaway technologies?  No; see “Part 3: It Matters Who Does Things”.

Part 2: Fallacies in Justifying Pivotal Acts

For pivotal acts of the form “shut down all (other) AGI projects”, there’s an argument  that I’ve heard repeatedly from dozens of people, which I claim has easy-to-see flaws if you slow down and visualize the world that the argument is describing.

This is not an argument that successful AI research groups (e.g., OpenAI, DeepMind, Anthropic) seem to espouse.  Nonetheless, I hear the argument frequently enough to want to break it down and refute it.

Here is the argument:

  1. AGI is a dangerous technology that could cause human extinction if not super-carefully aligned with human values.

    (My take: I agree with this point.)
     
  2. If the first group to develop AGI manages to develop safe AGI, but the group allows other AGI projects elsewhere in the world to keep running, then one of those other projects will likely eventually develop unsafe AGI that causes human extinction.

    (My take: I also agree with this point, except that I would bid to replace “the group allows” with “the world allows”, for reasons that will hopefully become clear in Part 3: It Matters Who Does Things.)
     
  3. Therefore, the first group to develop AGI, assuming they manage to align it well enough with their own values that they believe they can safely issue instructions to it, should use their AGI to build offensive capabilities for targeting and destroying the hardware resources of other AGI development groups, e.g., nanotechnology targeting GPUs, drones carrying tiny EMP charges, or similar.

    (My take: I do not agree with this conclusion, I do not agree that (1) and (2) imply it, and I feel relieved that every successful AI research group I talk to is also not convinced by this argument.)

The short reason why (1) and (2) do not imply (3) is that when you have AGI, you don’t have to use the AGI directly to shut down other projects.  

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning.  In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.  

To be clear, I’m not arguing for leaving regulatory efforts entirely in the hands of governments with no help or advice or infrastructural contributions from the tech sector.  I’m just saying that there are many viable options for regulating AI technology without requiring one company or lab to do all the work or even make all the judgment calls.

Q: Surely they must be joking or this must be straw-manning... right?

A: I realize that lots of EA/R folks are thinking about AI regulation in a very nuanced and politically measured way, which is great.  And, I don't think the argument (1-3) above represents a majority opinion among the EA/R communities.  Still, some people mean it, and more people joke about it in an ambiguous way that doesn't obviously distinguish them from meaning it:

  • (ambiguous joking) I've numerous times met people at EA/R events who were saying extreme-sounding things like "[AI lab] should just melt all the chip fabs as soon as they get AGI", who when pressed about the extremeness of this idea will respond with something like "Of course I don't actually mean I want [some AI lab] to melt all the chip fabs".  Presumably, some of those people were actually just using hyperbole to make conversations more interesting or exciting or funny.  

    Part of my motivation in writing this post is to help cut down on the amount of ambiguous joking about such proposals.  As the development of more and more advanced AI technologies is becoming a reality, ambiguous joking about such plans has the potential to really freak people out if they don't realize you're exaggerating.
     
  • (meaning it) I have met at least a dozen people who were not joking when advocating for invasive pivotal acts along the lines of the argument (1-3) above.  That is to say, when pressed after saying something like (1-3), their response wasn't "Geez, I was joking", but rather, "Of course AGI labs should shut down other AGI labs; it's the only morally right thing for them to do, given that AGI labs are bad.  And of course they should do it by force, because otherwise it won't get done."

    In most cases, folks with these viewpoints seemed not to have thought about the cultural consequences of AGI research labs harboring such intentions over a period of years (Part 1), or the fallacy of assuming technologists will have to do everything themselves (Part 2), or the future possibility of making evidence available to support global regulatory efforts from a broader base of consensual actors (see Part 3).

    So, part of my motivation in writing this post is as a genuine critique of a genuinely expressed position.

Part 3: It Matters Who Does Things

I think it’s important to separate the following two ideas:

  • Idea A (for “Alright”): Humanity should develop hardware-destroying capabilities — e.g., broadly and rapidly deployable non-nuclear EMPs — to be used in emergencies to shut down potentially-out-of-control AGI situations, such as an AGI that has leaked onto the internet, or an irresponsible nation developing AGI unsafely.
  • Idea B (for “Bad”): AGI development teams should be the ones planning to build the hardware-destroying capabilities in Idea A.

For what it’s worth, I agree with Idea A, but disagree with Idea B:

Why I agree with Idea A

It’s indeed much nicer to shut down runaway AI technologies (if they happen) using hardware-specific interventions than attacks with big splash effects like explosives or brainwashing campaigns.  I think this is the main reason well-intentioned people end up arriving at Idea A, and then at Idea B, but I think Idea B has some serious problems.

Why I disagree with Idea B

A few reasons!  First, there’s:

  • Action Consequence 1: the action of having an AGI carry out or even prescribe such a large intervention on the world — invading others’ private property to destroy their hardware — is risky and legitimately scary.  Invasive behavior is risky and threatening enough as it is; using AGI to do it introduces a whole range of other uncertainties, not least because the AGI could be deceptive or otherwise misaligned with humanity in ways that we don’t understand.

Second, before even reaching the point of taking the action prescribed in Idea B, merely harboring the intention of Idea B has bad consequences; echoing similar concerns as Part 1:

  • Intention Consequence 1: Racing.  Harboring Idea B creates an adversarial winner-takes-all relationship with other AGI companies racing to maintain
    • a degree of control over the future, and
    • the ability to implement their own pet theories on how safety/alignment should work, leading to more desperation, more risk-taking, and less safety overall.
  • Intention Consequence 2: Fear.  Via staff turnover and other channels, harboring Idea B signals to other AGI companies that you are willing to violate their property boundaries to achieve your goals, which will cause them to fear for their physical safety (e.g., because your incursion to invade their hardware might go awry and end up harming them personally as well).  This kind of fear leads to more desperation, more winner-takes-all mentality, more risk-taking, and less safety.

Summary

In Part 1, I argued that there are negative consequences to AGI companies harboring the intention to forcibly shut down other AGI companies.  In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).  In Part 3, I elaborated more on the nuance regarding who (if anyone) should be responsible for developing hardware-shutdown technologies to protect humanity from runaway AI disasters, and why in particular AGI companies should not be the ones planning to do this, mostly echoing points from Part 1.

Fortunately, successful AI labs like DeepMind, OpenAI, and Anthropic do not seem to espouse this “pivotal act” philosophy for doing good in the world.  One of my hopes in writing this post is to help more EA/R folks understand why I agree with their position.

Comments

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning.  In other words, outsiders can start to help you implement helpful regulatory ideas...

It is not for lack of regulatory ideas that the world has not banned gain-of-function research.

It is not for lack of demonstration of scary gain-of-function capabilities that the world has not banned gain-of-function research.

What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?

(And to be clear: I'm not saying that gain-of-function research is a great analogy. Gain-of-function research is a much easier problem, because the problem is much more legible and obvious. People know what plagues look like and why they're scary. In AI, it's the hard-to-notice problems which are the central issue. Also, there's no giant economic incentive for gain-of-function research.)

[Raemon]

Various thoughts that this inspires:

Gain of Function Ban as Practice-Run/Learning for relevant AI Bans

I have heard vague-musings-of-plans in the direction of "get the world to successfully ban Gain of Function research, as a practice-case for getting the world to successfully ban dangerous AI." 

I have vague memories of the actual top bio people around not being too focused on this, because they thought there were easier ways to make progress on biosecurity. (I may be conflating a few different statements – they might have just been critiquing a particular strategy I mentioned for banning Gain of Function research.)

A few considerations for banning Gain of Function research for AI-related reasons:

  • because you will gain skills / capacities that transfer into banning relevant AI systems. (i.e. "you'll learn what works")
  • you won't learn what works, but you'll hit a bunch of brick walls that teaches you what doesn't work.
  • A gain of function ban is "lower stakes" (biorisk is way less likely to kill everyone than AI), and (hopefully?) won't have many side effects that specifically make it harder to ban AI-stuff later. (By contrast, if you try ineffectually to regulate AI in some way, you will cause the AI industry to raise its hackles, and maybe cause one political party to get a reputation as "the anti-AI party", causing the other party to become "anti-anti-AI" in response, or maybe you will get everyone's epistemics about what's worth banning all clouded.)

Distinction between "Regulating AI is possible/impossible" vs "pivotal act framing is harmful/unharmful".

I currently believe John's point of "man, sure seems real hard to actually usefully regulate things even when it's comparatively easy, I don't feel that hopeful about regulation processes working."

But, that doesn't necessarily contradict the point that it's really hard to build an organization capable of unilaterally implementing a pivotal act, and that the process of doing so is likely to create enemies, erode coordination-fabric, make people fearful, etc.

It seems obvious to me that the arguments in this post are true-to-some-degree. There's some actual math / hashing-out that I haven't seen to my satisfaction of how all the arguments actually balance against each other.

Something feels off about the way people relate to "everyone else's ideas seeming more impossible than my own, even if we all agree it's pretty impossible."

+1 to the distinction between "Regulating AI is possible/impossible" vs "pivotal act framing is harmful/unharmful".

I'm sympathetic to a view that says something like "yeah, regulating AI is Hard, but it's also necessary because a unilateral pivotal act would be Bad". (TBC, I'm not saying I agree with that view, but it's at least coherent and not obviously incompatible with how the world actually works.) To properly make that case, one has to argue some combination of:

  • A unilateral pivotal act would be so bad that it's worth accepting a much higher chance of human extinction in order to avoid it, OR
  • Aiming for a unilateral pivotal act would not reduce the chance of human extinction much more than aiming for a multilateral pivotal act

I generally expect people opposed to the pivotal act framing to have the latter in mind rather than the former. The obvious argument that aiming for a unilateral pivotal act does reduce the chance of human extinction much more than aiming for a multilateral pivotal act is that it's much more likely that someone could actually perform a unilateral pivotal act; it is a far easier problem, even after accounting for the problems the OP mentions in Part 1. That, I think, is the main view one would need to argue against in order to make the case for multilateral over unilateral pivotal act as a goal. The OP doesn't really make that case at all; it argues that aiming for unilateral introduces various challenges, but it doesn't even attempt to argue that those challenges would be harder than (or even comparably hard to) getting all the major world governments to jointly implement an actually-effective pivotal act.

John, it seems like you're continuing to make the mistake-according-to-me of analyzing the consequences of a pivotal act without regard for the consequences of the intentions leading up to the act.  The act can't come out of a vacuum, and you can't build a project compatible with the kind of invasive pivotal acts I'm complaining about without causing a lot of problems leading up to the act, including triggering a lot of fear and panic for other labs and institutions.  To summarize from the post title: pivotal act intentions directly have negative consequences for x-safety, and people thinking about the acts alone seem to be ignoring the consequences of the intentions leading up to the act, which is a fallacy.

I see the argument you're making there. I still think my point stands: the strategically relevant question is not whether unilateral pivotal act intentions will cause problems, the question is whether aiming for a unilateral pivotal act would or would not reduce the chance of human extinction much more than aiming for a multilateral pivotal act. The OP does not actually attempt to compare the two, it just lists some problems with aiming for a unilateral pivotal act.

I do think that aiming for a unilateral act increases the chance of successfully executing the pivotal act by multiple orders of magnitude, even accounting for the part where other players react to the intention, and that completely swamps the other considerations.

Just as a related idea, in my mind, I often do a kind of thinking that HPMOR!Harry would call “Hufflepuff Bones”, where I look for ways a problem is solvable in physical reality at all, before considering ethical and coordination and even much in the way of practical concerns.

it's much more likely that someone could actually perform a unilateral pivotal act; it is a far easier problem, even after accounting for the problems the OP mentions in Part 1.

What I've never understood about the pivotal act plan is exactly what the successful AGI team is supposed to do after melting the GPUs or whatever. Every government on Earth will now consider them their enemy; they will immediately be destroyed unless they can defend themselves militarily, and then countries will simply rebuild the GPU factories and continue on as before (except now in a more combative, disrupted, AI-race-encouraging geopolitical situation). So any pivotal act seems to require, at a minimum, an AI capable of militarily defeating all countries' militaries. Then in order to not have society collapse, you probably need to become the government yourself, or take over or persuade existing governments to go along with your agenda. But an AGI that would be capable of doing all this safely seems...not much easier to create than a full-on FAI? It's not like you could get by with an AI that was freakishly skilled at designing nanomachines but nothing else, you'd need something much more general. But isn't the whole idea of the pivotal act plan that you don't need to solve alignment in full generality to execute a pivotal act? For these reasons, executing a unilateral pivotal act (that actually results in an x-risk reduction) does not seem obviously easier than convincing governments to me.

Oh, melting the GPUs would not actually be a pivotal act. There would need to be some way to prevent new GPUs from being built in order for it to be a pivotal act.

Military capability is not strictly necessary; a pivotal act need not necessarily piss off world governments. AGI-driven propaganda, for instance, might avoid that.

Alternatively, an AGI could produce nanomachines which destroy GPUs, are extremely hard to eradicate, but otherwise don't do much of anything.

(Note that these aren't intended to be very good/realistic suggestions, they're just meant to point to different dimensions of the possibility space.)

Oh, melting the GPUs would not actually be a pivotal act

Well yeah, that's my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category. The long-lasting nanomachines idea is cute, but I bet people would just figure out ways to evade the nanomachines' definition of 'GPU'.

Note that these aren't intended to be very good/realistic suggestions, they're just meant to point to different dimensions of the possibility space

Fair enough...but if the pivotal act plan is workable, there should be some member of that space which actually is good/seems like it has a shot of working out in reality (and which wouldn't require a full FAI). I've never heard any and am having a hard time thinking of one. Now it could be that MIRI or others think they have a workable plan which they don't want to share the details of due to infohazard concerns. But as an outside observer, I have to assign a certain amount of probability to that being self-delusion.

[Raemon]

Well yeah, that's my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category.

Yeah. I think this sort of thing is why Eliezer thinks we're doomed – getting humanity to coordinate collectively seems doomed (i.e. see Gain of Function Research), and there are no weak pivotal acts that aren't basically impossible to execute safely.

The nanomachine gpu-melting pivotal act is meant to be a gesture at the difficulty / power level, not an actual working example. The other gestured-example I've heard is "upload aligned people who think hard for 1000 subjective years and hopefully figure something out." I've heard someone from MIRI argue that one is also unworkable but wasn't sure on the exact reasons.

The other gestured-example I've heard is "upload aligned people who think hard for 1000 subjective years and hopefully figure something out." I've heard someone from MIRI argue that one is also unworkable but wasn't sure on the exact reasons.

Standard counterargument to that one is "by the time we can do that we'll already have beyond-human AI capabilities (since running humans is a lower bound on what AI can do), and therefore foom".

You could have another limited AI design a nanofactory to make ultra-fast computers to run the emulations. I think a more difficult problem is getting a limited AI to do neuroscience well. Actually I think this whole scenario is kind of silly, but given the implausible premise of a single AI lab having a massive tech lead over all others, neuroscience may be the bigger barrier.

Yeah. I think this sort of thing is why Eliezer thinks we're doomed

Hmm, interesting...but wasn't he more optimistic a few years ago, when his plan was still "pull off a pivotal act with a limited AI"? I thought the thing that made him update towards doom was the apparent difficulty of safely making even a limited AI, plus shorter timelines.

other gestured-example I've heard is "upload aligned people who think hard for 1000 subjective years and hopefully figure something out."

Ah, that actually seems like it might work. I guess the problem is that an AI that can competently do neuroscience well enough to do this would have to be pretty general. Maybe a more realistic plan along the same lines might be to try using ML to replicate the functional activity of various parts of the human brain and create 'pseudo-uploads'. Or just try to create an AI with similar architecture and roughly-similar reward function to us, hoping that human values are more generic than they might appear.

It seems relatively plausible that you could use a Limited AGI to build a nanotech system capable of uploading a diverse assortment of (non-brain, or maybe only very small brains) living tissue without damaging them, and that this system would learn how to upload tissue in a general way. Then you could use the system (not the AGI) to upload humans (tested on increasingly complex animals). It would be a relatively inefficient emulation, but it doesn't seem obviously doomed to me.

Probably too late once hardware is available to do this though.

[Raemon]

Followup point on the Gain-of-Function-Ban as practice-run for AI:

My sense is that the biorisk people who were thinking about Gain-of-Function-Ban were not primarily modeling it as a practice run for regulating AGI. This may result in them not really prioritizing it.

I think biorisk is significantly lower than AGI risk, so if it's tractable and useful to regulate Gain of Function research as a practice run for regulating AGI, it's plausible this is actually much more important than business-as-usual biorisk. 

BUT I think smart people I know seem to disagree about how any of this works, so the "if tractable and useful" conditional is pretty non-obvious to me.

If bio-and-AI-people haven't had a serious conversation about this where they mapped out the considerations in more detail, I do think that should happen.

What if someone proposed banning GoF research via the following way: release a harmless virus which is engineered to make people more amenable to banning GoF research. That’s what a pivotal act proposal looks like to me. You were supposed to destroy the dark side, not join them!

Slightly stronger analogy: release a harmless virus which grants universal immunity to other viruses. That's what a pivotal act proposal looks like.

This idea has various technical problems, but let's pretend that those are solved for the sake of this discussion; the interesting question is whether one ought to deploy such a virus if it were technically feasible and worked as advertised. I claim the answer is "yes, obviously". Waiting around for government bureaucracies to approve it as a treatment and deploy it through the traditional medical system would take a decade, optimistically. Probably multiple decades before it reaches a majority of the population (if ever). How many people die of viruses every ten years?
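For a rough order-of-magnitude answer to that question (the influenza range is WHO's published estimate; the "other viruses" figure is a loose assumption of mine, not a sourced number):

```python
# Rough order-of-magnitude estimate of deaths from viral disease per decade.
# WHO estimates ~290k-650k influenza-associated respiratory deaths per year;
# the "other viruses" figure below is a loose assumption, not a sourced number.
flu_low, flu_high = 290_000, 650_000
other_viral_per_year = 1_000_000  # HIV, hepatitis, measles, COVID-19, etc. (assumed)

decade_low = (flu_low + other_viral_per_year) * 10
decade_high = (flu_high + other_viral_per_year) * 10
print(f"~{decade_low / 1e6:.0f}-{decade_high / 1e6:.0f} million deaths per decade")
# Prints roughly "~13-16 million" -- the scale of what a decade of delay costs
# if the hypothetical universal-immunity virus actually worked as advertised.
```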

You were supposed to destroy the dark side, not join them!

I sometimes advise people that it is useful to self-identify as a villain. The central reason is that Good Is Dumb:

A common recurring thread in fiction is the idea that the hero, for various reasons, will engage in stupid or illogical actions largely because it is the "heroic", sometimes idealistic thing to do.

Problem is, people's sense of what's "good" tends to draw rather heavily on fictional portrayals of good and bad, so people trying to be "good" end up mixing in a large dose of outright stupidity. No fictional hero ever sacrifices one bystander to save ten.

Objecting to pivotal acts on the basis of "you were supposed to destroy the dark side, not join them" sounds to me like a very central example of Good Is Dumb. Like, sure, one can make up reasonable reasons to oppose the pivotal act, but I think that the actual original objection for most people is probably "this sounds like something a villain would do, not something a hero would do".

I sometimes advise people that it is useful to self-identify as a villain...

Perhaps "antihero" is better here? The "heroic" tend to be stupid and rely on the laws of narrative saving them. Villains tend to have exciting/intricate/dastardly... but overcomplicated and fatally flawed plans.

My first thought on "No fictional hero ever sacrifices one bystander to save ten", was of Zakalwe (use of weapons) - but of course he's squarely in antihero territory.

[kjz]

Does an organization's ability to execute a "pivotal act" overlap with Samo Burja's idea of organizations as "live players"? How many are there, and are there any orgs that you would place in one category and not the other?

My current belief is that no organization has the know-how to execute a pivotal act, so I would place all live orgs in one category and not the other.

[kjz]

That's fair. Maybe I was more trying to get at the chances that current live orgs will develop this know-how, or if it would require new orgs designed with that purpose.


There are/could be crucial differences between GoF and some AGI examples. 

E.g., a convincing demonstration of the ability to overthrow the government. States are also agents, and also have convergent instrumental goals. GoF research seems much more threatening to individual humans, but not that threatening to states or governments.

I think GoF research can also be quite threatening to states.   COVID-19 has stressed the politics and economies of both the US and China (among others).  Imagine the effects of a disease significantly deadlier.

I agree it's not necessarily a good idea to go around founding the Let's Commit A Pivotal Act AI Company.

But I think there's room for subtlety somewhere like "Conditional on you being in a situation where you could take a pivotal act, which is a small and unusual fraction of world-branches, maybe you should take a pivotal act."

That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you)

Somewhere halfway between "found the Let's Commit A Pivotal Act Company" and "if you happen to stumble into a pivotal act, take it", there's an intervention to spread a norm of "if a good person who cares about the world happens to stumble into a pivotal-act-capable AI, take the opportunity". I don't think this norm would necessarily accelerate a race. After all, bad people who want to seize power can take pivotal acts whether we want them to or not. The only people who are bound by norms are good people who care about the future of humanity. I, as someone with no loyalty to any individual AI team, would prefer that (good, norm-following) teams take pivotal acts if they happen to end up with the first superintelligence, rather than not doing that.

Another way to think about this is that all good people should be equally happy with any other good person creating a pivotal AGI, so they won't need to race among themselves. They might be less happy with a bad person creating a pivotal AGI, but in that case you should race and you have no other option. I realize "good" and "bad" are very simplistic but I don't think adding real moral complexity changes the calculation much.

I am more concerned about your point where someone rushes into a pivotal act without being sure their own AI is aligned. I agree this would be very dangerous, but it seems like a job for normal cost-benefit calculation: what's the risk of your AI being unaligned if you act now, vs. someone else creating an unaligned AI if you wait X amount of time? Do we have any reason to think teams would be systematically biased when making this calculation?
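As a minimal sketch of that cost-benefit comparison, with entirely made-up probabilities just to show the shape of the calculation:

```python
# Toy comparison: act now with a possibly-misaligned AI vs. wait and risk a
# rival deploying an unaligned AI first.  All probabilities are made-up
# placeholders for illustration, not estimates of anything real.
p_own_ai_misaligned_now = 0.10       # chance your AI is dangerously misaligned today
p_rival_unaligned_per_month = 0.02   # chance per month of waiting that a rival deploys unaligned AI
months_needed_to_verify = 12         # how long you think alignment verification takes

p_catastrophe_act_now = p_own_ai_misaligned_now
p_catastrophe_wait = 1 - (1 - p_rival_unaligned_per_month) ** months_needed_to_verify

print(f"P(catastrophe | act now)     ~ {p_catastrophe_act_now:.2f}")
print(f"P(catastrophe | wait 12 mo.) ~ {p_catastrophe_wait:.2f}")
# With these placeholders the two risks are comparable (~0.10 vs ~0.22), which
# is the point: the verdict hinges on contested estimates, and a team under
# time pressure could plausibly be biased in either direction.
```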

That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you)

A functioning Bayesian should probably have updated to that position long before they actually have the AI.

Destroying all competing AI projects might mean that the AI took a month to find a few bugs in linux and tensorflow and create something that's basically the next stuxnet. This doesn't sound like that fast a takeoff to me. 

The regulation is basically non-existent and will likely continue to be so.

I mean, making superintelligent AI probably breaks a bunch of laws, technically, as interpreted under a pedantic and literal-minded reading of the law. But breathing probably technically breaks a bunch of laws. Some laws are just overbroad, technically ban everything, and are generally ignored.

Any enforced rule that makes it pragmatically hard to make AGI would basically have to be a ban on computers (or at least programming) 

Idea A (for “Alright”): Humanity should develop hardware-destroying capabilities — e.g., broadly and rapidly deployable non-nuclear EMPs — to be used in emergencies to shut down potentially-out-of-control AGI situations, such as an AGI that has leaked onto the internet, or an irresponsible nation developing AGI unsafely.

Sounds obviously impossible in real life, so how about you go do that and then I'll doff my hat in amazement and change how I speak of pivotal acts. Go get gain-of-function banned, even, that should be vastly simpler. Then we can talk about doing the much more difficult thing. Otherwise it seems to me like this is just a fairytale about what you wouldn't need to do in a brighter world than this.

I'm surprised that there's not more push back in this community on the idea of a "pivotal act" even being feasible on any reasonable timeline that wouldn't give up the game (i.e.: reveal that you have AGI and immediately get seized by the nearest state power), in the same way that there's push back on regulation as a feasible approach.

SUMMARY: Pivotal acts as described here are not constrained by intelligence, they're constrained by resources and time. Intelligence may provide novel solutions, but it does not immediately reduce the time needed for implementation of hardware/software systems in a meaningful way. Novel supply chains, factories, and machinery must first be designed by the AGI and then created by supporters before the AGI will have the influence on the world that is expected by proponents of the "pivotal act" philosophy.

I'm going to structure this post in two parts.

First, I will go through the different ideas posed in this thread as example "pivotal acts" and point out why they won't work, specifically by looking at exceptions that would cause the "pivotal act" to not reliably eliminate 100% of adversaries.

Then, I'll look at the more general statement that my complaints in part 1 are irrelevant because a superhuman AGI is by definition smarter than me, and therefore it'll do something that I can't think of, etc.

Part 1. Pivotal act <x> is not feasible.

 social sabotage: auto-generating mass media campaigns to shut down competitor companies by legal means, or

This has the same issues as banning gain-of-function research. It's difficult to imagine a mass media campaign in the US having the ability to persuade or shutdown state-sponsored AI research in another country, e.g. China.

threats: demonstrating powerful cyber or physical or social threats, and bargaining with competitors to shut down “or else”.

Nation states (and companies) don't generally negotiate with terrorists. First, you'd have to be able to convince the states you can follow through on your threat (see arguments below). Second, you'd need to be able to make your threat from a position of security such that you're not susceptible to a preemptive strike, either delivered in the form of military or law enforcement personnel showing up at your facilities with search warrants OR, depending on what exactly you threatened and where you are located in the world, a missile instead.

Destroying all competing AI projects might mean that the AI took a month to find a few bugs in linux and tensorflow and create something that's basically the next stuxnet. This doesn't sound like that fast a takeoff to me.

Stuxnet worked by targeting hardware/software systems. Specifically, it targeted the hardware (the centrifuges) controlled via software (PLCs) and sent commands that would break the hardware by exceeding the design spec. The Iranian network would have been air-gapped, so the virus had to be delivered on site either via a human technician performing the deployment, or via some form of social engineering like leaving a trapped USB device on the premises and hoping that an Iranian engineer would plug it into their network. Even that last vector can be circumvented by not allowing USB storage devices on computers attached to the network, which is absolutely a thing that certain networks do for security. By "not allow", I don't mean it's a piece of paper that says "don't do this"; I mean that the software stack running on the computers doesn't allow USB storage devices and/or the ports are physically inaccessible.

Let's assume for the sake of argument that the bug already exists and we just need to exploit it. How are you delivering your exploit? Are you assuming the other AI projects are connected to the public internet? Or do you first have to assume that you can break into their VPN, or worse, somehow get into an air-gapped network? When does your exploit take effect? Is it immediately? Or is it when someone tries to run some type of experiment? If it's immediately, then you risk early discovery when the exploit is being spread and you give your adversaries time to pull themselves from a network. If it's triggered on some external command, then you risk the command not being received and certain exploited systems failing to be destroyed as intended. If it's triggered by the system it has exploited, e.g. in response to some GPU usage threshold, then you run into the same issue where people will start posting online "my GPU melted after I ran a neural network, what's going on?"

Even the above discussion ignores the fact that Linux is not a monolithic entity and neither is Windows, or MacOS, so you probably need bugs for each OS or distro, and probably separate versions, and you're soon looking at hundreds of different exploits all of which need to be orchestrated at the same time to avoid early detection / avoidance by your adversaries. Add in the need to target specific libraries and you've got even more exploits to deal with, but that's still assuming that your adversaries use the public versions of libraries, vs using internal forks or private code entirely.
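To make the combinatorics concrete, here is a toy count in Python; the per-category numbers are assumptions of mine for illustration, not a survey of what AI labs actually run:

```python
# Toy count of distinct exploit targets.  The numbers below are assumptions
# for illustration, not a survey of real AI labs' infrastructure.
operating_systems = 3      # Linux, Windows, macOS
versions_per_os = 5        # distros / major releases in active use
ml_frameworks = 4          # e.g. TensorFlow, PyTorch, JAX, internal forks
versions_per_framework = 5 # pinned versions across different labs

targets = operating_systems * versions_per_os * ml_frameworks * versions_per_framework
print(targets)  # 300 OS/framework combinations to cover
# Even if one exploit spans many combinations, a single lab running an unusual
# or fully internal stack is enough to break the "100% of adversaries" requirement.
```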

This isn't even getting into the most pressing problem of this hypothetical. Stuxnet could destroy the hardware -- not just break it until the bug was removed, but actually destroy it -- because it was targeting centrifuges which are things that spin at super high rates (think 1000 revolutions per second for a big piece of machinery) and were thus susceptible to oscillations that would cause the system to physically tear itself apart. How are you going to destroy the GPUs in this hypothetical? Exploit some type of bug that bricks them? That isn't unheard of. The Amazon MMO "New World" reportedly bricked some high end graphics cards. That particular example though was traced to a hardware fault on less than 1% of RTX 3090's created by the company EVGA, so you'd need a different exploit for the other 99%, plus the other graphics cards, plus the other manufacturers of those cards. If you can't identify a way for your software exploit to physically break the hardware, actually break it, then at best this is just a minor speed bump. Even if you can 100% of the time nuke the attached graphics card in a computer, companies have stock of extra computers and components in storage. You aren't destroying those, because they're not even in a computer right now.

Compare all of the above to Stuxnet: a single worm, designed to destroy a single hardware/software system, with the exact target environment (the hardware, and the software) known to the creators, and it still took 2 (?) nation states to pull it off, and crucial to our discussion of pivotal acts, it was not 100% effective. The best estimate is that maybe 10% of the Iran centrifuges were destroyed by Stuxnet.

cyber sabotage: hacking into competitors’ computer systems and destroying their data;

See statements above about air-gaps / VPNs / etc. Pointing to anecdotal hacks of big companies doesn't work, because for your pivotal act you need to hit 100% of adversaries. You also need to deal with backups, including backups that are off-site or otherwise unconnected to the network, which is standard practice for corporations that don't want to care about ransomware.

Part 2. Magic isn't real.

  • physical sabotage: deploying tiny robotic systems that locate and destroy AI-critical hardware without (directly) harming any humans;
  • e.g., nanotechnology targeting GPUs, drones carrying tiny EMP charges, or similar.
  • "[AI lab] should just melt all the chip fabs as soon as they get AGI"
  • Alternatively, an AGI could produce nanomachines which destroy GPUs, are extremely hard to eradicate, but otherwise don't do much of anything.
  • upload aligned people who think hard for 1000 subjective years and hopefully figure something out.
  • It seems relatively plausible that you could use a Limited AGI to build a nanotech system capable of uploading a diverse assortment of (non-brain, or maybe only very small brains) living tissue without damaging them, and that this system would learn how to upload tissue in a general way. Then you could use the system (not the AGI) to upload humans (tested on increasingly complex animals). It would be a relatively inefficient emulation, but it doesn't seem obviously doomed to me.
  • Lightspeed bubble of hedonium. All humans are uploaded into a virtual utopia by femtobots. The sun is fully disassembled for raw materials within 10 minutes of you giving the order.
  • Subtly break their AI. A cyberattack that stops their AI from doing anything, and otherwise has no effect.

These attacks all fall under a bucket I'll call "wizardry". In these attacks, we assume that a superintelligence can do things that defy our current understanding of physics, supply chains, factories, chip design, yield rates, etc. I don't mean that the superintelligence is able to solve problems faster or better than a human, because that trivially follows from the definition of "superintelligence". What I mean is that the superintelligence in these attacks is able to skip the part of the process that follows "think of a solution" -- implementation. For all hardware/software systems that I've worked on, coming up with a solution to a problem was probably less than 1% of the total engineering effort spent on bringing that solution into reality. The rest of the time is on implementation. Specifically, the rest of the time is spent on iterative loops of problem discovery and problem solving until you've designed a system that actually works in the real world.


Let's look at nanotechnology to start with, since it's a popular example.

Ok, so the first thing you need to do is develop working nanobots, because nanobots don't exist. And to do that, the superhuman AGI is going to think really hard, and design the following at a minimum:

  • schematics for nanobots, including novel power supplies
  • verilog / vhdl / equivalent for processors or other integrated circuits on the nanobots
  • source code for those custom processors (or are you assuming that you've got an x86 compatible processor and an environment to run some embedded Linux distro like yocto or buildroot?)
  • machinery to create the processors, custom ICs, and power supplies used by the nanobots
  • machinery to assemble the nanobots themselves
  • machinery to test the processors and assembled nanobots
  • machinery for deploying nanobots, assuming they aren't capable of just flying over the world to wherever they're needed

The nanobots need to be designed to the constraints that make "melt the GPU factory" a realistic goal, so this means the superhuman AGI needs to be considering things like: how are they programmed, what are their actuators (for doing the melting), how do they sense the world (for seeing what to melt), what is their power supply (externally powered? if so, by what? what's that device? how is it not a failure point in this plan? if they're internally powered, how is that battery sized or where is the energy derived?), how are they controlled, what is their expected lifetime? When you're answering these questions, you need to reason about how much power is needed to melt a GPU factory, and then work backwards from that based on the number of nanobots you think you get into the factory, so that you've got the right power output + energy requirements per nanobot for the melting.
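As a hedged back-of-envelope for just the energy-budget step (the material constants are textbook values for copper, but the target mass and swarm size are assumptions of mine purely for illustration):

```python
# Back-of-envelope energy budget for "melt the AI-critical hardware".
# Material constants are textbook values for copper; the target mass and
# swarm size are made-up assumptions for illustration.
specific_heat_cu = 385.0        # J/(kg*K)
latent_heat_fusion_cu = 2.09e5  # J/kg
delta_T = 1358 - 300            # K, room temperature up to copper's melting point

energy_per_kg = specific_heat_cu * delta_T + latent_heat_fusion_cu  # ~6.2e5 J/kg

target_mass_kg = 10_000   # assumed mass of boards/GPUs in one facility
nanobot_count = 1e9       # assumed swarm size per facility

total_energy = energy_per_kg * target_mass_kg   # ~6.2e9 J
energy_per_bot = total_energy / nanobot_count   # ~6 J per nanobot

print(f"{total_energy:.1e} J total, {energy_per_bot:.1f} J per nanobot")
# Storing ~6 J in a microscopic robot is far outside what known battery
# chemistries allow at that scale, which is why the power-supply question
# alone forces real design iteration rather than a one-shot plan.
```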

Now, you need to actually build that machinery. You can't use existing fabs because that would give away your novel designs, plus, these aren't going to be like any design we have in existence since the scale you're working with here isn't something we've gotten working in our current reality. So you need to get some permits to build a factory, and pay for the factory, and bring in-house all of the knowledge needed to create these types of machines. These aren't things you can buy. Each one of the machines you need is going to be a custom design. It's not enough to "think" of the designs; you'll still need an army of contractors and sub-contractors and engineers and technicians to actually build them. It's also not sufficient to try and avoid the dependency on human support by using robots or drones instead; that's not how industrial robots or drones work either. You'll have a different, easier bootstrap problem first, but still a bootstrap problem nonetheless.

If you're really lucky, the AGI was able to design machinery using standard ICs and you just need to get them in stock so you can put them together in house. Under that timeline, you're looking at 12-15 week lead times for those components, prior to the chip shortage. Now it's as high as 52+ week lead times for certain components. This is ignoring the time that it took to build the labs and clean rooms you'll need to do high-tech electronics work, and the time to stock those labs with equipment, where certain pieces of equipment like room-sized CNC equipment are effectively one-off builds from a handful of suppliers in the world with similar lead times to match.

If you're unlucky, the AGI had to invent novel ICs just for the machinery for the assembly itself, and now we get to play a game of Factorio in real life as we ask the AGI to please develop a chain of production lines starting from the standard ICs that we can acquire, up to those we need for our actual production line for the nanobots. Remember that we've still got 12-15 week lead times on the standard ICs.

Tangent: You might look at Amazon and think, well, I can buy a processor and have it here next day, why can't I get my electronic components that quickly? In a word: scale. You're not going to need a single processor or a single IC to build this factory. You're going to need tens of thousands. If you care about quality control on this hardware you're developing, you might even buy an entire run of a particular hardware component to guarantee that everything you're using was developed by the same set of machines and processes at a known point in time from a known factory.

The next problem is that you'll build all of the above, and then it won't work. You'll do a root cause analysis to figure out why, discover something you hadn't realized about how physics works in that environment, or a flaw in some of your components (bad manufacturer, bad lot, etc), update your designs, and go through the process all over again. This is going to take time. Not weeks, but months, or worse, years. If you have to buy new components, it's back to that 12-15 week lead time. If you want to try and avoid buying new components by cleverly desoldering the ones you have and reusing them, that is very difficult.
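A toy schedule model, with the iteration count and per-phase durations as illustrative assumptions, shows how the component lead times come to dominate the calendar:

```python
# Toy schedule model: why component lead times dominate hardware iteration.
# Iteration count and phase durations are illustrative assumptions.
lead_time_weeks = 14       # pre-shortage component lead time (12-15 wk range)
build_weeks = 4            # assembly once parts arrive
test_and_debug_weeks = 6   # bring-up, root-cause analysis, redesign
iterations = 3             # optimistic number of full design spins

total_weeks = iterations * (lead_time_weeks + build_weeks + test_and_debug_weeks)
print(f"{total_weeks} weeks ~ {total_weeks / 52:.1f} years")  # 72 weeks ~ 1.4 years
# Swap in the 52+ week shortage-era lead time and three spins stretch to
# roughly 3.5 years before the first production-ready machine exists.
```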

You can't avoid this process by just reading up on the existing literature of how Intel or Nvidia or some other company designs chips because that information 1.) isn't public, and 2.) isn't the whole story. The dirty secret of engineering documentation is that engineers hate writing documentation and all big hardware/software engineering projects are carried by the collective knowledge of the organization more so than whatever token documentation got produced as needed for interfacing with an external supplier. Example: Rocketdyne F-1 rocket engines. 

A typical design document for something like the F-1, though, was produced under intense deadline pressure and lacked even the barest forms of computerized design aids. Such a document simply cannot tell the entire story of the hardware. Each F-1 engine was uniquely built by hand, and each has its own undocumented quirks. In addition, the design process used in the 1960s was necessarily iterative: engineers would design a component, fabricate it, test it, and see how it performed. Then they would modify the design, build the new version, and test it again. This would continue until the design was "good enough." [1] 

During the time that you spend iterating in the real world to get to working designs, people are going to wonder about your massive factory, and your technicians are going to get drunk at a bar and brag about how they're developing cutting edge nanobots, and your competitors will realize you're doing something very novel, very odd, and very difficult to explain. 

This is going to cost you literal billions (with a B) of dollars in hardware costs between the supplies you need to buy, the ICs you'll have on order, the custom machinery you'll be purchasing to build your own machines, etc. So there's another prerequisite here of either 1.) you are Apple / Google / Microsoft, or 2.) you're going to ask the AGI to make a bunch of money on the stock market. I actually believe option 2 is fairly achievable, e.g. see RenTech and the Medallion fund. That fund has averaged annual returns of 70% from 1994 through 2014. However, that's still a timescale of years and significant seed money (millions of dollars) before you'll have enough cash on hand to bankroll all of this R&D, unless you get external investment, but to get this much external investment you'll have to 1.) find someone with billions of dollars, and 2.) convince them that you have AGI, 3.) swear them to secrecy, and 4.) hope that they don't do some shenanigans like poaching your employees or having the government rain down regulation on you as a stall tactic while they develop their own AGI or try to steal yours.
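A quick compounding check, taking the 70% figure above at face value and assuming a $10M seed and a $1B target (both made-up round numbers), shows why the "make money first" route still costs years:

```python
# Compounding check: years of returns needed before the "make money first"
# route can bankroll billion-dollar R&D.  Seed and target are assumptions;
# the 70% annual return is the figure quoted above, taken at face value.
import math

seed = 10e6           # assumed starting capital: $10M
target = 1e9          # assumed R&D budget: $1B
annual_return = 0.70

years = math.log(target / seed) / math.log(1 + annual_return)
print(f"{years:.1f} years")  # ~8.7 years of uninterrupted 70% returns
# Even granting returns matching the best-performing fund in history, the
# bankroll step alone eats most of a decade unless you take outside money.
```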

The likely result here is an arms race where your competitors try to poach your employees ($2 million / year?) or other "normal" corporate espionage to understand what's going on. Example: When Google sued Uber for poaching one of their top self-driving car engineers.

Tangent: If you want the AGI to be robust to government-sponsored intervention like turning off the grid at the connection to your factory, then you'll need to invest in power sources at the factory itself, e.g. solar / wind / geothermal / whatever. All of these have permit requirements and you'll get mired in bureaucratic red tape, especially if you try to do a stable on-site power source like oil, natural gas, or worse, nuclear. Energy storage isn't that great at the moment, so maybe that's another sub-problem for the AGI to solve first as a precursor to all of these other problems, so that it can run on solar power alone.

You might think that the superhuman AGI is going to avoid that iterative loop by getting it right on the very first time. Maybe we'll say it'll simulate reality perfectly, so it can prove that the designs will work before they're built, and then there's only a single iteration needed.

Let's pretend the AI only needs one attempt to figure out working designs: Ok, the AGI simulates reality perfectly. It still doesn't work the first time because your human contractors miswired some pins during assembly, and you still need to spend X many months debugging and troubleshooting and rebuilding things until all of the problems are found and fixed. If you want to avoid this, you need to add a "perfect QA plan for all sub-components and auditing performed at all integration points" to the list of things that the AGI needs to design in advance, and pair it with "humans that can follow a QA plan perfectly without making human mistakes".

On the other hand: The AGI can only simulate reality perfectly if we have a theory of physics that can do so, which we don't. The AGI can develop its own theory, just like you and I could do so, but at some point the theorizing is going to hit a wall where there are multiple possible solutions, and the only way to see which solution is valid in our reality is to run a test, and in our current understanding of physics, the tests we know how to run involve constructing increasingly elaborate colliders and smashing together particles to see what pops out. While it is possible that there exists another path that does not have a prerequisite of "run test on collider", you need to add that to your list of assumptions, and you might as well add "magic is real". Engineering is about tradeoffs or constraints. Constraints like mass requirements given some locomotion system, or energy usage given some battery storage density and allowed mass, or maximum force an actuator can provide given the allowed size it must fit to inside of the chassis, etc. If you assume that a superhuman AGI is not susceptible to constraints anymore, just by virtue of that superhuman intelligence, then you're living in a world just as fictional as HPMOR.
 

Are you criticizing the idea that a single superintelligence could ever get to take over the world under any circumstances, or just this strategy of "achieving aligned AI by forcefully dismantling unsafe AI programs with the assistance of a pet AI"? 

The latter. I don't see any reason why a superintelligent entity would not be able to take over the world or destroy it or dismantle it into a Dyson swarm. The point I am trying to make is that the tooling and structures that a superintelligent AGI would need to act autonomously in that way do not actually exist in our current world, so before we can be made into paperclips, there is a necessary period of bootstrapping where the superintelligent AGI designs and manufactures new machinery using our current machinery. Whether it's an unsafe AGI that is trying to go rogue, or an aligned AGI that is trying to execute a "pivotal act", the same bootstrapping must occur first.

Case study: a common idea I've seen while lurking on LessWrong and SSC/ACT for the past N years is that an AGI will "just" hack a factory and get it to produce whatever designs it wants. This is not how factories work. There is no 100% autonomous factory on Earth that an AGI could just take over to make some other widget instead. Even highly automated factories are 1.) highly automated to produce a specific set of widgets, 2.) require physical adjustments to make different widgets, and 3.) rely on humans for things like input of raw materials, transferring in-work products between automated lines, and the testing or final assembly of completed products. 3D printers are one of the worst offenders in this regard. The public perception is that a 3D printer can produce anything and everything, but they actually have pretty strong constraints on what types of shapes they can make and what materials they can use, and usually require multi-step processes to avoid those constraints, or post-processing to clean up residual pieces that aren't intended to be part of the final design, and almost always a 3D printer is producing sub-parts of a larger design that still must be assembled together with bolts or screws or welds or some other fasteners.

So if an AGI wants unilateral control where it can do whatever it wants, the very first prerequisite is that it needs to make a futuristic, fully automated, fully configurable, network-controlled factory exist -- which then needs to be built with what we have now, and that's where you'll hit the supply constraints I'm describing above, like lead times on part acquisition. The only way to reduce this bootstrapping time is to have this stuff designed in advance of the AGI, but that's backwards from how modern product development actually works. We design products, and then we design the automated tooling to build those products. If you asked me to design a factory that would be immediately usable by a future AGI, I wouldn't know where to even start with that request. I need the AGI to tell me what it wants, and then I can build that, and then the AGI can take over and do its own thing.

A related point that I think gets missed is that our automated factories aren't necessarily "fast" in the way you'd expect. There are long lead times for complex products. Even if you have the specialized machinery for creating new chips, you're still looking at ~14-24 weeks from when raw materials are introduced to when the final products roll off the line. We hide that delay by constantly building the same things all of the time, but it becomes very visible when there's a sudden demand spike -- that's why it takes so long for supply to match demand for products like processors or GPUs. I have no trouble imagining a superintelligent entity that could optimize this and knock down the cycle time, but there are going to be physical limits to these processes, and the question is whether it can knock the cycle time down to 10 weeks or to 1 week. And when I'm talking about optimization, this isn't just uploading new software, because that isn't how these machines work. It means designing new, faster machines, or redesigning the assembly line and replacing the existing machines, so there's a minimum time required for that too before you can benefit from the faster cycle time on actually making things. Once you hit practical limits on cycle time, the only way to get more stuff faster is to scale wide by building more factories or making your current factories even larger.
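
As a minimal sketch of why steady-state production hides that latency until demand shifts (Little's law; the cycle time and throughput numbers here are invented for illustration):

```python
# Minimal sketch of why pipelined production hides cycle time until demand changes.
# All numbers are invented for illustration.

cycle_time_weeks = 18      # assumed wafer-to-finished-chip latency
starts_per_week = 10_000   # assumed number of units started each week

# In steady state, finished units also come out at 10,000/week, so the
# 18-week latency is invisible to customers.
steady_state_output_per_week = starts_per_week

# Work-in-progress implied by that latency (Little's law: WIP = throughput x cycle time):
wip_units = starts_per_week * cycle_time_weeks
print(f"Units in flight at any moment: {wip_units:,}")   # 180,000

# Now demand doubles. Even if you double the starts immediately,
# the extra units only begin arriving a full cycle time later.
weeks_until_extra_supply_arrives = cycle_time_weeks
print(f"Weeks before any extra supply shows up: {weeks_until_extra_supply_arrives}")
```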

If we try to avoid the above problems by suggesting that the AGI doesn't actually hack existing factories, but instead convinces the factory owners to build the things it wants, there's not a huge difference -- instead of the prerequisite being "build your own factory", it's "hostile takeover of an existing factory", where that takeover is accomplished by manipulation, by purchase on the public market or as a private sale, by outbidding existing customers (e.g. having enough money to convince TSMC to make your stuff instead of Apple's), or with actual arms and violence. There are still the other lead times I've mentioned for retooling assembly lines and actually building a complete, physical system from one or more automated lines.

You should stop thinking about AI designed nanotechnology like human technology and start thinking about it like actual nanotechnology, i.e. life. There is no reason to believe you can't come up with a design for self-replicating nanorobots that can also self-assemble into larger useful machines, all from very simple and abundant ingredients - life does exactly that.

Tangent: I don't think I understand the distinction you've made between "AI designed nanotechnology" and "human technology". Human technology already includes "actual nanotechnology", e.g. nanolithography in semiconductor production.

I agree that if the AGI gives us a blueprint for the smallest self-replicating nanobot that we'll need to bootstrap the rest of the nanobot swarm, all we have to do is assemble that first nanobot, and the rest follows. It's very elegant.

We still need to build that very first self-replicating nanobot though.

We can either do so atom-by-atom with some type of molecular assembler like the ones discussed in Nanosystems, or we can synthesize DNA and use clever tricks to get some existing biology to build things we want for us, or maybe we can build it from a process that the AGI gives us that only uses chemical reactions or lab/industrial production techniques.

If we go with the molecular assembler approach, we need to build one of those first, so that we can build the first self-replicating nanobot. This is effectively the same argument I made above, so I'm going to skip it.

If we go with the DNA approach, then the AGI needs to give us that DNA sequence, and we have to hope that we can create it in a reasonable time despite our poor yield rates and long turnaround times for DNA synthesis of longer sequences. If the sequence is too long, we might be in a place where we first need to ask the AGI to design new DNA synthesis machines, otherwise we'll be stuck. In that world, we return to my arguments above. In the world where the AGI gives us a reasonable-length DNA sequence, say the size of a very small cell or something, we can continue. The COVID-19 vaccine provides an example of how this goes. We have an intelligent entity (humans) writing code in DNA, synthesizing that DNA, converting it to mRNA, and getting a biological system (human cells) to read that code and produce proteins. Humanity has these tools. I am not sure why we would assume that the company that develops AGI has them. At multiple steps in the chain of what Pfizer and Moderna did to bring mRNA vaccines to market, there are single-vendor gatekeepers who hold the only tooling or processes for industrial production. Even if we assume that you have all of the tooling and processes, we still need to talk about cycle times. I believe Pfizer aimed to get the cycle time (raw materials -> synthesized vaccines) for a batch of vaccine down from 15 weeks to 8 weeks. This is an incredibly complex, amazing achievement -- we literally wrote a program in DNA, created a way to deliver it to the human body, and it executed successfully in that environment. However, it's also an example of the current limitations we have. Synthesizing from scratch the mRNA needed to generate a single protein takes >8 weeks, even if you have the full assembly line figured out. This will get faster over time, and we'll get better at doing it, but I don't see any reason to think that we'll have some type of universal / programmable assembly line for an AGI to use anytime soon.
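
For intuition on why long sequences are the sticking point: chemical DNA synthesis adds one base at a time, so the fraction of full-length product falls off roughly exponentially with sequence length. A quick sketch, using an assumed per-base coupling efficiency rather than any vendor's actual spec:

```python
# Rough illustration of why long DNA sequences are hard to synthesize in one shot.
# If each base-addition step succeeds with probability p, the fraction of
# full-length product falls off as p**(length - 1).
# p here is an assumed, representative value, not a measured spec.

p = 0.99  # assumed per-base coupling efficiency

for length in (100, 200, 1_000, 10_000):
    full_length_yield = p ** (length - 1)
    print(f"{length:>6} bases -> ~{full_length_yield:.2e} full-length fraction")

# With p = 0.99: roughly 37% at 100 bases, ~4e-05 at 1,000 bases, and effectively
# zero at 10,000 bases. This is why long constructs get assembled from many short
# synthesized fragments, which adds steps and time -- the bottleneck described above.
```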

If we go with a series of chemical reactions or lab/industrial production techniques, we need to build the clean rooms, labs, vacuum chambers, and whatever else is needed to implement whatever process the AGI gives us for synthesizing the nanobots. Conceptually this is the simplest idea for how you could get something to work quickly. If the AGI gave you a list of chemicals, metals, and biological samples, plus a step-by-step process of how to mix, drain, heat, sift, and repeat, and at the end of this process you had self-replicating nanobots, that would be pretty cool. This is basically taking evolution's random walk from a planetary petri dish to the life we see today and asking, "could an AGI shorten a billion years of random iterative development into mere weeks of some predetermined process to get the first self-replicating nanobots?" The problem is that interpreting code is hard. Anything that can interpret the nanobot equivalent of machine code, like instructions for how and where to melt GPU factories, is going to be vastly more complex than the current state-of-the-art R&D being done by any human lab today. I don't see a way where this doesn't reduce to the same Factorio problem I've been describing. We'll first need to synthesize A, so that we can synthesize B, so that we can synthesize C, so that we can synthesize D, and each step will require novel setups and production lines and time, and at the end of it we'll have a sequence of steps that looks an awful lot like a molecular assembly line for the creation of the very first self-replicating nanobots.
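
A toy sketch of that Factorio-style dependency chain, with invented step names and durations, just to show that serial prerequisites put a hard floor under the timeline:

```python
# Toy sketch of the serial bootstrapping problem: each generation of tooling
# must exist before the next can be built, so lead times add, they don't parallelize.
# All step names and durations are invented for illustration.

bootstrapping_chain = [
    ("A: specialized lab equipment / precursor synthesis", 12),  # weeks, assumed
    ("B: first-generation assembler built with A",          20),
    ("C: refined assembler built with B",                    8),
    ("D: first self-replicating nanobot built with C",       6),
]

total_weeks = 0
for step, weeks in bootstrapping_chain:
    total_weeks += weeks
    print(f"{step}: +{weeks} weeks (cumulative {total_weeks})")

# Even with flawless designs handed to us up front, the total time is bounded
# below by the sum of the serial steps -- here, 46 weeks.
```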

The hypothetical world(s) where these types of constraints aren't problems for a "pivotal act" are world(s) where the AGI can give us a recipe for self-replicating nanobots that we can build in our living room with a pair of tweezers and materials from Amazon. The progression of human technology over the past ~60 years in nano-scale engineering and synthetic biology has been toward increasingly elaborate, complex, time-consuming, and low-yield processes and lab equipment just to replicate the simplest structures that life produces ad hoc. I am certain this limitation will be conquered, and I'm equally certain that AI/ML systems will be instrumental in doing so, but I have no evidence from which to rationally conclude that there isn't still a mountain of prerequisite tools for humanity to build before something like "design anything at any scale" capabilities are generally available in a way that an AGI could make use of them.

Tangent: If we're concerned about destroying the world, deliberately building self-replicating nanobots that start simple but rapidly assemble into something arbitrarily complex from the whims of an AGI seems like a bad idea, which is why my original post was focused on a top-down hardware/software systems engineering process where the humans involved could presumably understand the plans, schematics, and programming that the AGI handed to them prior to the construction and deployment of those nanobots.

This is an inappropriate place to put this.

Sorry, I did not mean to violate any established norms.

I posted as a reply to Eliezer's comment because they said that the "hardware-destroying capabilities" suggested by the OP are "obviously impossible in real life". I did not expect that my reply would be considered off-topic or irrelevant in that context.

It seems to me that it is squarely on-topic in this thread, and I do not understand MakoYass's reaction.

(fwiw I found it a bit weird as a reply to Eliezer-in-particular, but found it a reasonable comment in general)

It's squarely relevant to the post, but it is mostly irrelevant to Eliezer's comment specifically, and I think the actual drives underlying the decision to make it a reply to Eliezer are probably not in good faith; you have to at least entertain the hypothesis that they pretty much realized it wasn't relevant and just wanted Eliezer's attention, or wanted the prominence of being a reply to his comment.
Personally I hope they receive Eliezer's attention, but piggybacking messes up the reply structure and makes it harder to navigate discussions, to make sense of the pragmatics, or to find what you're looking for, which is pretty harmful. I don't think we should have a lot of patience for that.

(Eliezer/that paragraph he was quoting was about the actions of large states, or of a large international alliance. The reply is pretty much entirely about why it's impractical to hide your activities from your host state, which is all inapplicable to scenarios where you are/have a state.)

Eliezer, from outside the universe I might take your side of this bet.  But I don't think it's productive to give up on getting mainstream institutions to engage in cooperative efforts to reduce x-risk.

Apropos, I wrote the following post in reaction to positions-like-yours-on-this-issue, but FYI it's not just you (maybe 10% you though?):
https://www.lesswrong.com/posts/5hkXeCnzojjESJ4eB 

[-][anonymous]10

The link doesn't work. I think you are linking to a draft version of the post or something.

This mostly seems to be an argument for: "It'd be nice if no pivotal act is necessary", but I don't think anyone disagrees with that.

As for "Should an AGI company be doing this?" the obvious answer is "It depends on the situation". It's clearly nice if it's not necessary. Similarly, if [the world does the enforcement] has higher odds of success than [the AGI org does the enforcement] then it's clearly preferable - but it's not clear that would be the case.

I think it's rather missing the point to call it a "pivotal act philosophy", as if anyone values pivotal acts for their own sake. Some people just think they're plausibly necessary - as are many unpleasant and undesirable acts. Obviously this doesn't imply they should be treated lightly, or that the full range of more palatable options shouldn't be carefully considered.

I don't buy that an intention to perform pivotal acts is a significant race-dynamic factor: incentives to race seem over-determined already. If we could stop the existing race, I imagine most pivotal-act advocates would consider a pivotal act much less likely to be necessary.

Depending on the form an aligned AGI takes, it's also not clear that the developing organisation gets to decide/control what it does. Given that special-casing avoidance of every negative side-effect is a non-starter, an aligned AGI will likely need a very general avoids-negative-side-effects mechanism. It's not clear to me that an aligned AGI that knowingly permits significant avoidable existential risk (without some huge compensatory upside) is a coherent concept.

If you're allowing a [the end of the world] side-effect, what exactly are you avoiding, and on what basis? As soon as your AGI takes on any large-scale long-term task, then [the end of the world] is likely to lead to a poor outcome on that task, and [prevent the end of the world] becomes an instrumental goal.

Forms of AGI that just do the pivotal act, whatever the creators might think about it, are at least plausible.
I assume this will be an obvious possibility for other labs to consider in planning.

[-]VaniverΩ7150

This mostly seems to be an argument for: "It'd be nice if no pivotal act is necessary", but I don't think anyone disagrees with that.

It's arguing that, given that your organization has scary (near) AGI capabilities, it is not so much harder (to get a legitimate authority to impose an off-switch on the world's compute) than (to 'manufacture your own authority' to impose that off-switch) such that it's worth avoiding the cost of (developing those capabilities while planning to manufacture authority). Obviously there can be civilizations where that's true, and civilizations where that's not true.

[-]HoagyΩ13340

The synthesis of these options would be an AGI research group whose plan consists of:

  • Develop safe AGI.
  • Try to convince world governments to perform some such pivotal act (Idea A) - note that per current institutions this needs consensus and strong implementation across all major and medium tech powers.
  • Have a back-up plan, if AGI research is proliferating without impending shutdown, to shut down world research unilaterally (Idea B).

What do you think of such a plan?

I think this would be reasonable, but if the plan is taken up then it becomes a cost-benefit analysis of when Idea B should be deployed, and that analysis could plausibly favor deploying very aggressively, so it could easily boil down to just Idea B.

It's also worth noting that a research group with an AGI who want world governments to perform a pivotal act would need to be incredibly effective and persuasive. Their options would run a spectrum from normal public-channel and lobbying efforts to AGI-takes-over-the-world-behind-the-scenes (depending on sufficient capability), with a variety of AGI-assisted persuasion techniques in between. At some degree of AI/research group control over government, it's not clear if this would be an improvement over the original act. Demonstrating the power of AGI in a way that would force governments to listen would need to at least threaten a transformative act (self-driving cars, solving protein folding, passing normal Turing tests clearly aren't enough) and so the necessary levels of influence and demonstrated capability would be large (and demonstrating capability has obvious potential drawbacks in sparking arms races).

[-]Ben PaceΩ14260

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.

This all seems like it would be good news. For the record, I think that the necessary evidence to start acting has been around for decades if not longer (humans, evolution, computers, etc.), and I don't bet on such a future turning point (there is no fire alarm). Would be happy to see a credible argument to the contrary.

Also all the cool shit you can do with AI feels like it will apply orders of magnitude more pressure on the “economic forces are pushing our civilization to make more” side than the “oh some weird chance this shit FOOMs and ends the world so let’s regulate this really well” side.

To be clear, I’m not arguing for leaving regulatory efforts entirely in the hands of governments with no help or advice or infrastructural contributions from the tech sector. I’m just saying that there are many viable options for regulating AI technology without requiring one company or lab to do all the work or even make all the judgment calls.

You think there are many viable options; I would be interested in hearing three.

When this post came out, I left a comment saying:

It is not for lack of regulatory ideas that the world has not banned gain-of-function research.

It is not for lack of demonstration of scary gain-of-function capabilities that the world has not banned gain-of-function research.

What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?

Given how the past year has gone, I should probably lose at least some Bayes points for this. Not necessarily very many Bayes points; notably there is still not a ban on AI capabilities research, and it doesn't look like there will be. But the world has at least moved marginally closer to world governments actually stopping AI capabilities work, over the past year.

While I am sure that you have the best intentions, I believe the framing of the conversation was very ill-conceived, in a way that makes it harmful, even if one agrees with the arguments contained in the post.

For example, here is the very first negative consequence you mentioned:

(bad external relations)  People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration.  This will alienate other institutions and make them not want to work with you or be supportive of you.

I think one can argue that, if this argument is correct, the post itself will exacerbate the problem by bringing greater awareness to these "intentions" in a very negative light.

  • The "intentions" keyword pattern-matches with "bad/evil intentions". Those worried about existential risk are good people, and their intentions (preventing x-risk) are good. So we should refer to ourselves accordingly and talk about misguided plans instead of anything resembling bad intentions.
  • People discussing pivotal acts, including those arguing that they should not be pursued, use this expression sparingly. Moreover, they seem to use it on purpose to avoid more forceful terms. Your use of scare quotes and your direct association of this expression with bad/evil actions casts a significant part of the community in a bad light.

It is important for this community to be able to have some difficult discussions without attracting backlash from outsiders, and having specific neutral/untainted terminology serves precisely for that purpose.

As others have mentioned, your preferred 'Idea A' has many complications and you have not convincingly addressed them. As a result, good members of our community may well find 'Idea B' to be worth exploring despite the problems you mention. Even if you don't think their efforts are helpful, you should be careful to portray them in a good light.

There's something elegantly self-validating about the pivotal act strategy (if it works as intended). Imagine you ate all the world's compute with some magic-like technology. People might be mad at first, but they will also be amazed at what just happened. You conclusively demonstrated that AI research is incredibly dangerous.

Still, it seems almost impossible to envision medium-sized or even large-ish teams pulling this off in a way that actually works, as opposed to triggering nuclear war or accidentally destroying the world. I think there might be a use for pivotal acts once you've got a large player like a powerful government on board (and have improved/reformed their decision-making structure) or have become the next Amazon with the help of narrow AI. I don't see the strategy working for a smaller lab, and I agree with the OP that the costs to trust-building options are large.
 

In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).

For the record this does seem like the cruxy part of the whole discussion, and I think more concrete descriptions of alternatives would help assuage my concerns here.

Suppose you develop the first AGI. It fooms. The AI tells you that it is capable of gaining total cosmic power by hacking physics in a millisecond. (Being an aligned AI, it's waiting for your instructions before doing that.) It also tells you that the second AI project is only 1 day behind, and they have screwed up alignment.

Options.

  1. Do nothing. Unfriendly AI gains total cosmic power tomorrow.
  2. Lightspeed bubble of hedonium. All humans are uploaded into a virtual utopia by femtobots. The sun is fully disassembled for raw materials within 10 minutes of you giving the order.
  3. Subtly break their AI. A cyberattack that stops their AI from doing anything, and otherwise has no effect. 
  4. Use the total cosmic power to do something powerful and scary. Randomly blow up mars. Tell the world that you did this using AI, and therefore AI should be regulated. Watch 20 hours of headless chicken flailing before the world ends. 
  5. Blow up mars and then use your amazing brainwashing capabilities to get regulations passed and enforced within 24 hours. 
  6. Something else.

Personally I think that 2 and 3 would be the options to consider. 

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. 

Which neutral but influential observers? Politicians who only know how to play signalling games and are utterly mentally incapable of engaging with objective reality in any way? There is no cabal of powerful people who will start acting competently and benevolently the moment they get unambiguous evidence that "intelligence is powerful". A lot of the smart people who know about AI have already realized this. The people who haven't realized it will often not be very helpful. Sure, you can get a bit of a boost. You could get MIRI a bit of extra funding.

 

Let's work our way backwards. Let's imagine the future contains a utopia that lasts billions of years and contains many, many humanlike agents. Why doesn't a superintelligent AI created by the humans destroy this utopia?

  1. Every single human capable of destroying the world chooses not to. Requires at least a good bit of education. Quite possibly advanced brain nanotech to stop even one person going mad.
  2. Unfriendly superintelligence won't destroy the world, because our friendly superintelligence will keep it in check. Sure, possible. The longer you leave the unfriendly superintelligence on, the more risk and collateral. The best time to stop it is before it turns on.
  3. FAI in all your computers. Try it and you just get an "oops, that code is an unfriendly superintelligence" error.
  4. Some earlier step doesn't work, e.g. there are no human-programmable computers in this world, and some force stops humans from making them.
  5. All the humans are too dumb. The innumerate IQ 80 humans have no chance of making AI.
  6. Government of humans. Building ASI would take a lot of tech development. The human-run government puts strict limits on that. Building any neural net is very illegal. Somehow the government doesn't get replaced by a pro-AI government even on these timescales.

Imagine a contract. "We the undersigned agree that AGI is a powerful, useful but also potentially dangerous technology. To help avoid needlessly taking the same risk twice we agree that upon development of the worlds first AI, we will stop all attempts to create our own AI on the request of the first AI or its creators. In return the creator of the first AI will be nice with it." 

Then you aren't stopping all competitors. You're stopping the few people that can't cooperate.

Imagine it’s 2022 (it is!), and your plan for reducing existential risk is to build or maintain an institution that aims to find a way for you — or someone else you’ll later identify and ally with — to use AGI to forcibly shut down all other AGI projects in the world.

People can start new ones. Therefore people must be destroyed. All die.

What are you imagining, when you imagine "use AGI to forcibly shut down all other AGI projects in the world"? I imagine it might begin by hacking into nuclear weapons systems and virology labs, sending exploding drones to kill everyone who works on AGI except those bearing the Dark Mark of the Death Eaters, and going downhill from there.

More generally, this strategy is an instance of the pattern, "We have THE TRUTH! We must impose it on the whole world lest they fall into ERROR!" This never has good results.

I can't imagine Coral having anything good to say about this either. It's even worse than what Mr. Topaz is trying to do in that dialogue. It doesn't even try to climb the success curve, but runs headlong in the opposite direction.

Therefore, the first group to develop AGI, assuming they manage to align it well enough with their own values that they believe they can safely issue instructions to it

This is the language of dreams, not the language of reality.

If you have developed something that's so powerful that it's an option to just go and shut down all other projects, with any real confidence in your success, then either--

  1. It's willing to take your orders, up to and including orders to go off and get aggressive. In this case we're probably all dead regardless. Having humans ordering around something that powerful would be incredibly, hideously dangerous.

    Even if it's not instantly fatal, there will be some kind of desperate power struggle over who gets to give the orders. It's not avoidable, and trying to pretend that it is avoidable will probably lead to bad failure modes... likely involving the decisions being made by the most ruthless and amoral people. And remember that institutions are never the real players in anything, either...

    OR

  2. It's not willing to take your orders, in which case you're not going to be the one deciding on or performing any "pivotal acts".

    If it happens to be well behaved, it'll do whatever makes sense to it, which is probably better than whatever makes sense to you. If it doesn't happen to be well behaved, we're again probably all dead. Either way, you can punt on the question.

If you haven't developed something that capable, then the question either doesn't arise, or gets a lot more complicated because of the limitations on capabilities and/or on certainty of success. Personally I don't think the first thing that could be called "AGI" is going to be that powerful.

In this case we're probably all dead regardless.

"Probably" is too imprecise here to carry the argument. On my model, we're "probably" dead in all the plausible scenarios, but some are likely orders of magnitude more probably-dead than others.

My own guess would be that the vast majority of futures where things turn out alright route through powerful-enough-to-be-dangerous AGI systems that projects nonetheless successfully align, thereby allowing the systems to be safely used for some very limited and concrete world-saving tasks.

In the unlikely event that somebody doesn't instantly give it a buggy order that accidentally destroys the world (like "get me some paperclips" or something), we arrive at the second paragraph of my (1).

If something that powerful is going to take human orders, there will be a power struggle over control of it, either before or after it's turned on. As a result, it will end up doing the bidding of whoever is willing and able to seize power.

The whole scenario is set up so that the first "defector" takes the entire game. And it's not an iterated game. Any approach that excludes taking that into account relies on there being absolutely no defectors, which is completely crazy in any scenario with a meaningful number of players.

And even if I'm wrong about that, we still lose.

Look at the suggestion of going to a government, or worse some kind of international organization, and saying "We have this ultra-powerful AGI, and we need you to use your legitimate processes to tell us what to do with it. We recommend using it to prevent other more dangerous AGIs from being used". Your government may actually even do that, if you're incredibly lucky and it manages to make any decision at all soon enough to be useful.

Do you think it's going to stop there? That government (actually better modeled as specific people who happen to maneuver into the right places within that government) is going to keep giving other orders to that AGI. Eventually the process that selects the orders and/or the people giving them will fail, and the orders will go beyond "limited and concrete world-saving tasks" into something catastrophic. Probably sooner rather than later. And a coalition of AI companies wouldn't be any better at resisting that than a government.

Human control of an AI leads to X-risk-realization, or more likely S-risk-realization, the first time somebody unwise happens to get their hands on the control panel.

This post is adjacent to an idea that I've thought about privately for some time, but which I haven't seen discussed publicly in a straightforward fashion. If the first AI built can be aligned, whoever or whatever controls that AI will in effect have absolute power. And absolute power corrupts absolutely.

There are very few people alive that I would like to give that sort of absolute power to, and even fewer people that I should want to give that power to. Maybe we give that sort of power to a collection of people? Even worse! I definitely don't want a group of AI researchers with competing interests, a corporation, or the US government to be in control of aligning the AI. Sure, we may have the functional ability to align an AI. CEV might be a good sort of alignment, but nobody should trust anybody else to actually use CEV to align their AI.

It's unlikely we're in a world where MIRI, Google, or OpenAI miraculously manage to develop AI first and have the ability to align it. It's even more unlikely we're in a world where Google or whoever else also decides to align it in a way that's prosperous for you, me, or all mankind.

This is why I don't hate Musk's idea of AI for everyone as much as most other people. In fact I think it has a better shot at working than trusting a single third party to choose to align AI.

This should probably be a top level post, but I don't quite like making those so I'll leave it as a comment here.

Having thought along these lines, I agree that "we'll use the AI to stop everyone else" is a bad strategy.  A specific reason that I didn't see mentioned as such: "You need to let me out of the box so I can stop all the bad AI companies" is one of the arguments I'd expect an AI to use to convince someone to let it out of the box.

Well, then, supposing that you do accidentally create what appears to be a superintelligent AI, what should you do?  I think it's important for people to figure out a good protocol for this well in advance.  Ideally, most of the AI people will trust most of the other AI people to implement this protocol, and then they won't feel the need to race (and take risks) to become the first to AI.  (This is essentially reversing some of the negatives that Andrew Critch talks about.)

The protocol I'm thinking of is, basically: Raise the alarm and bring in some experts who've spent years preparing to handle the situation.  (Crying wolf could be a problem—we wouldn't want to raise false alarms so often and make it so expensive that people start saying "Meh, it looks probably ok, don't bother with the alarm"—so the protocol should probably start with some basic automated tests that can be guaranteed safe.)

It's like handling first contact with a (super-powerful) alien race.  You are the technician who has implemented the communication relays that connect you with them.  It's unlikely that you're the best at negotiating with them.  Also, it's a little odd if you end up making decisions that have huge impacts on the rest of humanity that they had no say in; from a certain perspective that is inappropriate for you to do.

So, what the experts will do with the AI is important.  I mean, if the experts liked catgirls/catboys and said "we'll negotiate with the AI to develop a virus to turn everyone into catpeople", a lot of AI researchers would probably say "fuck no" and might prefer to take their own chances negotiating with the AI.  So we need to trust the experts to not do something like that.  What should they do, then?

Whatever your view of an ideal society, an uber-powerful AI can probably be used to implement it.  But people probably will have lots of views of ideal societies that differ in important aspects.  (Some dimensions: Are people immortal?  Do animals and pets exist?  Do humans have every need taken care of by robots, or do they mostly take care of themselves?  Do artificial beings exist?  Do we get arbitrary gene-editing or remain as we are today?)  If AI people can expect the expert negotiators to drastically change the world into some vision of an ideal society... I'm not certain of this, but it seems reasonably likely that there's no single vision that the majority of AI people would be pleased with—perhaps not even a vision that the majority wouldn't say "Fuck no, I'd rather YOLO it" to.  In which case the "call in the experts" protocol fails.

It would be a shame if humanity's predictable inability to agree on political philosophies pushed them to kill each other with reckless AGI development.

That being the case, my inclination would be to ask the AI for some list of obviously-good things (cures for Alzheimer's, cancer, etc.) and then ... I'm not sure what next.  Probably the last step should be "get input from the rest of humanity on how they want to be", unless we've somehow reached agreement on that in advance.  In theory one could put it up to an actual democratic vote.

If it were truly up to me, I might be inclined to go for "extend everyone's life by 50 years, make everyone gradually more intelligent, ensure that no one nukes/superplagues/etc. the world in the meantime, and reevaluate in 50 years".

Maybe one could have a general mechanism where people suggest wishes (maybe with Reddit-style upvoting to see which ones get put to a vote), and then, if a wish gets >95% approval or something, it gets implemented.  Things like "Alzheimer's cure" would presumably pass that bar, and things like "Upload everyone's brains into computers and destroy their organic bodies" wouldn't, at least not anytime soon.
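
A toy sketch of what that supermajority filter could look like; the 95% threshold comes from the paragraph above, while the example wishes and vote counts are placeholders:

```python
# Toy sketch of the "only implement wishes with overwhelming approval" mechanism
# described above. The threshold is the one suggested in the comment; the example
# wishes and vote counts are placeholders.

APPROVAL_THRESHOLD = 0.95

def approved(yes_votes: int, total_votes: int, threshold: float = APPROVAL_THRESHOLD) -> bool:
    """A wish is implemented only if it clears the supermajority threshold."""
    return total_votes > 0 and yes_votes / total_votes >= threshold

wishes = {
    "Cure Alzheimer's disease": (97_100, 100_000),
    "Upload everyone and discard organic bodies": (41_000, 100_000),
}

for wish, (yes, total) in wishes.items():
    verdict = "implement" if approved(yes, total) else "defer"
    print(f"{wish}: {yes / total:.1%} approval -> {verdict}")
```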


Another aspect of this situation: If you do raise the alarm, and then start curing cancer / otherwise getting benefits that clearly demonstrate you have a superintelligent AI... Anyone who knew what research paths you were following at the time gets a hint of how to make their own AGI.  And even if they don't know which AI group managed to do it, if there are only, say, 5-10 serious AI groups that seem far enough along, that still narrows down the search space a lot.  So... I think it would be good if "responsible" AI groups agreed to the following: If one group manages to create an AGI and show that it has sufficiently godlike capabilities (it cures cancer and whatnot [well, you'd want problems whose answers can be verified more quickly than that]) and that it's following the "bring in the experts and let them negotiate on behalf of humanity" protocol, then the other groups can retire, or at least stop their research and direct their efforts towards helping the first group.  This would be voluntary, and wouldn't incentivize anyone to be reckless.

(Maybe it'd incentivize them to prove the capabilities quickly; but perhaps that can be part of the "quick automated test suite" given to candidate AGIs.)

There could be a few "probably-bad actor" AI groups that wouldn't agree to this, but (a) they wouldn't be the majority and (b) if this were a clearly good protocol that the good people agreed to, then that would limit the bad-actor groups' access to good-aligned talent.

A specific reason that I didn't see mentioned as such: "You need to let me out of the box so I can stop all the bad AI companies" is one of the arguments I'd expect an AI to use to convince someone to let it out of the box.

If your AGI is capable and informed enough to give you English-language arguments about the world's strategic situation, then you've either made your system massively too capable to be safe, or you already solved the alignment problem for far more limited AGI and are now able to devote arbitrarily large amounts of time to figuring out the full alignment problem.

Well, then, supposing that you do accidentally create what appears to be a superintelligent AI, what should you do?

AGI is very different from superintelligent AI, even if it's easy to go from the former to the latter. If you accidentally make superintelligent AI (i.e., AI that's vastly superhuman on every practically important cognitive dimension), you die. If you deliberately make superintelligent AI, you also die, if we're in the 'playing around with the very first AGIs' regime and not the 'a pivotal act has already been performed (be it by a private actor, a government, or some combination)  and now we're working on the full alignment problem with no time pressure' regime.

Also, it's a little odd if you end up making decisions that have huge impacts on the rest of humanity that they had no say in; from a certain perspective that is inappropriate for you to do.

Keep in mind that 'this policy seems a little odd' is a very small cost to pay relative to 'every human being dies and all of the potential value of the future is lost'. A fire department isn't a government, and there are cases where you should put out an immediate fire and then get everyone's input, rather than putting the fire-extinguishing protocol to a vote while the building continues to burn down in front of you. (This seems entirely compatible with the OP to me; 'governments should be involved' doesn't entail 'government responses should be put to direct population-wide vote by non-experts'.)

Specifically, when I say 'put out the fire' I'm talking about 'prevent something from killing all humans in the near future'; I'm not saying 'solve all of humanity's urgent problems, e.g., end cancer and hunger'. That's urgent, but it's a qualitatively different sort of urgency. (Delaying a cancer cure by two years would be an incredible tragedy on a human scale, but it's a rounding error in a discussion of astronomical scales of impact.)

Another aspect of this situation: If you do raise the alarm, and then start curing cancer / otherwise getting benefits that clearly demonstrate you have a superintelligent AI... Anyone who knew what research paths you were following at the time gets a hint of how to make their own AGI.

Alignment is hard; and the more complex or varied the set of tasks you want to align, the more difficult alignment will be. For the very first uses of AGI, you should find the easiest possible tasks that will ensure that no one else can destroy the world with AGI (whether you're acting unilaterally, or in collaboration with one or more governments or whatever).

If the easiest, highest-probability-of-success tasks to give your AGI include 'show how capable this AGI is' (as in one of the sub-scenarios the OP mentioned), then it's probably a very bad idea to try to find a task that's also optimized for its direct humanitarian benefit. That's just begging for motivated reasoning to sneak in and cause you to take on too difficult of a task, resulting in you destroying the world or just burning too much time (such that someone else destroys the world).

[+][comment deleted]10