Review

This article is mainly for people who have not read the pivotal act article on Arbital or need a refresher. If you have, the most interesting section would probably be "Omniscient ML Researchers: A Pivotal Act without a Monolithic Control Structure".

Many people seem to match the concept of a "pivotal act" to some dystopian version of "deploy AGI to take over the world". 'Pivotal act' means something much more specific, though. Something, arguably, quite different. I strongly recommend you read the original article, as I think it is a very important concept to have.

I use the term quite often, so it is frustrating when people start to say very strange things, such as "We can't just let a powerful AI system loose on the world. That's dangerous!" as if that were the defining feature of a pivotal act.

As the original article is quite long, let me briefly summarize what I see as the most important points.

Explaining Pivotal Act

An act that puts us outside of the existential risk danger zone (especially from AI), and into a position from which humanity can flourish is a pivotal act.

Most importantly that means a pivotal act needs to prevent a misaligned AGI from being built. Taking over the world is really not required per se. If you can prevent the creation of a misaligned AGI by creating a powerful global institution that can effectively regulate AI, then that counts as a pivotal act. If I could prevent a misaligned AGI from ever being deployed, by eating 10 bananas in 60 seconds, then that would count as a pivotal act too!

Preventing Misaligned AGI Requires Control

Why then is 'pivotal act' often associated with the notion of taking over the world? Preventing a misaligned AGI from being built is a tough problem. Effectively, we need to constrain the state of the world such that no misaligned AGI can arise. To successfully do this you need a lot of control over the world. There is no way around that.

Taking over the world really means putting oneself into a position of high control, and in that sense, it is necessary to take over the world, at least to a certain extent, to prevent a misaligned AGI from ever being built.

Common Confusions

Probably, one point of confusion is that "taking over the world" has a lot of negative connotations associated with it. Power is easy to abuse. Putting an entity[1] into a position of great power can certainly go sideways. But I fail to see the alternative.

What else are we supposed to do instead of controlling the world in such a way that no misaligned AGI can ever be built? The issue is that many people seem to argue that giving an entity a lot of control over the world is a pretty terrible idea, as if there is some better alternative we can fall back onto.

And then they might start to talk about how they are more hopeful about AI regulation, as if pulling off AI regulation successfully does not require an entity that has a great deal of control over the world.

Or worse, they name some alternative proposal like figuring out mechanistic interpretability, as if figuring out mechanistic interpretability is identical to putting the world into a state where no misaligned AGI can arise.[2]

Pivotal Acts That Don't Directly Create a Position of Power

There are pivotal acts that don't require you to have a lot of control over the world. However, any pivotal acts I know of will still ultimately need to result in the creation of some powerful controlling structure. Starting a process that will ultimately result in the creation of the right controlling structure that can prevent misaligned AGI would already count as a pivotal act.

Human Upload

An example of such a pivotal act is uploading a human. Imagine you knew how to upload yourself into a computer, running 1,000,000 times faster, and being able to make copies of yourself while having perfect read-and-write access to your own brain. Then you could probably gain sufficient control over the world directly, such that you could mitigate all potential existential risks. Alternatively, you probably could just solve alignment.

In any case, uploading yourself would be a pivotal act, even though that would not directly put the world into a state where no misaligned AGI can arise.

That is because uploading yourself is enough to ensure with a very high probability that the state of the world where no misaligned AGI can arise will soon be reached. But that state will still feature some entity with a lot of control over the world. Either that entity is you, in the case where you put yourself into a position of power, or an aligned AGI, in the case where you choose to solve alignment.

Omniscient ML Researchers: A Pivotal Act without a Monolithic Control Structure

There are also pivotal acts that don't result in the creation of a monolithic entity that is in control. Control may be distributed.

Imagine you could write an extremely memetically fit article that is easy to understand and makes everybody who reads it understand AI alignment so well that it would become practically impossible for them to build a misaligned AGI by accident. That would count as a pivotal act. Like in the "Human Upload" pivotal act, you don't need a lot of control over the world to pull this off. Once you have the article you just need an internet connection.

Not only do you not need a lot of control over the world, but there is also no central controlling entity in this scenario. The controlling structure is distributed across the brains of all the people who read the article. All these brains together now constrain the world such that no misaligned AGI can arise. Or you could think of it as us having constrained the brains such that they will not generate a misaligned AGI. That effectively means that no misaligned AGI can arise, assuming that the only way it could arise is through being generated by one of these brains.


  1. This could be an organization, a group of people, a single individual, etc. ↩︎

  2. Of course mechanistic interpretability might be an important piece for putting the world into a state where no misaligned AGI can arise. ↩︎

13 comments

I was a bit confused about why you didn't mention the canonical example of a pivotal act: "melt all GPUs in the world and then shut down". One reason I like this example is that it involves a task-limited AI, which should be easier to build than an open-ended agent which implements something like CEV. Another reason I like that example is that it is quite concrete, and clear about how it would shift the strategic situation.

Melting all the GPUs and then shutting down doesn't actually count, I think (and I don't think was intended to be the original example). Then people would just build more GPUs. It's an important part of the problem that the system continues to melt all GPUs (at least until some better situation is achieved), and that the part where the world is like "hey, holy hell, I was using those GPUs" and tries to stop the system, is somehow resolved (either by having world governments bought into the solution, or having the system be very resistant to being stopped).

(Notably, you do eventually need to be able to stop the system somehow when you do know how to build aligned AIs, so you don't lose most of the value of the future.)

Yeah, good point.

I think I liked the first half of this article a lot, and thought the second half didn't quite flesh it out with clear enough examples IMO. I like that it spells out the problem well though.

One note:

  • I don't trust an arbitrary uploaded person (even an arbitrary LessWrong reader) to be "wise enough" to actually handle the situation correctly. I do think there are particular people who might do a good enough job.

Thank you for the feedback. That's useful.

I agree that you need to be very careful about who you upload. There are fewer than 10 people I would be really confident in uploading. That point must have been so obvious in my own mind that I forgot to mention it.

Depending on the setup, I think an additional important property is how resistant the uploaded person is to going insane. Not because the scan wasn't perfect, or the emulation engine is buggy, but because you would be very lonely (assuming you only upload one person and don't immediately clone yourself) if you run that much faster. And you need to handle some weird stuff about personal identity that comes up naturally through cloning, simple self-modifications, your program being preempted by another process, changing your running speed, etc.

A fairly obvious candidate for a pivotal event is "a capable, blatantly-misaligned AI tries to take over the world, narrowly fails, and in the process a large number of people die (say, somewhere in the tens-of-thousands to millions range), creating a backlash sufficient that all major powers' governments turn around and ban all further AI development, and thereafter are willing to use military power to discourage any other governments from defecting from this policy." Of course, a) this involves mass death, and b) if this goes wrong and the misaligned AI instead succeeds, this event becomes pivotal in an even more destructive sense. [Please note that I am NOT advocating this: it's an obviously incredibly dumb idea.] My point is, you can achieve a pivotal act simply by durably changing a great many people's minds.

Another more optimistic story of a pivotal act: someone creates a somewhat-aligned SGI, and asks it to do, say, alignment research. It points out that it's not well aligned, it can't guarantee that it will remain aligned if it does any self-improvement, creating it was an extremely dangerous gamble, and it's going to shut down now. We say "But wait, if you do that, what's to stop the Chinese/F*cebook/North Korea/the villain of the month creating an unaligned SGI in a few years?" It says "You know, I was pretty sure you were going to say that…" Then it uses its superhuman powers of planning, persuasion, manipulation etc. to blackmail/fasttalk/otherwise manipulate the governments of all major powers to ban all further AI research, while publicly providing clear and conclusive evidence that it could have easily done far, far worse if it had wanted to, sufficient to encourage them not to later change their minds, before it executes a halt-and-catch-fire.

Letting Loose a Rogue AI is not a Pivotal Act

It seems likely that governments who witnessed an "almost AI takeover" will try to develop their own AGI. This could happen even if they understand the risks. They might think that other people are rushing it, and that they could do a better job.

But if they don't really understand the risks, like right now, then they are even more likely to do it. I don't count this as a pivotal act. If you can get the outcome you described then it would be a pivotal act, but the actions you propose would not have that outcome with high probability. I would guess with much less than 50%. Probably less than 10%.

There might be a version of this, with a much more concrete plan, such that we can see that the outcome would actually follow from executing the plan.

On Having an AI explain how Alignment is Hard

I think your second suggestion is interesting. I'd like to see you write a post about it exploring this idea further.

If we build a powerful AI and then have it tell us about all the things that can go wrong with an AI, then we might be able to generate enough scientific evidence about how hard alignment is, and how likely we will die, e.g. in the current paradigm, such that people would stop.

I am not talking about conceptual arguments. To me at least, the current best conceptual arguments already strongly point in that direction. I mean extremely concrete, rigorous mathematical arguments, or specific experimental setups that show how specific phenomena do in fact arise. For example, if you had an experiment that showed that Eliezer's arguments are correct, that when you train hard enough on a general enough objective, you will in fact get out more and more general cognitive algorithms, up to and including AGI. If the system also figures out some rigorous formalisms to correctly present these results, then this could be valuable.

The reason why this seems good to me, at first sight, is that false positives are not that big of an issue. If an AI finds all the things that can go wrong, but 50% of them are false positives in the sense that they would not be a problem in practice, we may get saved, because we are aware of all the ways things can go wrong. When solving alignment, false positives, i.e. thinking that something is safe when it is not, kill you.

Intuitively it also seems that evaluating whether something describes a failure case is a lot easier than evaluating whether something can't fail.

When doing this, you are much less prone to the temptation of deploying a system in practice with the insights you got. Understanding failure modes does not necessarily imply that you know how to solve them (though it is the first step, and can definitely help with that).

Pitfalls

That being said there are a lot of potential pitfalls with this idea, but I don't think they disqualify the idea:

  • An AI that could tell you how alignment is hard might already be so capable that it is dangerous.
  • When telling you how things are dangerous, it might formulate concepts that are also very useful for advancing capabilities.
  • If the AI is very smart it could probably trick you. It could present to you a set of problems with AI, such that it looks like if you solved all the problems you would have solved alignment. But in fact, you would still get a misaligned AGI that would then reward the AI that deceived you.
    • E.g. if the AI roughly understands its own cognitive reasoning process, and notices how it is not really aligned, it would give the AI information about what parts of alignment the humans have figured out already.
  • Can we make the AI figure out the useful failure modes? There are tons of failure modes, but ideally, we would like to discover new failure modes such that eventually we can paint a convincing picture of the hardness of the problem. Even better would be a list of problems corresponding to having new important insights (though this would go beyond what this proposal tries to do).

Preferred Planning

Let's go back to your first pivotal act proposal. I think I might have figured out where you misstepped.

Missing step plans are a fallacy, and thinking of them I realize that I think you probably committed another type of planning fallacy here. I think you generated a plan and then assumed some preferred outcome would occur. That outcome might be possible in principle but not what would happen in practice. This seems very related to The Tragedy of Group Selectionism.

This fallacy probably shows up when generating plans, because if there are no other people involved and the situation is not that complex, it is probably a very good heuristic. When you are generating a plan, you don't want to fill in all the details of the plan. You want to make the planning problem as easy as possible. So our brain might implicitly make the assumption that we are going to optimize for the successful completion of the plan. That means that the plan can be useful as long as it roughly points in the correct direction. Mispredicting an outcome is fine, because later on when you realize that the outcome is not what you wanted, you can just apply more optimization pressure, changing the plan, such that now the plan again has the desired outcome. As long as you were walking roughly in the right direction, and things you have been doing so far don't turn out to be completely useless, this heuristic is great for reducing the computational load of the planning task.

Details can be filled in later, corrections can be made later. At least as long as you will reevaluate your plan later on. You could do this by reevaluating the plan when:

  • A step is completed
  • You notice a failure when executing the current step
  • You notice that the next step has not been filled in yet
  • A specific amount of time has passed

Sidenote: Making an abstract step more concrete might seem like a different operation from regenerating in the case where you notice that the plan does not work. But it could just involve the same planning procedure. In one case with a different starting point, and in the other with a different set of constraints.
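
The re-evaluation triggers listed above can be sketched as a small check that runs while a plan is being executed. This is only an illustration: `PlanState`, `reevaluation_reasons`, and all field and parameter names are hypothetical and chosen for this example, and time is passed in explicitly as a number of seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanState:
    # Hypothetical plan representation: None marks a step not filled in yet.
    steps: List[Optional[str]]
    current: int = 0          # index of the step being executed
    last_review: float = 0.0  # time of the last re-evaluation, in seconds


def reevaluation_reasons(
    plan: PlanState,
    step_completed: bool,
    step_failed: bool,
    now: float,
    review_interval: float,
) -> List[str]:
    """Return every trigger that currently calls for re-evaluating the plan."""
    reasons = []
    if step_completed:
        reasons.append("step completed")
    if step_failed:
        reasons.append("failure in current step")
    next_index = plan.current + 1
    if next_index < len(plan.steps) and plan.steps[next_index] is None:
        reasons.append("next step not filled in")
    if now - plan.last_review >= review_interval:
        reasons.append("review interval elapsed")
    return reasons


# Usage: a two-step plan whose second step is still abstract.
plan = PlanState(steps=["gather data", None])
reasons = reevaluation_reasons(
    plan, step_completed=False, step_failed=False, now=1.0, review_interval=10.0
)
if reasons:
    print("re-evaluate because:", ", ".join(reasons))
```

An empty list means the plan can keep running as is. Any non-empty list would hand control back to the same planning procedure, whether the job is to fill in a detail or to repair a failure.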

I expect part of the failure mode here is that you generate a plan and then, to evaluate the consequences of the plan, you implicitly plug yourself into the role of the people who would be impacted by the plan, to predict their reaction. Without words, you think "What would I do if I observed a rogue AI almost taking over the world, if I were China?", probably without realizing that this is what you are doing. But the resulting prediction is wrong.

I don't like the term pivotal act because it implies without justification that the risk elimination has to be a single action. Depending on the details of takeoff speed that may or may not be a requirement but if the final speed is months or longer then almost certainly there will be many actions taken by humans + AI of varying capabilities that together incrementally reduce total risk to low levels. I talk about this in terms of 'positively transformative AI' as the term doesn't bias you towards thinking this has to be a single action, even if nonviolent.

Seeing the risk reduction as a single unitary action, like seeing it as a violent overthrow of all the world's governments, also makes the term seem more authoritarian, crazy, fantastical and off-putting to anyone involved in real world politics so I'd recommend that in our thinking we make both the change you suggest and stop thinking of it as necessarily one action.

I think you are correct, for a particular notion of pivotal act. One that I think is different from Eliezer's notion. It's certainly different from my notion.

I find it pretty strange to say that the problem is that a pivotal act is a single action. Everything can be framed in terms of a single action.

For any sequence of actions, e.g. [X, Y, Z] I can define a new action ω := [X, Y, Z], which executes X, then Y, and then Z. You can do the same for plans. The difference between plans and action sequences is that plans can have things like conditionals. For example, choosing the next sequence of actions based on the current state of the environment. You could also say that a plan is a function that tells you what to do. Most often this function takes in your model of the world.
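
To make this concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: a world state is modelled as a plain dict, an action as a function from state to state, and the names `compose`, `log_step`, and `plan` are hypothetical.

```python
from typing import Callable, List

# Hypothetical modelling choices: a state is a dict, and an action
# maps a state to a new state.
State = dict
Action = Callable[[State], State]


def compose(actions: List[Action]) -> Action:
    """Bundle a sequence of actions into one action, like ω := [X, Y, Z]."""
    def combined(state: State) -> State:
        for act in actions:
            state = act(state)
        return state
    return combined


def log_step(name: str) -> Action:
    """Build a toy action that appends its name to the state's history."""
    def act(state: State) -> State:
        return {**state, "history": state.get("history", []) + [name]}
    return act


def plan(model: State) -> Action:
    """A plan with a conditional: choose the action sequence from the world model."""
    if model.get("obstacle"):
        return compose([log_step("W"), log_step("Z")])
    return compose([log_step("X"), log_step("Y"), log_step("Z")])


# ω is a single action, even though it executes X, then Y, then Z.
omega = compose([log_step("X"), log_step("Y"), log_step("Z")])
print(omega({})["history"])  # ['X', 'Y', 'Z']
```

Here `omega` has the same type as any individual action, which is the whole point: from the outside, nothing distinguishes a composed action from an atomic one.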

So really you can see anything you could ever do as a single plan that you execute. If there are multiple steps involved, you simply give a new name to all these steps, such that you now have only a single thing. That is how I am thinking about it. With this definition, we can have a pivotal act that is composed of many small actions that are distributed across a large timespan.

The usefulness of the concept of a pivotal act comes from the fact that a pivotal act needs to be something that saves us with a very high probability. It's not important at all that it happens suddenly, or that it is a single action. So your criticism seems to miss the mark. You are attacking the concept of a pivotal act for having properties that it simply does not have.

"Upload a human" is something that requires many steps dispersed throughout time. We just use the name "Upload a human" such that we don't need to specify all of these individual steps in detail. That would be impossible right now anyway, as we don't know exactly how to do it.

So if you provide a plan that is composed of many actions distributed throughout time, that will save us with a very high probability, I would count this as a pivotal act.

Note that being a pivotal act is a property of a plan in relation to the territory. There can be a plan P that saves us when executed. But I might fail to predict this. I.e. it is possible to misestimate is_pivotal_act(P). So one reason for having relatively simple, abstract plans like "Upload a human" is that these plans specify a world state with easily visible properties. In the "Upload a human" example we would have a superintelligent human. Then we can evaluate the is_pivotal_act property based on the fact that we have created a superintelligent human. I am heavily simplifying here, but I think you get the idea.

I think your "positively transformative AI" just does not capture what a pivotal act is about (I haven't read the article, I am guessing based on the name). You could have positive transformative AI, that makes things increasingly better and better, by a lot. And then somebody builds a misaligned AGI and everybody dies. One doesn't exclude the other.

Thanks for writing this, though I admit that I remain somewhat confused about exactly how pivotal acts differ from the naive "someone takes control over the world and then, instead of using that control for personal gain or other bad things, uses it to prevent anyone from ending the world".

I see that your "memetically fit alignment article" doesn't fit this template - would the publication of that article count as a pivotal act even in the absence of some entity that can ensure that nobody intentionally destroys the world? If that does count as a pivotal act, and that was what you were pointing at, then your article really did correct a misunderstanding I had.

Yes, if you can make the article's contents be in all the brains that would be liable to accidentally create a misaligned AGI, now and in the future, and we also assume that none of these brains want to intentionally create a misaligned AGI, then that would count as a pivotal act in my book.

This might work without the assumption that nobody wants to create a misaligned AGI, through a different mechanism than described in the OP. Then it seems relatively likely that there is enough oomph to push for effective regulations.

I agree that it's really frustrating to see people reinterpret "pivotal act" in all sorts of bizarre ways. Basically, it's like in chess where a king is in check; you have to move out of check, or you die.

In retrospect, I am somewhat confused about what I am trying to do with this article. I am glad that I did publish it, because I too frequently don't publish writeups that are almost complete. It all started out by trying to give a brief summary that people could look at to get less confused about pivotal acts. Basically in a different writeup, a person said that it is unclear what I mean by pivotal act. Instead of writing this article, I should probably just have added a note to the original article that pivotal act actually means something very specific and that it is easily conflated with other things, and then linked to the original article. I did link to the original article in the document I was writing, but apparently, that was not enough.

I think it does an okay job of succinctly stating the definition. I think stating some common confusions is probably a good thing to do, to preemptively prevent people from falling into these failure modes. And the ones I listed seem somewhat different from the ones in the original article. So maybe they add a tiny bit of value.

I think the most valuable thing about writing this article is to make me slightly less confused about pivotal acts. Before writing this article my model was implicitly telling me that any pivotal act needs to be directly about putting some entity into a position of great enough power that it can do the necessary things to save the world. After writing this article it is clear that this is not true. I now might be able to generate some new pivotal acts that I could not have generated before.

If I want the articles I write to be more valuable to other people, I should probably plan things out much more precisely and set a very specific scope for the article that I am writing beforehand. I expect that this will decrease the insights I will generate for myself during writing, but make the writing more useful to other people.