This post is written in a spirit of constructive criticism. It's phrased fairly abstractly, in part because it's a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them. Disclaimer: I work at OpenAI, although this is a personal post that was not reviewed by OpenAI.
Claim 1: The AI safety community is structurally power-seeking.
By “structurally power-seeking” I mean that it tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry, or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it’s difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking.
Some prominent examples of structural power-seeking include:
- Trying to raise a lot of money.
- Trying to gain influence within governments, corporations, etc.
- Trying to control the ways in which AI values are shaped.
- Favoring people who are concerned about AI risk for jobs and grants.
- Trying to ensure non-release of information (e.g. research, model weights, etc).
- Trying to recruit (high school and college) students.
To be clear, you can’t get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities, such as most other advocacy groups. Some reasons for this disparity include:
- The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one’s desired consequences (but can be aversive to deontologists or virtue ethicists).
- The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won’t take action until it’s too late; and that it’s necessary to have a centralized plan.
- The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it’s newer than (e.g.) the environmentalist movement; in part it’s because the risks involved are more abstract; in part it’s a founder effect.
Again, these are intended as descriptions rather than judgments. Traits like urgency, consequentialism, etc, are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point:
Claim 2: The world has strong defense mechanisms against (structural) power-seeking.
In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included:
- Strong public criticism of not releasing models publicly.
- Strong public criticism of centralized funding (e.g. billionaire philanthropy).
- Various journalism campaigns taking a “conspiratorial” angle on AI safety.
- Strong criticism from the AI ethics community about “whose values” AIs will be aligned to.
- The development of an accelerationist movement focused on open-source AI.
These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when judging it. This is a useful strategy in a world where arguments are often post-hoc justifications for power-seeking behavior.
To be clear, it’s not necessary to avoid these defense mechanisms at all costs. It’s easy to overrate the effect of negative publicity; and attempts to avoid that publicity are often more costly than the publicity itself. But reputational costs do accumulate over time, and also contribute to a tribalist mindset of “us vs them” (as seen most notably in the open-source debate) which makes truth-seeking harder.
Note that most big companies (especially AGI companies) are strongly structurally power-seeking too, and this is a big reason why society at large is so skeptical of and hostile to them. I focused on AI safety in this post both because companies being power-seeking is an idea that's mostly "priced in", and because I think that these ideas are still useful even when dealing with other power-seeking actors.
Claim 3: The variance of (structurally) power-seeking strategies will continue to increase.
Those who currently take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc) which will lead to much more power in the future if AI continues to become a much, much bigger deal.
But increasing attention to AI will also lead to increasingly high-stakes power struggles over who gets to control it. So far, we’ve seen relatively few such power struggles because people don’t believe that control over AI is an important type of power. That will change. To some extent this has already happened (with AI safety advocates being involved in the foundation of three leading AGI labs) but as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.
How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy. Work that focuses on informing the public, or creating mechanisms to ensure that power doesn’t become too concentrated even in the face of AGI, is much less likely to be perceived as power-seeking.
Secondly, prioritizing competence. Ultimately, humanity is mostly in the same boat: we're the incumbents who face displacement by AGI. Right now, many people are making predictable mistakes because they don't yet take AGI very seriously. We should expect this effect to decrease over time, as AGI capabilities and risks become less speculative. This consideration makes it less important that decision-makers are currently concerned about AI risk, and more important that they're broadly competent, and capable of responding sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution speeds up.
EDIT: A third thing, which may be the most important takeaway in practice: the mindset that it's your job to "ensure" that things go well, or come up with a plan that's "sufficient" for things to go well, inherently biases you towards trying to control other people—because otherwise they might be unreasonable enough to screw up your plan. But trying to control others will very likely backfire for all the reasons laid out above. Worse, it might get you stuck in a self-reinforcing negative loop: the more things backfire, the more worried you are, and so the more control you try to gain, causing further backfiring... So you shouldn't be in that mindset unless you're literally the US President (and maybe not even then). Instead, your job is to make contributions such that, if the wider world cooperates with you, then things are more likely to go well. AI safety is in the fortunate position that, as AI capabilities steadily grow, more and more people will become worried enough to join our coalition. Let's not screw that up.
All of that sounds right to me. But this pivot with regard to means isn't much evidence about what Eliezer/MIRI would do if they (as a magical hypothetical) suddenly found themselves with a verifiably-aligned CEV AGI.
I expect that they would turn it on, with the expectation that it would develop a hard power decisive strategic advantage, use that to end the acute risk period, and then proceed to optimize the universe.
Insofar as that's true, I think Oliver's statement above...
...is inaccurate.
MIRI has never said, to my knowledge,
The Singularity Institute used to have the plan of building and deploying a friendly AI, which they expected to "optimize" the whole world.
Eliezer's writing includes many points at which he at least hints (some would say more than hints) that he thinks it is morally obligatory, or at least virtuous, to take over the world for the side of Good.
Famously, Harry says "World Domination is such an ugly phrase. I prefer world optimization." (We made t-shirts of this phrase!)
The Sword of Good ends with the line
“I don’t trust you either,” Hirou whispered, “but I don’t expect there’s anyone better,” and he closed his eyes until the end of the world. He's concluded that all the evil in the world must be opposed, and that it's right for someone to cast the “spell of ultimate power” to do that.
(This is made a bit murky, because Eliezer's writings usually focus on the transhumanist conquest of the evils of nature, rather than political triumph over human evil. But triumph over human evils is definitely included: e.g., the moral importance and urgency of destroying Azkaban in HP:MoR.)
From all that, I think it is reasonable to conclude that MIRI is in favor of taking over the world, if they could get the power to do so!
So it seems disingenuous to me to say,
I agree that
For an outsider who doesn't already trust the CEV process, this is about as reassuring as a communist group saying "we don't care who implements the AI, as long as they properly align it to Marxist doctrine." I understand how CEV is more meta than that, how it explicitly avoids coding object-level values into the AI. But not everyone will see it that way, especially if they think the output of CEV is contrary to their values (as indeed, virtually every existing group should).
But it still seems to me that MIRI's culture endorses sufficiently trustworthy people taking unilateral action, both to do a pivotal act and end the acute risk period, and more generally, to unleash a process that will inexorably optimize the world for Good.