Schelling points in the AGI policy space

mesaoptimizer

I've been thinking about memetically fit Schelling points in the AGI policy space. I'll describe four such "Schelling policies", and use them as pedagogical examples.

Shut it all down

MIRI's new stated objective is the clearest example of a Schelling policy: "Shut it all down". MIRI states that they want governments to coordinate to pause all AI research that involves smarter-than-human systems. Laypeople will find this policy easy to understand, since they can rely on the shared cultural knowledge of CFC bans and international nuclear disarmament as case studies.

If you want to coordinate a large number of people coherently towards furthering a particular policy, "you get about five words" that you can make 'common knowledge' such that people can coordinate in a specific direction. The ease of communicating the policy makes a big difference in such conditions.

When you attempt to communicate an idea widely, you'll notice that people usually end up with multiple slightly (or sometimes wildly) differing copies of the original idea. If you've played the Telephone game, you've experienced just how much information can be lost as an idea spreads from one person to another. In the context of policies, individual people's beliefs and incentives will warp the instantiation of the policy they will communicate and support. (For example, you'll find companies lobbying regulators to carve out exceptions that benefit them.)

Here's where Schelling points are invaluable: they serve as natural attractors in the space of ideas, and therefore enable people to 'error-correct' the idea they encounter and figure out the policy that everyone is coordinating around.

"Shut it all down" is a Schelling point. "Shut it all down if we see evidence of unprompted deception and power-seeking in AGI models" is not a Schelling point, you have multiple free variables that can and will be optimized to benefit the people spreading the idea -- which can result in a lack of coordination and the idea being outcompeted by memetically fitter ideas. "Prevent the training of models using compute greater than floating point operations" also has a free variable: why exactly $10^{25}$ floating point operations? Why not $10^{24}$ or $10^{26}$ ? Until $10^{25}$ floating point operations becomes a Schelling number, the policy containing it is not a Schelling point.

Effective Accelerationism (e/acc)

The biggest difference between e/acc and the PauseAI memeplexes is that e/acc doesn't seem to have a coherent set of goals and beliefs. Here are a bunch of memes that e/acc people tend to espouse:

"It's time to build." (also the last line of The Techno-Optimist Manifesto)
"Come and take it." (where "it" refers to GPUs here)
"Accelerate or die."

At a first glance, one might say that e/acc isn't a Schelling policy -- it seems less like a coherent policy, and more like a set of 'vibes', verbal and non-verbal statements designed to create a desired emotional impact, regardless of the actual content. I disagree.

A policy (or well, a memeplex) does not need to have an explicitly coherent set of beliefs and goals for it to result in coordinating people towards particular consequences. You might expect this to reduce the spread rate of this particular policy, but e/acc specifically compensates for it by being significantly more fun and socially, financially, and professionally profitable to coordinate around. For example, venture capital firms such as a16z want the opportunity to make a lot of money from the gold rush that is the race to AGI, and a lot of software developers want a shot at making billions of dollars if their startup succeeds. The possibility of regulations would cause the music to stop, and they don't want that.

In fact, you don't even need a high spread rate to make e/acc a viable policy. You just need it to be a reachable Schelling point in the AGI policy space, and a sufficient number of people who would stand to benefit from this policy. This is why you'll see Meta AI, Yann LeCun, Mistral, and a16z all taking actions that push the world towards less regulation over AI, and towards more opportunity for them to benefit.

A significant number of people who espouse and promote accelerationism on Twitter seem to have a very strong fear of totalitarianism, or desire to enable a libertarian future. Given their fear of governments or corporations controlling their lives in the future, either due to the creation of a totalitarian state to prevent AI existential catastrophes^[1], or due to a desire to ensure that they have a say in what the future would look like^[2], they all end up coordinating towards the most obvious Schelling point that comes to their mind: prevent centralization of power over AI development.

We must beat China

"We must beat China" is an interesting policy: the more popularity and support it gets, the more the beliefs underlying this policy start to turn into reality. Leopold's Situational Awareness series of essays is the first attempt at building popular support for this policy, that explicitly involved understanding the power of an AI system with respect to potential geopolitical consequences. I expect this to have non-trivially increased the probability that China will orient to AGI with the same frame that Leopold espouses. That is, Leopold's actions have furthered the narrative underlying his policy.

From a memetic evolution point of view, this is a pretty devious feature: the more this memeplex spreads, the more its environment adapts to fit the memeplex, instead of the other way around.^[3] "Shut it all down" and "Accelerate" both do not share this property of reshaping their epistemic environment -- just because the majority of governments agree to ban AGI research, doesn't mean that the world has changed such that people who were inclined to the "Accelerate" policy now are more amenable to the "Shut it all down" policy, ignoring social incentives. On the other hand, the more actions the US government takes to curtail China's probability of dominating in the possible race to AGI, the more likely it is that the Chinese government would consider AGI a credible threat to their continued survival, and make dominating the race a priority. A lot of "Accelerate" policy people would agree that it probably is better for the US/UK/etc. coalition to win the AGI race, even if they find any centralization of power distasteful. Therefore, they'd willingly coordinate around the "We must beat China" Schelling policy as it becomes more and more popular and China enters the race.

This makes the "We must beat China" policy a dangerous memeplex. Ideally you'd nip such policies in the bud, since the more they spread, the more difficult it is for other policies to compete, even if they had the same spread rate and starting point.

Only One Org

Here's a Schelling policy I haven't seen people talk about: I call it the "Only One Org" policy. It is the goal to establish and ensure that there exists one and only one organization allowed to do AGI research. This could occur by governments merging together all the frontier labs, or it could involve building a government lab and dismantling all the frontier labs, and hiring all the displaced ML researchers (or giving them a generous severance pay). And I expect governments to ban AGI research outside the organization. For example, I expect governments to make it illegal to train a model past a certain compute threshold, and to publish and disseminate cutting-edge research.

This Schelling policy has the same self-fulfilling beliefs property that "We must beat China" does -- the more nations agree to cooperate, the more the remaining nations are incentivized to cooperate. If China chooses to not join this international agreement, this new international coalition can unilaterally choose to enforce it, with whatever geopolitical sanctions or threats are calibrated to get them to agree. There's no need to race.

For the same reasons that I expect the "We must beat China" policy to outcompete "Shut it all down" and "Accelerate", I expect "Only One Org" to outcompete them too (given similar starting points and spread rates). This implies that if you believe that we are not on track to solving alignment, you'd still be better off by coordinating around "Only One Org" instead of around "Shut It All Down".

As far as I can tell, Nate Soares believes it unlikely that a singular international AI consortium would shut down after seeing evidence of the difficulty of AGI alignment, and this makes sense to me: normalization of deviance is a very common phenomenon in most organizations, and building a massive singular insular bureaucracy trying to unsuccessfully imitate the Manhattan project seems very likely to make this mistake.

On the other hand, once you have centralized decision-making, the number of people you have to convince to "Shut it all down" is down to low two digit numbers, or even a one digit number. A lot of people will be averse to the "Shut it all down" policy, primarily because it has massive negative consequences for the financial, social, and professional facets of their lives. They'll likely coordinate around some other Schelling policy that lets them retain the things they value. The "Only One Org" seems likely to fulfill their needs, and therefore more viable than the shutdown policy.

It seems like work in this direction has only just begun: Conjecture published the MAGIC proposal last year, and it is the only write-up I've encountered that fleshes out this policy proposal.

Parting words

I believe that Schelling policies is the most viable class of policies to coordinate around.

It is likely that there are more such Schelling points in the AGI policy space. I expect that there is likely at least one Schelling policy that involves AGIs being given rights or treated as citizens, that would be at least as useful as the shutdown and the one-org Schelling policies described here.

Finding such a Schelling policy is left as an exercise for the reader.

John Carmack and Yann LeCunn come to mind, based on their tweets that I recall. ↩︎
Sam Altman is a good example:

“We’re planning a way to allow wide swaths of the world to elect representatives to a new governance board,” [Sam Altman] said. “Because if I weren’t in on this I’d be, like, Why do these fuckers get to decide what happens to me?”

↩︎
This seems like a feature of race dynamics in general, and probably of all instances of the class of self-fulfilling beliefs. ↩︎

[-]Seth Herd2y40

I think this is a highly useful framing. I'm not sure if Schelling points or memetic replication is a better frame, but they point to the same conclusions about viable options in public opinion (and probably politician opinion, although they might actually get more nuance with the aid of advisors as the situation becomes more urgent and clear).

The suggestion of Only One Org is something I've thought about but haven't refined. I hope it's viable.

We could think of it as a huge treasure with a curse or a bomb on it that might kill everyone.

The suggestions are:

Nobody gets near that dangerous treasure (Shut It Down)
Free for all. Everyone goes for the treasure and carries off what they can (E/Acc)
- (or "Advance, It's Always Worked Before" might be the simple Schelling point you're pointing at)
- Maybe the bomb isn't real, nobody can make an airtight argument that it is
- If anybody gets more, they'll distribute it by paying everyone else in town to do stuff for them.
- We hope.
The best prepared, best armed guy heads out now and tries to get past that bomb and out with the treasure before anyone else gets equipped enough to follow him. (US races to stay in the lead).
- He'll promise to split it up after he gets it, to keep people from following him,
  - but nobody knows how to hold him to it.
Just one guy, chosen by committee, tries to disarm the bomb and grab the treasure (Only One Org).
- That guy has agreed to split the treasure up fairly.
  - The treasure is so huge that any splitting will probably make everyone happy.
- So maybe nobody will grab it all from that one guy after he's defused the bomb and gotten it.
- We hope.

[-]Mateusz Bagiński2y10

The Schelling-point-ness of these memes seems to me to be secondary to (all inter-related):

memetic fit (within a certain demographic, conditional on the person/group already adopting certain beliefs/attitudes/norms etc)
being a self-correcting/stable attractor in the memespace
being easy to communicate and hold all the relevant parts in your mind at once

You discuss all of that but I read the post as saying something like "we need Schelling points, therefore we have to produce memetic attractors to serve as such Schelling points", whereas I think that typically first a memeplex emerges, and then people start coordinating around it without much reflection. (Well, arguably this is true of most Schelling points.)

Here's one more idea that I think I've seen mentioned somewhere and so far hasn't spread but might become a Schelling point

AI summer - AI has brought a lot of possibilities but the road further ahead is fraught with risks. Let's therefore pause fundamental research and focus on reaping the benefits of the state of AI that we already have.

LESSWRONG
LW