
All the smart people agitating for a 6-month moratorium on AGI research seem to have unaccountably lost their ability to do elementary game theory. It's a faulty idea regardless of what probability we assign to AI catastrophe.

Our planet is full of groups of power-seekers competing against each other. Each one of them could cooperate (join in the moratorium), defect (publicly refuse), or stealth-defect (proclaim that they're cooperating while stealthily defecting). The call for a moratorium amounts to saying to every one of those groups "you should choose to lose power relative to those who stealth-defect". It doesn't take much decision theory to predict that the result will be a covert arms race conducted in a climate of fear by the most secretive and paranoid among the power groups.
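
To make the dominance argument concrete, here's a toy payoff sketch. The numbers are my own illustration and nothing hangs on them except their ordering; open defection is left out of the toy because it buys the same capability payoff as stealth-defection plus extra reputational heat.

```python
# Toy payoff model of a voluntary, unenforced moratorium (illustrative only).
# Payoff = power gained relative to rivals, given whether at least one rival
# stealth-defects. The exact numbers are assumptions; only the ordering matters.
payoffs = {
    ("cooperate",      False):  0,   # everyone really pauses; nobody gains ground
    ("cooperate",      True):  -1,   # you pause while a rival quietly pulls ahead
    ("stealth_defect", False): +1,   # you quietly pull ahead of everyone else
    ("stealth_defect", True):   0,   # covert arms race; no relative gain
}

for rival_defects in (False, True):
    best = max(("cooperate", "stealth_defect"),
               key=lambda s: payoffs[(s, rival_defects)])
    print(f"some rival stealth-defects = {rival_defects}: best response = {best}")

# Stealth-defection is the best response either way (weak dominance), which is
# all an unenforced moratorium needs in order to unravel into covert racing.
```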

The actual effect of a moratorium, then, would not be to prevent super-AGI; indeed, it is doubtful that development would even slow down much, because many of the power-seeking groups can sustain large research budgets thanks to past success. If there's some kind of threshold beyond which AGI immediately becomes an X-risk, we'll get there anyway simply due to power competition. The only effect of any moratorium will be to ensure that (a) the public has no idea what's going on in the stealth-defectors' labs, and (b) control of the most potent AIs will most likely be achieved first by the most secretive and paranoid of power-seekers.

A related problem is that we don't have a college of disinterested angels to exert monopoly control of AI, or even just to trust to write its alignment rules. Pournelle's Law, "Any bureaucracy eventually comes to serve its own interests rather than those it was created to help with," applies; monopoly controllers of AI will be, or will become, power-seekers themselves. And there is no more perfect rationale for totalitarian control of speech and action than "we must prevent anyone from ever building an AI that could destroy the world!" The entirely predictable result is that even if the monopolists can evade AGI catastrophe (and it's not clear they could), the technology becomes a boot stomping on humanity's face forever.

Moratorium won't work. Monopoly won't either. Freedom and transparency might. In this context, "Freedom" means "Nobody gets to control the process of AI development," and "transparency" means "All code and training sets are open, and attempting to conceal your development process is interpreted as a crime - an act of aggression against humanity in the future". Ill-intentioned people will still try to get away with concealment, but the open-source community has proven many times that isolating development behind a secrecy wall means you tend to slow down and make more persistent mistakes than the competing public community does.

Freedom and transparency now would also mean we don't end up pre-emptively sacrificing every prospect of a non-miserable future in order to head off a catastrophe that might never occur.

(This is a slightly revised version of a comment I posted a few hours ago on Scott Aaronson's blog.)
 

26 comments

All code and training sets are open, and attempting to conceal your development process is interpreted as a crime - an act of aggression against humanity in the future.

This is also a form of regulation, motivating stealth defection. All regulation needs some kind of transparency, at least to the auditors. Compute footprint targeting would require sufficient disclosure of the process to verify the footprint.

I think the claim is that a ban would give an advantage to stealth defection because the stealth defector would work faster than people who can't work at all, while a regulation requiring open sharing of research would make stealth defection a disadvantage because the stealth defector has to work alone and in secret while everyone else collaborates openly.

I think it depends, since you could have a situation where a stealth defector knows something secret and can combine it with other people's public research, but it would also be hard for someone to get ahead in the first place while working alone/in secret.

Agreed, this would make it super easy to front-run you.

Other comments here have made the case that freedom and transparency, interpreted straightforwardly, probably just make AGI happen sooner and be less safe. Sadly, I agree. The imprint of open-source values in my mind wants to apply itself to this scenario, and finds it appealing to be in a world where that application would work. But I don't think that's the world we're currently living in.

In a better world, there would be a strategy that looks more like representative democracy: large GPU clusters are tightly controlled, not by corporations or by countries, but by a process that only allows training runs that are approved by a supermajority of representatives, representing the whole world and selected via the best speculative voting theory tech we can muster, to make them be genuine experts and good people rather than conventional politicians. I don't see any feasible path from here to there. If we somehow had a century to prepare, I think the goal would be to arrange society into a shape where that could happen.

-- jimrandomh

The standard argument against open sourcing intermediate AI developments is that it reduces the lead time available to solve the inevitable safety problems that will crop up the first time a research lab finds its way to AGI. If, on the other hand, you have an architecture that's a precursor to AGI and you open source it, you must now develop a working implementation and fix all of the safety problems before FAIR even gets started, which is probably impossible.

I would love to be the person who is on the side of "freedom" and "transparency", but the standard argument seems pretty straightforwardly correct, at least to me.

I personally prefer taking a gamble on freedom instead of the certainty of a totalitarian regime.

I personally prefer taking a gamble on freedom instead of the certainty of a totalitarian regime.

This seems wrong. Here's an incomplete list of reasons why:

  1. If the 3 leading labs join the moratorium and AGI is stealthily developed by the 4th, then the arrival of AGI will in fact have been slowed by the lead time of the first 3 labs + the slowdown that the 4th incurs by working in secret.
  2. The point of this particular call for a 6-month moratorium is not to particularly slow down anyone (and as has been pointed out by others, it is possible that OpenAI wasn't even planning to start training GPT-5 in the next few months). Rather, the point is to form a coalition to support future policies, e.g. a government-supported moratorium.
  3. It is actually fairly hard to build compute clusters in secret, because you can just track what comes out of the chip fabs and where it goes.
  4. While not straightforward, it's also feasible to monitor existing clusters; see e.g. https://arxiv.org/abs/2303.11341

I agree with Eliezer that this would most likely be "suicide". Open-sourcing code would mean that bad actors gain access to powerful AI systems immediately upon development (or sooner, if they decide to front-run). At least with the current system, corporations are able to test models before release, determine their capabilities, and prepare society for what's coming. It also provides the option to not release at all if we decide that the risks are too great.

I agree with Vladimir's point that whilst you say everyone supporting the moratorium has "unaccountably lost their ability to do elementary game theory", you seem to have not applied this lens yourself. I'd suggest asking yourself why you weren't able to see this. In my experience, this has often been the case when I have a strong pre-existing belief and this makes it hard to see any potential flaws with this perspective unless I really make myself look.

"Moratorium won't work.  Monopoly won't either. Freedom and transparency might." - the word "might" is doing a lot of work here. You've vaguely gestured in a particular direction, but not really filled in the details. I think if you attempted to do that, you'd see that it's hard to fill in the concrete details so that they work.

Lastly, this misses what I see as the crux of the issue, which is the offense-defense balance. I think advanced AI systems will heavily favour the attacker, given that you only need, for example, one security flaw to completely compromise your opponent's system. If this is the case, then everyone being at around the same capability level won't really help.

Of course the word "might" is doing a lot of work here! Because there is no guaranteed happy solution, the best we can do is steer away from futures we absolutely know we do not want to be in, like a grinding totalitarianism rationalized by "We're saving you from the looming threat of killer AIs!"

"At least with the current system, corporations are able to test models before release." The history of proprietary software does not inspire any confidence at all that this will be done adequately, or even at all; in a fight between time-to-market and software quality, getting there firstest almost always wins. It's not reasonable to expect this to change simply because some people have strong opinions about AI risk.

OpenAI seems to have held off on the deployment of GPT-4 for a number of months. They also brought on ARC Evals and a bunch of experts to help evaluate the risks of releasing the model.

I think accepting or rejecting the moratorium has nothing to do with game theory at all. It's purely a question of understanding.

Think of it this way. Imagine you're pushing a bus full of children, including your own child, toward a precipice. And you're paid for each step. Why on Earth would you say "oh no, I'll keep pushing, because otherwise other people will get money and power instead of me"? It's not like other people will profit by that money and power! If they keep pushing, their kids will die too, along with everyone else's! The only thing that keeps you pushing the bus is your lack of understanding, not any game theory considerations. Anyone with a clear understanding should just stop pushing the frigging bus.

Every time you move the bus 1cm further forward you get paid $10000. The precipice isn't actually visible, it's behind a bank of fog; you think it's probably real but don't know for sure. There are 20 other people helping you push the bus, and they also get paid. All appearances suggest that most of the other bus-pushers believe there is no precipice. One person is enough to keep the bus moving; even if 20 people stop pushing and only one continues, if the precipice is real the bus still falls, just a bit later. It's probably possible to pretend you've stopped pushing while in fact continuing to push, without everyone else knowing you're doing that.

I'm not sure whether this is exactly "game theory", but it's certainly a situation where you need to take into account the other people, and it's certainly not as simple as "you have to stop pushing the bus because otherwise the children will die" since plausibly those children are going to die anyway.
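
To put rough numbers on it, here's a minimal toy version of the scenario; the distance to the precipice is my own assumption, while the pay and headcount come from the setup above.

```python
# Toy version of the bus scenario. Assumption: the bus advances 1 cm per step
# per active pusher, so fewer pushers means the fall comes later, not never.
PAY_PER_CM = 10_000      # from the setup above
TOTAL_PUSHERS = 21       # you plus the 20 others
DISTANCE_CM = 1_000      # assumed distance to the (possibly real) precipice

def steps_until_fall(active_pushers: int) -> float:
    """Steps until the bus reaches the precipice; infinite only if nobody pushes."""
    if active_pushers == 0:
        return float("inf")
    return DISTANCE_CM / active_pushers

print(steps_until_fall(TOTAL_PUSHERS))      # all 21 push:       ~48 steps to the fall
print(steps_until_fall(TOTAL_PUSHERS - 1))  # you alone stop:     50 steps, and you forgo the pay
print(steps_until_fall(1))                  # one holdout left:   1000 steps; later, but it still falls
print(steps_until_fall(0))                  # only a universal stop prevents the fall
print(PAY_PER_CM * DISTANCE_CM)             # $10,000,000 of incentive per pusher to keep going
```

Unilateral restraint barely moves the date of the fall; its main value is as a signal, or as a step toward getting everyone else to stop too.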

All appearances suggest that most of the other bus-pushers believe there is no precipice.

Those of them who believe it based on vibes may switch beliefs when a moratorium goes into effect.

Yes, I think one of the biggest benefits of stopping pushing the bus is in fact the signalling effect.

It's a simplification, certainly. But the metaphor kinda holds up - if you know the precipice is real, the right thing is still to stop and try to explain to others that the precipice is real, maybe using your stopping as a costly signal. Right now the big players can send such a signal: top researchers say they've paused working, no new products are released publicly, and so on. And maybe if enough players get on board with this, they can drag the rest along by social pressure, divestment or legislation. The important thing is to start; I just made a post about this.

[comment deleted]

(I haven't yet read the things this is responding to.)

Doing AI research the way it's currently done, but in secret, seems difficult, especially if it's "secret because we said we'd stop but we lied" rather than "secret because we're temporarily in stealth mode".

You need to employ people who a) can do the job and b) you trust not to leak that you're employing them to do a thing you've publicly said you wouldn't do. You can't easily ask if you can trust them because if the answer is no they already know too much.

Notably, I don't think this is a case of "you can just have a small handful of people doing this while no one else at the company has any idea". Though admittedly I don't know the size of current AI research teams.

The thing that comes to mind as a plausible counterexample is the VW emissions scandal; I don't know (and can't immediately see on Wikipedia) how many people were involved in that.

Our planet is full of groups of power-seekers competing against each other. Each one of them could cooperate (join in the moratorium), defect (publicly refuse), or stealth-defect (proclaim that they're cooperating while stealthily defecting). The call for a moratorium amounts to saying to every one of those groups "you should choose to lose power relative to those who stealth-defect". It doesn't take much decision theory to predict that the result will be a covert arms race conducted in a climate of fear by the most secretive and paranoid among the power groups.

 

There seems to be an underlying assumption that the number of stealth-defecting AI labs doing GPT-4-level training runs is non-zero. This is a non-trivial claim and I'm not sure I agree. My impression is that there are few AI labs worldwide that are capable of training such models in the next 6-12 months, and we more or less know which they are.

I also disagree with the framing of stealth-defection as a relatively trivial operation that is better than cooperation, mostly because training such models takes a lot of people (just look at pages 15-17 in the GPT-4 paper!) and thus the probability of someone whistleblowing is large.

And for what it's worth, I would really have hoped that such things are discussed in a post that starts with a phrase of the form "All the smart people [...] seem to have unaccountably lost their ability to do elementary game theory".

This seems like the only realistic aspiration we can pursue. It would require pressure from the players that control centralized compute hardware, so that any large-scale training run requires that level of code and data transparency. Hardware companies could also flag large acquisitions. Ultimately, sovereign nations will have to push hardest, and that, alongside global cooperation, seems insurmountable. The real problem is that there is no virus causing enough harm to force action, only the emergence of intelligent phenomena that none of us, not even their creators, understand. So beyond sparking hollow debate, what can we tangibly do? Where do the dangers actually lie?