GPT-5 training is probably starting around now. It seems very unlikely that GPT-5 will cause the end of the world. But it’s hard to be sure. I would guess that GPT-5 is more likely to kill me than an asteroid, a supervolcano, a plane crash or a brain tumor. We can predict fairly well what the cross-entropy loss will be, but pretty much nothing else.

Maybe we will suddenly discover that the difference between GPT-4 and superhuman level is actually quite small. Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.

Hopefully model evaluations can catch catastrophic risks before wide deployment, but again, it’s hard to be sure. GPT-5 could plausibly be devious enough to circumvent all of our black-box testing. Or it may be that it’s too late as soon as the model has been trained. These are small but real possibilities, and it is a significant milestone of failure that we are now taking these kinds of gambles.

How do we do better for GPT-6?

Governance efforts are mostly focussed on relatively modest goals. Few people are directly aiming at the question: how do we stop GPT-6 from being created at all? It’s difficult to imagine a world where governments actually prevent Microsoft from building a $100 billion AI training data center by 2028.

In fact, OpenAI apparently fears governance so little that they just went and told the UK government that they won’t give it access to GPT-5 for pre-deployment testing [Edit - 17 May 2024: I now think this is probably false]. And the number of safety-focussed researchers employed by OpenAI is dropping rapidly.

Hopefully there will be more robust technical solutions for alignment available by the time GPT-6 training begins. But few alignment researchers actually expect this, so we need a backup plan.

Plan B: Mass protests against AI

In many ways AI is an easy thing to protest against. Climate protesters are asking to completely reform the energy system, even if it decimates the economy. Israel / Palestine protesters are trying to sway foreign policies on an issue where everyone already holds deeply entrenched views. Social justice protesters want to change people’s attitudes and upend the social system.

AI protesters are just asking to ban a technology that doesn’t exist yet. About 0% of the population deeply cares that future AI systems are built. Most people support pausing AI development. It doesn’t feel like we’re asking normal people to sacrifice anything. They may in fact be paying a large opportunity cost on the potential benefits of AI, but that’s not something many people will get worked up about. Policy-makers, CEOs and other key decision makers that governance solutions have to persuade are some of the only groups that are highly motivated to let AI development continue.

No innovation required

Protests are the most unoriginal way to prevent an AI catastrophe - we don’t have to do anything new. Previous successful protesters have made detailed instructions for how to build a protest movement.

This is the biggest advantage of protests compared to other solutions - they require no new ideas (unlike technical alignment) and no one's permission (unlike governance solutions). A sufficiently large number of people taking to the streets forces politicians to act. A sufficiently large and well-organized special interest group can control an issue:

I walked into my office while this was going on and found a sugar lobbyist hanging around, trying to stay close to the action. I felt like being a smart-ass so I made some wise-crack about the sugar industry raping the taxpayers. Without another word, I walked into my private office and shut the door. I had no real plan to go after the sugar people. I was just screwing with the guy.

My phone did not stop ringing for the next five weeks….I had no idea how many people in my district were connected to the sugar industry. People were calling all day, telling me they made pumps or plugs or boxes or some other such part used in sugar production and I was threatening their job. Mayors called to tell me about employers their towns depended on who would be hurt by a sugar downturn. It was the most organized effort I had ever seen.

And that’s why you don’t fuck with sugar.

The discomfort of doing something weird

If we are correct about the risk of AI, history will look kindly upon us (assuming we survive). Already people basically know about AI x-risk and understand that it is not a ridiculous conspiracy theory. But for now protesting about AI is kind of odd. This doesn’t have to be a bad thing - PauseAI protests are a great way to meet interesting, unusual people. Talking about PauseAI is a conversation starter because it’s such a surprising thing to do.

When AI starts to have a large impact on the economy, it will naturally move up the priority list of the general population. But people react too late to exponentials. If AI continues to improve at the current rate, the popular reaction may come too late to avoid the danger. PauseAI’s aim is to bring that reaction forward.

Some AI researchers think that they should not go to protests because it is not their comparative advantage. But this is wrong: the key skill required is the ability to do something weird - to take ideas seriously and to actually try to fix important problems. The protests are currently so small that the marginal impact of an extra person showing up for a couple of hours once every few months is very large.

Preparing for the moment

I think a lot about this post from just after ChatGPT came out, asking why the alignment community wasn’t more prepared to seize the moment when everyone suddenly noticed that AI was getting good. I think this is a good question and one of the reasons is that most alignment researchers did not see it coming.

There will be another moment like that, when people realize that AI is coming for their job imminently and that AI is an important issue affecting their lives. We need to be prepared for that opportunity and the small movement that PauseAI builds now will be the foundation which bootstraps this larger movement in the future.

To judge the value of AI protests by the current, small protests would be to judge the impact of AI by the current language models (a mistake which most of the world appears to be making). We need to build the mass movement. We need to become the Sugar Lobby.

PauseAI’s next protest is on Monday 13 May, in 8 cities around the world.

Comments (16)

Putting my EA Forum comment here:

I'd like to make clear to anyone reading that you can support the PauseAI movement right now, only because you think it is useful right now. And then in the future, when conditions change, you can choose to stop supporting the PauseAI movement. 

AI is changing extremely fast (e.g. technical work was probably our best bet a year ago, I'm less sure now). Supporting a particular tactic/intervention does not commit you to an ideology or team forever!

While I want people to support PauseAI,

the small movement that PauseAI builds now will be the foundation which bootstraps this larger movement in the future

is one of the main points of my post. If you support PauseAI today, you may unleash a force which you cannot control tomorrow.

I think it is unrealistic to ask people to internalise that level of ambiguity. This is how EAs turn themselves into mental pretzels.

Rumours are that GPT-5 has been finished for a while.

My birds are singing the same tune.

Hi Tomás! is there a prediction market for this that you know of?

[This comment is no longer endorsed by its author]

Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.

I am by no means an expert on machine learning, but this sentence reads weird to me. 

I mean, it seems possible that a part of a NN develops some self-reinforcing feature which uses the gradient descent (or whatever is used in training) to go into a particular direction and take over the NN, like a human adrift on a raft in the ocean might decide to build a sail to make the raft go into a particular direction. 

Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?

Personally, I think that if GPT-5 is the point of no return, it will more likely be because it is smart enough to actually help advance AI after it is trained. While improving semiconductors seems hard and would require a lot of work in the real world done with human cooperation, finding better NN architectures and training algorithms seems like something well in the realm of the possible, if not exactly plausible.

So if I had to guess how GPT-5 might doom humanity, I would say that in a few million instance-hours it figures out how to train LLMs of its own power for 1/100th of the cost, and this information becomes public. 

The budgets of institutions which might train NNs probably follow some power law, so if training cutting-edge LLMs becomes a hundred times cheaper, the number of institutions which could build cutting-edge LLMs becomes many orders of magnitude higher -- unless the big players go full steam ahead towards a paperclip maximizer, of course. This likely means that voluntary coordination (if that was ever on the table) becomes impossible. And setting up a worldwide authoritarian system to impose limits would also be both distasteful and difficult.

Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?

I was thinking of a scenario where OpenAI deliberately gives it access to its own weights to see if it can self improve.

I agree that it would be more likely to just speed up normal ML research.

I think it might be interesting to find better slogans.

As you mentioned, the average person doesn't care about AGI. Those that do believe the world might end this century usually think it's too late already.

AI is a threat to a lot of closer term status quos (statuses quo?).

Do you want an AI to read your resume? Do you want an AI to answer your 911 calls? Do you want an AI to drive your taxi off a bridge? Do you want your lawyer to parrot ChatGPT? Do you want AI art in museums?

No to self-crashing cars!

Real humans, real jobs!

AI kills the planet!

ChatGPT = 500 million cars!

It just writes itself. "Just don't build AGI" has to be explained for like 10 minutes, and it can't be sung.

GPT-5 training is probably starting around now

Sam Altman confirmed (paywalled, sorry) in November that GPT-5 was already under development. (Interestingly, the confirmation was almost exactly six months after Altman told a senate hearing (under oath) that "We are not currently training what will be GPT-5; we don't have plans to do it in the next 6 months.")

"Under development" and "currently training" I interpret as having significantly different meanings.

It probably began training in January and finished around early April. And they're now doing evals.

Thank you for working on this Joseph!

I absolutely sympathize, and I agree that with the world view / information you have that advocating for a pause makes sense. I would get behind 'regulate AI' or 'regulate AGI', certainly. I think though that pausing is an incorrect strategy which would do more harm than good, so despite being aligned with you in being concerned about AGI dangers, I don't endorse that strategy.

Some part of me thinks this oughtn't matter, since there's approximately 0% chance of the movement achieving that literal goal. The point is to build an anti-AGI movement, and to get people thinking about what it would be like for the government to be able to issue an order to pause AGI R&D, or turn off datacenters, or whatever. I think that's a good aim, and your protests probably (slightly) help that aim.

I'm still hung up on the literal 'Pause AI' concept being a problem though. Here's where I'm coming from: 

1. I've been analyzing the risks of current day AI. I believe (but will not offer evidence for here) current day AI is already capable of providing small-but-meaningful uplift to bad actors intending to use it for harm (e.g. weapon development). I think that having stronger AI in the hands of government agencies designed to protect humanity from these harms is one of our best chances at preventing such harms. 

2. I see the 'Pause AI' movement as being targeted mostly at large companies, since I don't see any plausible way for a government or a protest movement to enforce what private individuals do with their home computers. Perhaps you think this is fine because you think that most of the future dangers posed by AI derive from actions taken by large companies or organizations with large amounts of compute. This is emphatically not my view. I think that actually more danger comes from the many independent researchers and hobbyists who are exploring the problem space. I believe there are huge algorithmic power gains which can, and eventually will, be found. I furthermore believe that beyond a certain threshold, AI will be powerful enough to rapidly self-improve far beyond human capability. In other words, I think every AI researcher in the world with a computer is like a child playing with matches in a drought-stricken forest. Any little flame, no matter how small, could set it all ablaze and kill everyone. Are the big labs playing with bonfires dangerous? Certainly. But they are also visible, and can be regulated and made to be reasonably safe by the government. And the results of their work are the only feasible protection we have against the possibility of FOOM-ing rogue AGI launched by small independent researchers. Thus, pausing the big labs would, in my view, place us in greater danger rather than less danger. I think we are already well within the window of risk from independent-researcher-project-initiated-FOOM. Thus, the faster we get the big labs to develop and deploy worldwide AI-watchdogs, the sooner we will be out of danger.

I know these views are not the majority views held by any group (that I know of). These are my personal inside views from extensive research. If you are curious about why I hold these views, or more details about what I believe, feel free to ask. I'll answer if I can.

I think I have two disagreements with your assessment. 

First, the probability of a random independent AI researcher or hobbyist discovering a neat hack to make AI training cheaper and taking over. GPT-4 took $100M to train and is not enough to go FOOM. To train the same thing within the budget of the median hobbyist would require algorithmic advantages of three or four orders of magnitude.

Historically, significant progress has been made by hobbyists and early pioneers, but mostly in areas which were not under intense scrutiny by established academia. Often, the main achievement of a pioneer is discovering a new field; their picking all the low-hanging fruit is more of a bonus. If you had paid a thousand mathematicians to think about signal transmission on a telegraph wire or semaphore tower, they probably would have discovered Shannon entropy. Shannon's genius was to some degree looking into things nobody else was looking into, which later blew up into a big field.

It is common knowledge that machine learning is a booming field. Experts from every field of mathematics have probably thought about whether there is a way to apply their insights to ML. While there are certainly still discoveries to be made, all the low-hanging fruit has been picked. If a hobbyist manages to build the first ASI, that would likely be because they discover a completely new paradigm -- perhaps beyond NNs. The risk that a hobbyist discovers a concept which lets them use their gaming GPU to train an AGI does not seem that much higher than in 2018 -- either would be completely out of left field.

My second disagreement is about the probability of an ASI being roughly aligned with human values, or to be more precise, the difference in that probability conditional on who discovers it. The median independent AI enthusiast is not a total asshole [citation needed], so if alignment is easy and they discover ASI, chances are that they will be satisfied with becoming the eternal god emperor of our light cone and not bother to tell their ASI to turn any huge number of humans into fine red mist. This outcome would not be so different from Facebook developing an aligned ASI first. If alignment is hard -- which we have some reason to believe it is -- then the hobbyist who builds ASI by accident will doom the world, but I am also rather cynical about big tech's odds being much better.

Going full steam ahead is useful if (a) the odds of a hobbyist building ASI if big tech stops capability research are significant and (b) alignment is very likely for big tech and unlikely for the hobbyist. I do not think either one is true. 
