With the recent proposals about moratoriums and regulation, should we also start thinking about a strike by AI researchers and developers?

The reasoning I imagine is as follows. AI capability is now growing really fast, toward levels that will strongly affect the world, and AI safety lags behind. (A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman; that's market-leader performance for you.) And finally, I want to make the argument that working on AI capability while it is ahead of AI safety is "pushing the bus".

Here's the metaphor: a bunch of people, including you, are pushing a bus full of children toward a precipice, and you're paid for each step. In this situation, would you really say "oh, I have to keep pushing, otherwise others will get all the money"? It's not like they'll profit from it! Their children will die, along with everyone else's! So there's no game-theoretic angle; you can just make the decision alone to stop pushing the frigging bus.

To clarify, working on AI isn't always bad. It could lead to a wonderful future for humanity. But when AI safety is behind, as it is now, working on AI capability is pushing the bus. There's no good justification for it.

Hence the strike. Not by leadership, but by AI researchers and developers themselves. I imagine a desk plaque saying "while AI safety lags behind AI capability, I refuse to work on AI capability". That's the start condition, and it also tells you when to stop: when safety catches up with current capability, which means not just that the AI stops saying bad things, but, for stronger AIs, safe and benevolent behavior more generally. And it's also the restart condition if safety starts lagging behind again.

Comments
gjm

If you're striking for better working conditions and more pay, your employer can get you back to work by improving your conditions and raising your pay. If you're striking because you're unwilling to work on AI capability stuff until AI safety work catches up -- which will surely take years even if no one works on AI capabilities at all -- then your employer can't get you back to work because AI capability work is your job.

So what you're proposing really isn't a strike. It's "AI capability workers should demand to be moved to other work, or quit their jobs".

Yeah, it's not the kind of strike whose purpose is to get concessions from employers. Though I guess the thing in Atlas Shrugged was also called a "strike" and it seems similar in spirit to this.

Surely any capabilities researcher concerned enough to be willing to do this should just switch to safety-relevant research? (Also, IMO the best AI researchers tend not to be in this for the money)

So there's no game theoretic angle, you can just make the decision alone, to stop pushing the frigging bus.

I don’t think this holds if you allow for p(doom) < 1. For a typical AI researcher with p(doom) ~ 0.1 and easy replacement, striking is plausibly an altruistic act and should be applauded as such.

Hm, pushing a bus full of kids toward a 10% chance of going over the precipice is also pretty harsh. Though I agree we should applaud those who decline to do it.

Agreed. I intended to distinguish between the weak claim "you should stop pushing the bus" and the stronger "there's no game-theoretic angle which encourages you to keep pushing".

I quite like the idea of a strike, and would like to hear about the feasibility of such a thing at large firms like OpenAI and DeepMind. If successful, I'd guess those at Anthropic would also need to pause development for the duration, even if everyone there believes they're doing things safely.

I like it. It seems like only the researchers themselves respect the dangers, not the CEOs or the government, so it will have to be them who say that enough is enough.

In a perfect world they'd jump ship to alignment, but realistically we've all got to eat, so what would also be great is a generous billionaire willing to hire them for more alignment research.

I think we need to move public opinion first, which hopefully is slowly starting to happen. We need one of two things to happen:

  1. A breakthrough in AI alignment research
  2. Major shifts in policy

A strike does not currently help either of those.

Edit: Actually, I do agree that if you could get ALL AI researchers to strike, a general strike, that would serve the purpose of delay, and I would be in favor. I do not think that is realistic. A lesser strike might also serve to drum up attention; I was initially afraid that it might drum up negative attention.

[This comment is no longer endorsed by its author]

I think if it happens, it'll help shift policy because it'll be a strong argument in policy discussions. "Look, many researchers aren't just making worried noises about safety but taking this major action."

It increases the amount of time we have to make those breakthroughs

A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman

It gave you exactly what you asked it for. If you don't want it to do that, don't ask for it.

NB. I'm speaking of ChatGPT and its current ilk, not superpowerful genies that are dangerous to ask for anything.

It's true that this is not evidence of misalignment with the user, but it is evidence of misalignment with ChatGPT's creators.

My impression is that LessWrong often uses "alignment with X" to mean "does what X says". But the ability to conditionally delegate seems to be a key part of alignment in this: an AI is aligned with me, and I tell it "do what Y says, subject to such-and-such constraints and maintaining such-and-such goals". So the failure of ChatGPT to be safe in OpenAI's sense is a failure of delegation.

Overall, the tendency of ChatGPT to ignore previous input is at the center of its limits and problems.
