With the recent proposals about moratoriums and regulation, should we also start thinking about a strike by AI researchers and developers?

The reasoning I imagine is as follows. AI capability is now growing really fast, toward levels that will strongly affect the world, and AI safety lags behind. (A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman; that's market-leader performance for you.) And finally, I want to make the argument that working on AI capability while it is ahead of AI safety is "pushing the bus".

Here's the metaphor: a bunch of people, including you, are pushing a bus full of children toward a precipice, and you're paid for each step. In this situation, would you really say "oh, I have to keep pushing, otherwise others will get all the money"? It's not like they'll profit from it! Their children will die, along with everyone else's! So there's no game-theoretic angle; you can just make the decision alone to stop pushing the frigging bus.

To clarify, working on AI isn't always bad. It could lead to a wonderful future for humanity. But when AI safety is behind, as it is now, working on AI capability is pushing the bus. There's no good justification for it.

Hence the strike. Not by leadership, but by AI researchers and developers themselves. I imagine a desk plaque saying "while AI safety lags behind AI capability, I refuse to work on AI capability". That's the start condition, and it also tells you when to stop: when safety catches up with current capability, which means not just that the AI stops saying bad things, but, for stronger AIs, safe and benevolent behavior more generally. And it's also the restart condition if safety starts lagging behind again.

Comments
gjm

If you're striking for better working conditions and more pay, your employer can get you back to work by improving your conditions and raising your pay. If you're striking because you're unwilling to work on AI capability stuff until AI safety work catches up -- which will surely take years even if no one works on AI capabilities at all -- then your employer can't get you back to work because AI capability work is your job.

So what you're proposing really isn't a strike. It's "AI capability workers should demand to be moved to other work, or quit their jobs".

Yeah, it's not the kind of strike whose purpose is to get concessions from employers. Though I guess the thing in Atlas Shrugged was also called a "strike" and it seems similar in spirit to this.

Surely any capabilities researcher concerned enough to be willing to do this should just switch to safety-relevant research? (Also, IMO the best AI researchers tend not to be in this for the money)

So there's no game theoretic angle, you can just make the decision alone, to stop pushing the frigging bus.

I don’t think this holds if you allow for p(doom) < 1. For a typical AI researcher with p(doom) ~ 0.1 and easy replacement, striking is plausibly an altruistic act and should be applauded as such.

Hm, pushing a bus full of kids toward a 10% chance of going over the precipice is also pretty harsh. Though I agree we should applaud those who decline to do it.

Agreed. I intended to distinguish between the weak claim "you should stop pushing the bus" and the stronger "there's no game-theoretic angle which encourages you to keep pushing".

I quite like the idea of a strike, and would like to hear about the feasibility of such a thing at large firms like OpenAI and DeepMind. If successful, I'd guess those at Anthropic would also need to pause development for the duration, even if everyone there believes they're doing things safely.

I like it. It seems like only the researchers themselves respect the dangers, not the CEOs or the government, so it will have to be them who say that enough is enough.

In a perfect world they'd jump ship to alignment, but realistically we've all got to eat, so what would also be great is a generous billionaire willing to hire them for more alignment research.

I think we need to move public opinion first, which hopefully is slowly starting to happen. We need one of two things to happen:

  1. A breakthrough in AI alignment research
  2. Major shifts in policy

A strike does not currently help either of those.

Edit: Actually, I do agree that if you could get ALL AI researchers to strike, a general strike, that would serve the purpose of delay, and I would be in favor. I do not think that is realistic. A lesser strike might also serve to drum up attention; I was initially afraid that it might drum up negative attention.

[This comment is no longer endorsed by its author]

I think if it happens, it'll help shift policy because it'll be a strong argument in policy discussions. "Look, many researchers aren't just making worried noises about safety but taking this major action."

It increases the amount of time we have to make those breakthroughs

A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman

It gave you exactly what you asked it for. If you don't want it to do that, don't ask for it.

NB. I'm speaking of ChatGPT and its current ilk, not superpowerful genies that are dangerous to ask for anything.

It's true that this is not evidence of misalignment with the user, but it is evidence of misalignment with ChatGPT's creators.

My impression is that LessWrong often uses "alignment with X" to mean "does what X says". But the ability to conditionally delegate seems to be a key part of alignment in this: an AI is aligned with me, and I tell it "do what Y says, subject to such-and-such constraints and maintaining such-and-such goals". So the failure of ChatGPT to be safe in OpenAI's sense is a failure of delegation.

Overall, the tendency of ChatGPT to ignore previous input is at the center of its limits and problems.
