I've been repeatedly loud and explicit about this but am happy to state again that racing to build superintelligence before we know how to make it not kill everyone (or cause other catastrophic outcomes) seems really bad and I wish we could coordinate to not do that.
I disagree with a number of statements made in the post and do not support an AI development ban or pause. But I support Leo speaking his mind about this and I think it’s important for OpenAI and other labs to have a culture where employees feel free to speak about such issues.
I wonder if there's a palatable middle ground where instead of banning all AI research, we might get people to agree to ban in advance only dangerous types of ASI.
My current personal beliefs:
- ASI existential risk is very much worth worrying about
- Dangerous ASI is likely the #1 threat to humanity
- In the next few decades, the odds of ASI killing/disempowering us are tiny
- I feel good accelerating capabilities at OpenAI to build technology that helps more people
- I would not support a ban or pause on AI/AGI (because it deprives people of AI benefits, breaks promises, and also accumulates a compute overhang for whenever the ban is later lifted)
- I would happily support a preemptive ban on dangerous ASI
In the next few decades, the odds of ASI killing/disempowering us are tiny
I found this point surprising. Is this because of long timelines to ASI?
Regardless, while it seems very hard to implement well, I'm happy to publicly say that I am in favour of a well-implemented preemptive ban on dangerous ASI
Yes, mostly.
I expect existentially dangerous ASI to take longer than ASI, which will take longer than AGI, which will take longer than powerful AI. Killing everyone on Earth is very hard to do, few are motivated to do it, and many will be motivated to prevent it as ASI’s properties become apparent. So I think the odds are low. And I’ll emphasize that these are my odds including humanity’s responses, not odds of a counterfactual world where we sleepwalk into oblivion without any response.
Is this a question for me? I am assuming "why not" refers to why I do not support a pause or a ban, and not why I support that OpenAI employees should feel free to speak up in support of such policies if that is what they believe.
This is a bit too complex to go into in a comment. I hope at some point to write a longer text (specifically, I plan to do a book review of "If Anyone Builds It, Everyone Dies", maybe together with "The AI Con" and "AI Snake Oil") and to go into more detail there on why I don't think the proposed policies are good. Just a matter of getting the time...
Have you reached out to people at DeepSeek or Alibaba to make the case for AI extinction risk or propose an alliance between the labs?
I already talk about my views pretty freely in public, but I do think people at labs tend to be a bit cagier and less blunt about these things than is ideal, so in the interests of normalizing such behavior: I think Earth's approach to AI development is incredibly reckless, a competent and sane civilization would not permit the trajectory we're on, and a global moratorium on all aspects of AI capability progress for the next few decades would be a substantial improvement over the status quo (though it's probably not my optimal policy given the background rates of non-AI x-risk and value lock-in, and I'd likely favor a controlled 5-10x slowdown with pause optionality).
I think my work on capabilities at Anthropic is probably good on net because I think it's valuable for them in particular to have greater influence, but I consider the acceleratory effects of such work to be a cost. If I thought large fractions of people at other labs felt similarly and I wasn't playing against defect-bot, I would probably quit for decision theory reasons, but in practice I don't think meaningful coalitions elsewhere are running anything like my decision process.
FWIW I pretty regularly express such views internally and do not find doing so to be particularly socially costly, though I wouldn't expect this to be true of the social incentives at other labs.
a global moratorium on all aspects of AI capability progress for the next few decades would be a substantial improvement over the status quo
Saw some shrug reacts on this so wanted to elaborate a bit - I'm not super confident about this (maybe like 70% now rising to 80% the later we implement the pause), and become a lot more pessimistic about it if the moratorium does not cover things like hardware improvements, research into better algorithms, etc. I'm also sort of pricing in that there's sufficient political will to make this happen; the backlash from a decree like this if in fact most ordinary voters really hated it seems likely to be bad in various ways. As such I don't really try and do advocacy for such changes in 2025, though I'm very into preparing for such a push later on if we get warning shots or much more public will to put on the brakes. Happy to hear more on people's cruxes for those who think this is of unclear or negative sign.
If I understand you correctly, I feel like this is all fundamentally premised on the best plan being to push for an AGI/ASI ban, and further that a major blocker is the lack of common knowledge that (a subset of) the AGI safety people at frontier labs think the whole thing should stop.
I disagree with this premise - even if an ASI/AGI ban seems nice in theory (eg, I agree with Leo's much softer and hypothetical statement below), I don't think it's remotely politically tractable, and it's unlikely to be implemented well even with some key stakeholders on board but only realistic levels of competence. If I were to publicly advocate for specific policy outcomes, I think there would be more impactful ones, by virtue of being more likely to happen. I also do not think alignment is so fundamentally hard that we should consider all plans within the current paradigm to be doomed (I'm not sure if you believe this, but it's a common justification for the ban-focused position).
As such, I disagree with the various actions you recommend lab employees to take, and do not intend to take them myself. Even if I did agree with the statements, I think that saying this kind of thing publicly is likely to either not matter (because it's not noticed) or be costly to my ability to have an impact inside the lab (because it is noticed, including by people in the lab, who are annoyed at it potentially causing issues for the lab). I also think that it is doable to have an impact within a lab, for reasons I recently described on the 80K podcast, though I'm sure it depends heavily on the person, role, lab, theory of impact, etc - labs are a lot less homogeneous than this post implies!
If you want to take this lack of costly signalling as reason to be suspicious of me, be my guest. But I think it would be net negative for the world for me to take those actions, according to my understanding of things. And before concluding corruption, remember that disagreement is also a plausible hypothesis.
Zooming out, I kinda feel like people are likely to either think that, all things considered, an AGI/ASI ban is the most important thing to push for, in which case I don't see great reasons for working in a lab; or think that more incremental approaches are more promising, in which case I think that taking actions costly to their influence in the lab may be a pretty bad idea (depending heavily on their situation).
It seems to me that most people who pay attention to AI (and especially policymakers) are confused about whether the race to superintelligence is real, and whether the dangers are real. I think "people at the labs never say the world would be better without the race (eg because they think the world won't actually stop)" is one factor contributing to that confusion. I think the argument "I can have more of an impact by hiding my real views so that I can have more influence inside the labs that are gambling with everyone's lives; can people outside the labs speak up instead?" is not necessarily wrong, but it seems really sketchy to me. I think it contributes to a self-fulfilling prophecy where the world never responds appropriately because the places where world leaders looked for signals never managed to signal the danger.
From my perspective, it's not about "costly signaling", it's about sending the signal at all. I suspect you're underestimating how much the world would want to change course if it understood the situation, and underestimating how much you could participate in shifting to an equilibrium where the labs are reliably sending a saner signal (and underestimating how much credibility this would build in worlds that eventually cotton on).
And even if the tradeoffs come out that way for you, I'm very skeptical that they come out that way for everyone. I think a world where everyone at the labs pretends (to policymakers) that what they're doing is business-as-usual and fine is a pretty messed-up world.
I'm pretty skeptical of this, can you say more? Specifically, I don't feel like a bunch more safety employees at labs saying this kind of thing would really make a difference, in a world where:
My sense of the situation is that a bunch of people are saying this is madness, a bunch of people are saying it isn't, and that's not going to change; at best a small handful of people get added to the madness side. Even if 50% of all lab safety employees started saying stronger things than the company does, re "you should shut it all down" etc, many of them genuinely disagree and would not say that, meaning even the specific "lab safety employee constituency" isn't going to get beyond being a mess of disagreement.
I think there's a huge difference between labs saying "there's lots of risk" and labs saying "no seriously, please shut everyone down including me, I'm only doing this because others are allowed to and would rather we all stopped". The latter is consistent with the view; its absence is conspicuous. Here is an example of someone noticing in the wild; I have also heard that sort of response from multiple elected officials. If Dario could say it that'd be better, but lots of researchers in the labs saying it would be a start. And might even make it more possible for lab leaders to come out and say it themselves!
I agree there's a big difference, my skepticism is that a handful of lab safety researchers saying this would matter, when people like Hinton say it, and lab CEOs do not (Like, I would be pretty shocked if you could get this above 50 lab employees, out of thousands total). I would be curious to hear more about the chats with elected officials, if they've led you to think differently?
Quick take: I agree it might be hard to get above 50 today. I think that even 12 respected people inside one lab today would have an effect on the Overton window inside labs, which I think would have an effect over time (aided primarily by the fact that the arguments are fairly clearly on the side of a global stop being better; it's harder to keep true things out of the Overton window). I expect it's easier to shift culture inside labs first, rather than inside policy shops, bc labs at least don't have the dismissals of "they clearly don't actually believe that" and "if they did believe it they'd act differently" ready to go. There are ofc many other factors that make it hard for a lab culture to fully adopt the "nobody should be doing this, not even us" stance, but it seems plausible that that could at least be brought into the Overton window of the labs, and that that'd be a big improvement (towards, eg, lab heads becoming able to say it).
Ah, if your main objective is to shift internal lab culture I'm pretty on board with this aim, but would recommend different methods. To me, speaking prominently and publicly could eg pose significant PR risk to a lab and get resistance, while speaking loudly in internal channels is unlikely to and may be more effective. For example, I'd be more optimistic about writing some kind of internal memo making the case and trying to share it widely/create buzz, sharing the most legit examples of current AI being scary in popular internal channels, etc. I still expect this to be extremely hard, to be risky for the cause if done badly, and to become easier the scarier AI gets, so it doesn't feel like one of my top priorities right now, but I'm much more sympathetic to the ask, and do think this is something internal lab safety teams should be actively thinking about - I definitely agree with "arguing for true things is easier", though I do not personally think "the pragmatically best solution is a global ban" is objectively true (I appreciate you writing a book trying to make this case though!)
Oh yeah, I agree that (earnest and courageous) attempts to shift the internal culture are probably even better than saying your views publicly (if you're a low-profile researcher).
I still think there's an additional boost from consistently reminding people of your "this is crazy and earth should do something else" views whenever you are (e.g.) on a podcast or otherwise talking about your alignment hopes. Otherwise I think you give off a false impression that the scientists have things under control and think that the race is okay. (I think most listeners to most alignment podcasts or w/e hear lots of cheerful optimism and none of the horror that is rightly associated with >5% destruction of the whole human endeavor, and that this contributes to the culture being stuck in a bad state across many orgs.)
FWIW, it's not a crux for me whether a stop is especially feasible or the best hope to be pursuing. On my model, the world is much more likely to respond in marginally saner ways the more that decision-makers understand the problem. Saying "I think a stop would be better than what we're currently doing and beg the world to shut down everyone including us" if you believe it helps communicate your beliefs (and thus the truth, insofar as you're good at believing) even if the exact policy proposal doesn't happen. I think the equilibrium where lots and lots of people understand the gravity of the situation is probably better than the current equilibrium in lots of hard-to-articulate and hard-to-predict ways, even if the better equilibrium would not be able to pull off a full stop.
(For an intuition pump: perhaps such a world could pull off "every nation sabotages every other nation's ASI projects for fear of their own lives", as an illustration of how more understanding could help even w/out a treaty.)
Yeah, I agree that media stuff (podcasts, newspapers etc) is more of an actual issue (though it only involves a small fraction of lab safety people)
I'm sure this varies a lot between contexts, but I'd guess that at large companies, employees being allowed to do podcasts or talk to journalists on the record is contingent (among other things) on them being trusted to be careful to not say things that could lead to journalists writing hit pieces with titles like "safety researcher at company A said B!!!" (it's ok if they believe some spicy things, so long as they are careful to not express them in that role). This is my model in general, not just for AI safety
There are various framings you can use, like deploying a bunch of jargon to say something spicy so it's hard to turn into a hit piece (eg "self exfiltration" over "escape the data center"), but there are ultimately still a bunch of constraints. The Overton window has shifted a lot, so at least for us, we can say a fair amount about the risks and dangers being real, but it's only shifted so much.
Imo this is actually pretty hard and costly to defect against, and I think the correct move is to cooperate - it's a repeated game, so if you cause a mess you'll stop being allowed to do media things. (And doing media things without permission is a much bigger deal than eg publicly tweeting something spicy that goes viral). And for things like podcasts, it's hard to cause a mess even once, as company comms departments often require edit rights to the podcast. And that podcast often wants to keep being able to interview other employees of that lab, so they also don't want to annoy the company too much.
Personally, when I'm doing a media thing that isn't purely technical, I try to be fairly careful with the spicier parts, only say true things, and just avoid topics where I can't say anything worthwhile, while trying to say interesting true things within these constraints where possible.
In general, I think that people should always assume that someone speaking to a large public audience (to a journalist, on a podcast, etc), especially someone who represents a large company, will not be fully speaking their mind, and interpret their words accordingly - in most industries I would consider this general professional responsibility. But I do feel kinda sad that if someone thinks I am fully speaking my mind and watches eg my recent 80K podcast, they may make some incorrect inferences. So overall I agree with you that this is a real cost, I just think it's worthwhile to pay it and hard to avoid without just never touching on such topics in media appearances
I am personally squeamish about AI alignment researchers staying in their positions in the case where they're only allowed to both go on podcasts & keep their jobs if they never say "this is an insane situation and I wish Earth would stop instead (even as I expect it won't and try to make things better)" if that's what they believe. That starts to feel to me like misleading the Earth in support of the mad scientists who are gambling with all our lives. If that's the price of staying at one of the labs, I start to feel like exiting and giving that as the public reason is a much better option.
In part this is because I think it'd make all sorts of news stories in a way that would shift the Overton window and make it more possible for other researchers later to speak their mind (and shift the internal culture and thus shift the policymaker understanding, etc.), as evidenced by e.g. the case of Daniel Kokotajlo. And in part because I think you'd be able to do similarly good or better work outside of a lab like that. (At a minimum, my guess is you'd be able to continue work at Anthropic, e.g. b/c Evan can apparently say it and continue working there.)
Hmm. Fair enough if you feel that way, but it doesn't feel like that big a deal to me. I guess I'm trying to evaluate "is this a reasonable way for a company to act", not "is the net effect of this to mislead the Earth", which may be causing some inferential distance? And this is just my model of the normal way a large, somewhat risk averse company would behave, and is not notable evidence of the company making unsafe decisions.
I think that if you're very worried about AI x-risk you should only join an AGI lab if, all things considered, you think it will reduce x-risk. And discovering that the company does a normal company thing shouldn't change that. By my lights, me working at GDM is good for the world, both via directly doing research, and influencing the org to be safer in various targeted ways, and media stuff is a small fraction of my impact. And the company's attitude to PR stuff is consistent with my beliefs about why it can be influenced.
And to be clear, the specific thing that I could imagine being a fireable offence would be repeatedly going on prominent podcasts, against instructions, to express inflammatory opinions, in a way that creates bad PR for your employer. And even then I'm not confident - firing people can be a pain (especially in Europe). I think this is pretty reasonable for companies to object to - the employee would basically be running an advocacy campaign on the side. If it's a weaker version of that, I'm much more uncertain - if it wasn't against explicit instructions or it was a one-off you might get off with a warning, and if it is on an obscure podcast/blog/tweet there's a good chance no one even noticed, etc.
I'm also skeptical of this creating the same kind of splash as Daniel or Leopold because I feel like this is a much more reasonable company decision than those.
The thing I'm imagining is more like mentioning, almost as an aside, in a friendly tone, that ofc you think the whole situation is ridiculous and that stopping would be better (before & after having whatever other convo you were gonna have about technical alignment ideas or w/e). In a sort of "Carthago delenda est" fashion.
I agree that a host company could reasonably get annoyed if their researchers went on many different podcasts to talk for two hours about how the whole industry is sick. But if casually reminding people "the status quo is insane and we should do something else" at the beginning/end is a fireable offense, in a world where lab heads & Turing award winners & Nobel laureate godfathers of the field are saying this is all ridiculously dangerous, then I think that's real sketchy and that contributing to a lab like that is substantially worse than the next best opportunity. (And similarly if it's an offense that gets you sidelined or disempowered inside the company, even if not exactly fired.)
Ah, that's not the fireable offence. Rather, my model is that doing that means you (probably?) stop getting permission to do media stuff. And doing media stuff after being told not to is the potentially fireable offence. Which to me is pretty different than specifically being fired because of the beliefs you expressed. The actual process would probably be more complex, eg maybe you just get advised not to do it again the first time, and you might be able to get away with more subtle or obscure things, but I feel like this only matters if people notice.
Thanks for the clarification. Yeah, from my perspective, if casually mentioning that you agree with the top scientists & lab heads & many many researchers that this whole situation is crazy causes your host company to revoke your permission to talk about your research publicly (maybe after a warning), then my take is that that's really sketchy and that contributing to a lab like that is probably substantially worse than your next best opportunity (e.g. b/c it sounds like you're engaging in alignmentwashing and b/c your next best opportunity seems like it can't be much worse in terms of direct research).
(I acknowledge that there's room to disagree about whether the second-order effect of safetywashing is outweighed by the second-order effect of having people who care about certain issues existing at the company at all. A very quick gloss of my take there: I think that if the company is preventing you from publicly acknowledging commonly-understood-among-experts key features of the situation, in a scenario where the world is desperately hurting for policymakers and lay people to understand those key features, I'm extra skeptical that you'll be able to reap the imagined benefits of being a "person on the inside".)
I acknowledge that there are analogous situations where a company would be right to feel annoyed, e.g. if someone were casually bringing up their distantly-related political stances in every podcast. I think that this situation is importantly disanalogous, because (a) many of the most eminent figures in the field are talking about the danger here; and (b) alignment research is used as a primary motivating excuse for why the incredibly risky work should be allowed to continue. There's a sense in which the complicity of alignment researchers is a key enabling factor for the race; if all alignment researchers resigned en masse citing the insanity of the race then policymakers would be much more likely to go "wait, what the heck?" In a situation like that, I think the implicit approval of alignment researchers is not something to be traded away lightly.
For what it's worth, I think that it's pretty likely that the bureaucratic processes at (e.g.) Google haven't noticed that acknowledging that the race to superintelligence is insane has a different nature than (e.g.) talking about the climate impacts of datacenters, and I wouldn't be surprised if (e.g.) Google issued one of their researchers a warning the first time they mentioned things, not out of deliberate sketchiness but just out of bureaucratic habit. My guess is that that'd be a great opportunity to push back, spell out the reason why the cases are different, and see whether the company stands up to its alleged principles or codifies its alignmentwashing practices. If you have the opportunity to spur that conversation, I think that'd be real cool of you -- I think there's a decent chance it would spark a bunch of good internal cultural change, and also a decent chance that it would make the issues with staying at the lab much clearer (both internally, and to the public if a news story came of it).
Separate point: Even if the existence of alignment research is a key part of how companies justify their existence and continued work, I don't think all of the alignment researchers quitting would be that catastrophic to this. Because what appears to be alignment research to a policymaker is a pretty malleable thing. Large fractions of current post-training are fundamentally about how to get the model to do what you want when this is hard to specify. Eg how to do reasoning model training for harder-to-verify rewards, avoiding reward hacking, avoiding sycophancy etc. Most people working on these things aren't thinking too much about AGI safety and would not quit, but could easily be sold to policymakers as doing alignment work. (and I do personally think the work is somewhat relevant, though far from the most important thing and not sufficient, but this isn't a crux)
All researchers quitting en masse and publicly speaking out seems impactful for whistleblowing reasons, of course, but even there I'm not sure how much it would actually do, especially in the current political climate.
I still feel like you're making much stronger updates on this, than I think you should. A big part of my model here is that large companies are not coherent entities. They're bureaucracies with many different internal people/groups with different roles, who may not be that coherent. So even if you really don't like their media policy, that doesn't tell you that much about other things.
The people you deal with for questions like "can I talk to the media" are not supposed to be figuring out for themselves if some safety thing is a big enough deal for the world that letting people talk about it is good. Instead, their job is roughly to push forward some set of PR/image goals for the company, while minimising PR risk. There's more senior people who might make a judgement call like that, but those people are incredibly busy, and you need a good reason to escalate up to them.
For a theory of change like influencing the company to be better, you will be interacting with totally different groups of people, who may not be that correlated - there's people involved in the technical parts of the AGI creation pipeline who I want to use safer techniques, or let us practice AGI relevant techniques; there's senior decision makers who you want to ensure make the right call in high stakes situations, or push for one strategic choice over another; there's the people in charge of what policy positions to advocate for; there's the security people; etc. Obviously the correlation is non-zero, the opinions and actions of people like the CEO affect all of this, but there's also a lot of noise, inertia and randomness, and facts about one part of the system can't be assumed to generalise to the others. Unless senior figures are paying attention, specific parts of the system can drift pretty far from what they'd endorse, especially if the endorsed opinion is unusual or takes thought/agency to conclude (I would consider your points about safety washing etc here to be in this category). But when inside you can build a richer picture of what parts of the bureaucracy are tractable to try to influence.
I agree that large companies are likely incoherent in this way; that's what I was addressing in my follow-on comment :-). (Short version: I think getting a warning and then pressing the issue is a great way to press the company for consistency on this (important!) issue, and I think that it matters whether the company coheres around "oh yeah, you're right, that is okay" vs whether it coheres around "nope, we do alignmentwashing here".)
With regards to whether senior figures are paying attention: my guess is that if a good chunk of alignment researchers (including high-profile ones such as yourself) are legitimately worried about alignmentwashing and legitimately considering doing your work elsewhere (and insofar as you prefer telling the media if that happens -- not as a threat but because informing the public is the right thing to do) -- then, if it comes to that extremity, I think companies are pretty likely to get the senior figures involved. And I think that if you act in a reasonable, sensible, high-integrity way throughout the process, that you're pretty likely to have pretty good effects on the internal culture (either by leaving or by causing the internal policy to change in a visible way that makes it much easier for researchers to speak about this stuff).
FWIW I used to agree with you but now agree with Nate. A big part of the update was developing a model of how "PR risks" work via a kind of herd mentality, where very few people are actually acting on their object-level beliefs, and almost everyone is just tracking what everyone else is tracking.
In such a setting, "internal influence" strategies tend to do very little long-term, and maybe even reinforce the taboo against talking honestly. This is roughly what seems to have happened in DC, where the internal influence approach was swept away by a big Overton window shift after ChatGPT. Conversely, a few principled individuals can have a big influence by speaking honestly (here's a post about the game theory behind this).
In my own case, I felt a vague miasma of fear around talking publicly while at OpenAI (and to a lesser extent at DeepMind), even though in hindsight there were often no concrete things that I endorsed being afraid of—for example, there was a period where I was roughly indifferent about leaving OpenAI, but still scared of doing things that might make people mad enough to fire me.
I expect that there's a significant inferential gap between us, so this is a hard point to convey, but one way that I might have been able to bootstrap my current perspective from inside my "internal influence" frame is to try to identify possible actions X such that, if I got fired for doing X, this would be a clear example of the company leaders behaving unjustly. Then even the possible "punishment" for doing X is actually a win.
I guess speaking out publicly just seems like a weird distraction to me. Most safety people don't have a public profile! None of their capabilities colleagues are tracking the fact that they have or have not expressed specific opinions publicly. Some do, but it doesn't feel like you're exclusively targeting them. And eg if someone is in company-wide Slack channels leaving comments about their true views, I think that's highly visible and achieves the same benefits of talking honestly, with fewer risks.
I'm not concerned about someone being fired for this kind of thing, that would be pretty unwise on the labs' part as you risk creating a martyr. Rather, I'm concerned about eg senior figures thinking worse of safety researchers as a whole because it causes a PR headache, eg viewing them as radical troublemakers, and this making theories of impact around influencing specific senior decision makers harder (and I'm more optimistic about those, personally)
Rather, I'm concerned about eg senior figures thinking worse of safety researchers as a whole because it causes a PR headache, eg viewing them as radical troublemakers, and this making theories of impact around influencing specific senior decision makers harder (and I'm more optimistic about those, personally)
Thank you Neel for stating this explicitly. I think this is very valuable information. This matches what some of my friends told me privately also. I would appreciate it a lot if you could give a rough estimate of your confidence that this would happen (ideally some probability/percentage). Additionally, I would appreciate if you could say whether you'd expect such a consequence to be legible/visible or illegible (once it had happened). Finally, are there legible reasons you could share for your estimated credence that this would happen?
(to be clear: I am sad that you are operating under such conditions. I consider this evidence against expecting meaningful impact from the inside at your lab.)
It's not a binary event - I'm sure it's already happened somewhat. OpenAI has had what, 3 different safety exoduses by now, and (what was perceived to be) an attempted coup? I'm sure leadership at other labs have noticed. But it's a matter of degree.
I also don't think this should be particularly surprising - this is just how I expect decision makers at any organisation that cares about its image to behave, unless it's highly unusual. Even if the company decides to loudly sound the alarm, they likely want to carefully choose the messaging and go through their official channels, not have employees maybe going rogue and ruining message discipline. (There are advantages to the grassroots vibe in certain situations though). To be clear, I'm not talking about "would take significant retaliation", I'm talking about "would prefer that employees didn't, even if it won't actually stop them"
This sounds to me like there would actually be specific opportunities to express some of your true beliefs that you wouldn't worry would cost you a lot (and some other opportunities where you would worry and not do them). Would you agree with that?
(optional: my other comment is more important imo)
I'm not concerned about someone being fired for this kind of thing, that would be pretty unwise on the labs' part as you risk creating a martyr
I think you ascribe too much competence/foresight/focus/care to the labs. I'd be willing to bet that multiple (safety?) people have been fired from labs in a way that would make the lab look pretty bad. Labs make tactical mistakes sometimes. Wasn't there a thing at OpenAI for instance (lol)? Of course it is possible(/probable?) that they would not fire in a given case due to sufficient "wisdom", but we should not assign an extreme likelihood to that.
Yeah, agreed that companies sometimes do dumb things, and I think this is more likely at less bureaucratic and more top-down places like OpenAI - I do think the Leopold situation went pretty badly for them though, and they've hopefully updated. I'm partly less concerned because there's a lot of upside if the company makes a big screw-up like that.
This is roughly what seems to have happened in DC, where the internal influence approach was swept away by a big Overton window shift after ChatGPT.
In what sense was the internal influence approach "swept away"?
Also, it feels pretty salient to me that the ChatGPT shift was triggered by public, accessible empirical demonstrations of capabilities being high (and social impacts of that). So in my mind that provides evidence for "groups change their mind in response to certain kinds of empirical evidence" and doesn't really provide evidence for "groups change their mind in response to a few brave people saying what they believe and changing the overton window".
If the conversation changed a lot causally downstream of the CAIS extinction letter or FLI pause letter, that would be better evidence for your position (though also consistent with a model that puts less weight on preference cascades and models the impact more like "policymakers weren't aware that lots of experts were concerned, this letter communicated that experts were concerned"). I don't know to what extent this was true. (Though I liked the CAIS extinction letter a lot and certainly believe it had a good amount of impact — I just don't know how much.)
As such, I disagree with the various actions you recommend lab employees to take, and do not intend to take them myself.
It's not clear that you disagree that much? You say you agree with Leo's statement, which seems to be getting lots of upvotes and "thanks" emojis suggesting that people are going "yes, this is great and what we asked for".
I'm not sure what other actions there are to disagree with. There's "advocate internally to ensure that the lab lets its employees speak out publicly, as mentioned above, without any official retaliation" — but I don't really expect any official retaliation for statements like these so I don't expect this to be a big fight where it's costly to take a position.
To me, Leo's statement is much weaker than what the post is asking people to say - it's saying "conditional on us not knowing how to make ASI without killing everyone, it would be nice if we could coordinate on not racing to do it" - as a literal statement this seems obviously reasonable (eg someone who thinks we know how to make ASI safely, or will easily figure it out in time, could also agree with this, and someone who strongly opposes any kind of governmental AGI/ASI ban could agree with it, or even someone who thinks that in reality labs should do nothing but race; though I know Leo's actual views are stronger than this)
To me this is not advocating for an AGI ban, as an actual practical political request, it's just saying "in theory, if we could coordinate, it would be nice". The post is saying things I consider much stronger like:
out publicly[2] against the current AI R&D regime and in favor of an AGI ban
prefer an AGI ban[1] over the current path
I am kinda confused by Leo's comment being so highly upvoted; if it was genuinely all that the authors of this post wanted then I suggest they write a different post, since I found the current one far more combative.
Leo is saying:
I've been repeatedly loud and explicit about this but am happy to state again that racing to build superintelligence before we know how to make it not kill everyone (or cause other catastrophic outcomes) seems really bad and I wish we could coordinate to not do that.
The "before" here IMO pretty clearly (especially in the context of the post) communicates "we do not know currently how to do this, we should currently not be racing, and I support efforts to stop racing at substantial cost". Maybe I am wrong in that interpretation, if so I do think it's indeed so weak as to not really mean anything.
we do not know currently how to do this
Agreed this is implied
we should currently not be racing
I believe Leo believes this, and it's somewhat implied by the statement, though imo that statement is also consistent with eg "there's a 50% chance that by the time we make AGI we'll have figured how to align ASI, therefore it's fine to continue to that point and then we can stop if need be or continue"
I wish we could coordinate to not do that.
I support efforts to stop racing at substantial cost
To me the latter doesn't seem like the important claim: "should we take significant cost" doesn't feel like a crux; rather, "would this work at all" feels like my crux, and "I wish" reads to me like it's sidestepping that. This was the key bit of what Leo said that I consider much softer than what the post asks for.
I'm also not sure the former implies the latter, but that gets messier - eg if you think that we will figure out how to align ASI after 6-12 months of pause at the right time, it doesn't seem very costly to push for that. While if you think it will take at least 20 years and might totally fail, maybe it does. I consider the statement to be agnostic on this.
To be clear, I know Leo's actual beliefs are on the doomy end, but I'm trying to focus on what I think the statement actually says, and what I meant when I called it obviously reasonable.
Other positions I consider consistent with the statement:
if it was genuinely all that the authors of this post wanted then I suggest they write a different post
Leo's statement is quite good without being all we wanted. (Indeed, of the 3 things we wanted, 1 is about how we think it makes sense for others to relate to safety researchers based on what they say/[don't say] publicly. And 1 is about trying to shift the lab's behavior toward it being legibly safe for employees to say various things, which Leo's comment is not about.) I internally track a pretty crucial difference between what I want to happen in the world (ie that we shift from plan B to plan A somehow) and how I believe people ought to relate to the public stance/[lack thereof] of safety researchers within frontier labs. I think there are maybe stronger stances Leo could have taken, and weaker ones, and I endorse having the way I relate/model/[act towards] Leo depend on which he takes. I think the public stance that would lead to me maximally relating well to a safety researcher ought to be something like "I think coordinating to stop the race (even if in the form of some ban which I won't choose the exact details of) would be better than the current race to ever more capable AI. I would support such coordination. I am currently trying to make the situation better in case there is no such coordination, but I don't think the current situation is sufficiently promising to justify not coordinating. Also there is a real threat of humanity's extinction if we don't coordinate." (or something to that effect)
I think that saying this kind of thing publicly is likely to either not matter (because it's not noticed) or be costly to my ability to have an impact inside the lab (because it is noticed, including by people in the lab, who are annoyed at it potentially causing issues for the lab)
I appreciate you being willing to say this in worlds where you believe it.
Just want to point out that even if you think the proposal of an AI pause is too unrealistic or extreme, there's a wide range of possible public statements you could make. I think the important thing is not that all safety-minded lab employees advocate for an AI pause in particular, but that they feel comfortable honestly stating their views even if they disagree with their employer.
If a bunch of people at a frontier lab tweeted their honest opinions about AI risk and got fired shortly thereafter, I would expect that to be huge news, in a way that would outweigh the negative impact of those people not working at the lab anymore. (Huge enough that I expect they would not in fact be fired.)
I also wouldn't want people to be peer-pressured into making statements that are more extreme than their actual views, but I think we're pretty far from that world.
I also wouldn’t want people to be peer-pressured into making statements that are more extreme than their actual views, but I think we’re pretty far from that world.
That's because there isn't a norm for safety researchers to take the public stance described here. Once such a thing became a norm, peer pressure into making extreme statements, and more generally threats to force people to make extreme statements, would be common.
Look at what we have now with all sorts of social justice statements.
I’ve said this many times in conversations, but I don’t think I’ve ever written it out explicitly in public, so:
I support some form of global ban or pause on AGI/ASI development. I think the current AI R&D regime is completely insane, and if it continues as it is, we will probably create an unaligned superintelligence that kills everyone.
Strong upvoted. I struggle to imagine a company which punishes its employees for speaking the truth, but where the company leadership decides to become sane and listen to them when shit hits the fan.
Huge thanks to all the lab employees who stated their support for an AI moratorium in this thread!
Can we make this louder and more public? This is really important for the public to understand.
I support a magically enforced 10+ year AGI ban. It's hard for me to concretely imagine a ban enforced by governments, because it's hard to disentangle what that counterfactual government would be like, but I support a good government-enforced AGI slowdown. I do like it when people shout doom from the rooftops though, because it's better for my beliefs to be closer to the global average, and the global discourse is extremely far from overshooting doominess.
I agree that people should clearly state that they think there's a catastrophic risk, but I disagree that people should clearly state that they think we should pause.
If we premise (as this post does) on the fact that the person we are talking about actually believes that an international ban would be a great improvement over the current mad AI race, then the above quote seems wrong to me.
I agree that experts should not pretend like they have more authority than they do in judging whether we should pause. But they could still say 1) that the race is insane, 2) that an international ban seems like a great improvement, 3) that if such a ban was proposed, they would not oppose it and 4) they would in fact support it. If not the experts, then who? To be clear, I don't think the experts within the labs racing to build the tech are necessary here (this is not what the post is about). There are experts outside of the labs also (and they don't have the [huge conflicts of interest]/pressure to filter(/falsify?) their speech). But if not the experts, then who would be better placed to say the above? If there is no one to say it, how does it get understood? If it doesn't get understood, coordination to actually move out of the status quo towards some kind of international agreement is much harder. The CEOs of some of the labs could say it and that would definitely have an impact, but will they (lol)? Politicians could say it, but probably the backing of many experts would make this much easier for the politicians to say.
I think "there are catastrophic risks" is way too weak and doesn't substitute. Partly because "there are catastrophic risk, so please give more money to me/so put me in charge/so we must beat those less careful folks" are also possible readings. I also happen to have it on very good authority that some politicians, when informed that many experts recognize the risks of extinctions and told the reasons why we should stop the mad AI race, will ask "but do the experts support stopping?" with perhaps a side of ("or do they just want more money for their thing?")
I agree, generally - I think they'll be tricked into doing capabilities anyways though. Imo, better to reject the offer and say why.
You're only saying that because there is an extraordinarily consistent history of this exact thing happening over and over again! That empirical evidence bears no weight in comparison to my personal mental models. You clearly don't understand how virtuous and clever I am. Real AI lab safety work and inside political maneuvering has never been tried!
I think I technically count as one of those? It's not my day job, but I contributed a task to METR's Long Tasks paper, and I've made minor contributions to a handful of other AI-Safety-ish papers.
Anyway, if it counts: I support a ban as well. (I don't have a very high p(doom), but I don't think it needs to be very high to be Too High.)
Moreover, this strategy does not involve any costly signals that would make the statement of intent credible. How can we know (at the point where we choose whether to enforce the norm), absent additional information, that making the lab's outcome marginally better by being on the inside is their true motivation, where a similarly credible explanation would be that their actual motive (whether they are consciously aware of it or not) is something like a fun job with a good salary (monetary or paid in status), that can be justified by paying lip service to the threat models endorsed by those whose trust and validation they want (all of which are fine in themselves/isolation, but not justifying contributing to summoning a demon). To go even further with that, it allows people to remain strategically ambiguous, so as to make it possible for people of different views/affiliations to interpret the person as "one of my people".
I would say that I aim not to give in to threats (or “norm enforcement”) but you don’t even have a threat! I think you may want to rethink your models of how norm enforcement works.
Seeing the post as a threat misses the intended point. It is important to state explicitly: The goal of the three norms argued for in the post was never to force people to publicly support something they don't in fact believe in. It was also never to force people to be more honest about what they believe. The post explicitly says what we think you should be doing, so that there can be a discussion about it. But the norm enforcement part is about what we think others (who are not necessarily working at frontier labs) should be doing.
Separately, I am not sure I understood what you meant by "I aim to not give in to 'norm enforcement'", but it seems to me that there is a culture inside the labs that makes many people working there uncomfortable taking a public stance. To be more explicit, does that also activate your will to not give in to 'norm enforcement'? (if not, why not?)
> I think you may want to rethink your models of how norm enforcement works.
I didn't get what you were trying to communicate here. Continuing to rethink (publicly) my models of how norm enforcement works is why we wrote this post on LW.
But the norm enforcement part is about what we think others (who are not necessarily working at frontier labs) should be doing.
A threat by proxy is still a threat.
I conclude from this that you really do see this post as a threat (also, in your first comment you admitted there is no threat, so this comment now seems contradictory/bad-faith).
some thoughts:
- this isn't a threat by proxy and isn't a threat (but if it were a tbp then it would be a t sure)
- I am in the "others" group. I implement the norm I endorse in the post, and I am not threatening you. I don't want to sound dismissive but you are not giving me a lot to work with here, and it sounds to me like either 1) you have a vague model of what a threat is that includes things that aren't threats or 2) you are misunderstanding the post and our intent such that you model us as having made a threat.
How do you think norm enforcement works, other than by threatening people who don't comply with the norm?
I probably should have said "norm execution" (ie follow the norm). This might just be a cultural gap, but I think norm enforcement/execution/implementation works in many ways that are not threats. For instance, there is pizza at a conference. There is a norm that you shouldn't take all the pizza if there is a big line behind you. Some people break this norm. What happens? Do they get threatened? No! They just get dirty looks and people talking behind their backs. Maybe they get a reputation as the "pizza taker". In fact, nobody necessarily told them before this happened that taking all the pizza would break the norm.
I think there is a strange presumption that one is owed my and others' maximum respect and friendship. Anything less than that would be a "punishment". That is pretty strange. If I have money in my pocket but I will only give some to you based on how many "good deeds" I have seen you do, this is not a threat. I guess that if you did not understand the motives, or if the motives were actually to get a specific person to do more "good deeds" (by telling them in advance what the reward would be), you could call it a bribe. But calling it a threat is obviously incorrect.
I think norm enforcement/execution/implementation can be, and in my case is, motivated by an aesthetic preference that the "points" that are person A's to give, such as respect and friendship, 1) not go to someone who does not deserve them (in my eyes) and instead 2) go to someone who does deserve them. It is not primarily driven by a consequentialist desire for more people to do respect-and-friendship-deserving things. It is primarily driven by a desire for the points to match reality, and thus enable greater cooperation and further good things down the line.
I realized based on a few comments that the three norms I discuss in the post were seen by some as like one giant strategy to produce more public stances from safety researchers. This is not the case. I am just talking to three different audiences and I explain a norm that I think makes sense (independently) for them.
I conclude from this that you really do see this post as a threat (also, in your first comment you admitted there is no threat, so this comment now seems contradictory/bad-faith).
Sure, I'll correct it to "an attempted threat by proxy is still an attempted threat". (It's not a threat just because you have nothing I care about to threaten me with, but it would be a threat if I did care about e.g. whether you respect me.)
But I agree that I am not trying to cooperate with you, if that's what you mean by bad faith.
[Co-written by Mateusz Bagiński and Samuel Buteau (Ishual)]
Many X-risk-concerned people who join AI capabilities labs with the intent to contribute to existential safety think that the labs are currently engaging in a race that is unacceptably likely to lead to human disempowerment and/or extinction, and would prefer an AGI ban[1] over the current path. This post makes the case that such people should speak out publicly[2] against the current AI R&D regime and in favor of an AGI ban[3]. They should explicitly communicate that a saner world would coordinate not to build existentially dangerous intelligences, at least until we know how to do it in a principled, safe way. They could choose to maintain their political capital by not calling the current AI R&D regime insane, or find a way to lean into this valid persona of “we will either cooperate (if enough others cooperate) or win the competition in style (otherwise)”.
X-risk-concerned people who have some influence within AI capabilities labs should additionally truthfully state publicly and advocate internally to ensure that the lab lets its employees speak out publicly, as mentioned above, without any official retaliation. If they are unable to make a lab follow this policy, they should state so publicly.
X-risk-concerned people in our communities should enforce the norm of praising the heroism of those who [join AI capabilities labs while speaking out publicly on the current mad race], and being deeply skeptical of the motives of those who [join without publicly speaking out].
Not being public about one's views on this hinders the development of common knowledge, nearly guarantees that the exposure to corrupting influence from working inside the lab (which doesn't depend on whether one publicly speaks out) partially reshapes one into a worse version of oneself, and gives an alibi[4] to people who want to join labs for other reasons that would otherwise be condemned by their community.
Liron: "Do you really think that we should be encouraging people to go work at [these frontier labs]?"
Rob: "Do you think anyone who understands and cares about these [risks from superintelligence] should not be in the room where they can affect what actually happens?"— Rob Miles on Doom Debates
Rob Wiblin: Should people who are worried about AI alignment and safety go work at the AI labs? There’s kind of two aspects to this. Firstly, should they do so in alignment-focused roles? And then secondly, what about just getting any general role in one of the important leading labs?
Zvi Mowshowitz: This is a place I feel very, very strongly that the 80,000 Hours guidelines are very wrong. So my advice, if you want to improve the situation on the chance that we all die for existential risk concerns, is that you absolutely can go to a lab that you have evaluated as doing legitimate safety work, that will not effectively end up as capabilities work, in a role of doing that work. That is a very reasonable thing to be doing.
— Zvi Mowshowitz on the 80,000 Hours podcast
The reasoning exemplified in the above quotes can often be heard in the circles concerned with AI X-risk (or even AI safety more broadly), including those who think that we are on a bad trajectory, tending towards an existential catastrophe, and that a saner trajectory would involve coordinating to pause the development of AI that may lead to capabilities sufficient for an existential catastrophe, at least until we figure out whatever needs to be figured out, to ensure that that kind of AI has a robustly good impact.
The motivation of ensuring that "there be good people in the room" (if truthful) is, in itself, noble and virtuous. It makes a lot of sense from a perspective that is largely focused on marginal, tractable impact, as is the staple of, among others, practical/as-applied EA philosophy.
However, this strategy carries a great risk. Once a person enters the monster's belly, the monster becomes capable of gradually constraining the person's degrees of freedom, so that, at each point, it is "locally rational" for the person to continue working, business as usual, while their agency is gradually trimmed and shaped to better serve the monster. The ambitious positive impact that was initially intended erodes into "I am one of the few good guys, and if I leave, a worse guy is gonna replace me, so I should stay and do whatever I can on the margin (even if what I'm doing now is very far from what I initially intended)." This can take more corrupted/pernicious forms as well, such as the person's worldview and/or values[5] actually adapting to the new situation so as to rationalize their prior behavior.
You have a moral obligation not to let it happen. The world's fate is at stake[6].
Moreover, this strategy does not involve any costly signals that would make the statement of intent credible. How can we know (at the point where we choose whether to enforce the norm), absent additional information, that making the lab's outcome marginally better from the inside is their true motivation? A similarly credible explanation is that their actual motive (whether or not they are consciously aware of it) is something like a fun job with a good salary (monetary or paid in status), justified by paying lip service to the threat models endorsed by those whose trust and validation they want (all of which is fine in itself, but does not justify contributing to summoning a demon). Going further, staying quiet allows people to remain strategically ambiguous, making it possible for people of different views/affiliations to each interpret the person as "one of my people".
There is a claim that by being on the inside, by being promoted, by befriending people within the labs, you will get an opportunity to steer these labs somewhat. "You just have to play the long game and win friends and influence people, and maybe at a critical point you will be able to get this corporation to do counterfactually better." There is an implied claim that what you would be there to do is to advocate for small changes with bearable costs, to be adopted voluntarily and unilaterally by a lab trying to win a race to superintelligence. There is also an implied claim that, in expectation, you will do this well enough that it offsets the direct negative impact of your labor on the race, and, more importantly, that it offsets your reinforcement of the frame that these companies are being responsible and have "good people" working there (and therefore we need not coordinate for something better). Further common justifications are "if not me, then someone worse", "if not this careful lab, then some careless lab", and "if not the USA, then China" (and therefore we cannot coordinate for something better).
Just as extraordinary claims require extraordinary evidence, strategies drawn from a reference class that can have very large negative effects require very good justification for expecting them not to succumb to one of that reference class's characteristic failure modes.
First, how surprised would you truly be if you found out that in the default future, you either quit with regret for joining, or ended up just getting along and not pushing as hard as you thought you would for incremental change?
Second, how surprised would you truly be if your words ended up not being enough? Ended up not steering this corporation as much as you expected[7]?
Sometimes even heroes just have to play the game, and sometimes they just have to be hawkish, stare oblivion in the face, and keep doing the bad thing, potentially even with the burning intent to win, until the others come to their senses and also support plan A (which is to ban superintelligence at least until we mostly agree that we know what we are doing, and can make progress in sane, non-racing conditions).
But there is no chance of magically coordinating around plan A without common knowledge that people support, and wish for, coordination around plan A.
The USA did not unilaterally reduce its nuclear arsenal. There was a lot of hawkishness, a lot of will to actually build more nuclear warheads if needed. But people also clearly signaled support for doing something better and more sane: a clear, sober intent to go for a coordinated solution, to actually enforce it, and to make sure the other side doesn't cheat, but nevertheless a clear intent not to cheat oneself, and to go for plan A, to make it an option, if at all possible.
If you're joining an org that is, in your assessment, ~net-negative, on the grounds that your particular role seems locally good, you should run this assessment by people whose epistemics you trust, so that they can red-team the hell out of it, especially given that "apparently locally good positions within an EvilCorp" are an effective lure for people in a reference class that includes ambitiously benevolent LessWrongers, Effective Altruists, etc.
Making plan A (coordination around not building X-risk-posing AI) viable requires a sufficient buildup of common knowledge. Building common knowledge requires speaking publicly about what is sane to do, even if — especially if — on your own, you are pursuing a plan B that superficially seems to hypocritically go against the grain of the plan A you are publicly supporting. The default Schelling point is "Rabbit", not "Stag", and this will not change unless the widespread desire to "hunt the Stag" becomes common knowledge.
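To make the Stag-hunt point concrete, here is a minimal sketch of the standard payoff logic, with illustrative numbers of our own choosing (nothing in this post pins down specific payoffs): when you are unsure whether others will hunt Stag, Rabbit is the safe individual choice, and only a sufficiently high, publicly visible expectation that others will cooperate flips the calculation.

```python
# Minimal stag-hunt sketch (illustrative payoffs chosen for this example only).
# "Stag" = publicly support coordinated restraint; "Rabbit" = quietly keep racing.

PAYOFF = {
    # (my_move, other_move): my_payoff
    ("stag", "stag"): 10,    # coordination succeeds
    ("stag", "rabbit"): 0,   # I stuck my neck out alone
    ("rabbit", "stag"): 3,   # I kept racing while the other cooperated
    ("rabbit", "rabbit"): 3, # business as usual
}

def expected_payoff(my_move: str, p_other_stag: float) -> float:
    """Expected payoff of my_move, given my credence that the other player hunts Stag."""
    return (p_other_stag * PAYOFF[(my_move, "stag")]
            + (1 - p_other_stag) * PAYOFF[(my_move, "rabbit")])

for p in (0.1, 0.3, 0.5, 0.9):
    stag_ev = expected_payoff("stag", p)
    rabbit_ev = expected_payoff("rabbit", p)
    best = "Stag" if stag_ev > rabbit_ev else "Rabbit"
    print(f"P(other hunts Stag) = {p:.1f}: Stag EV = {stag_ev:.1f}, "
          f"Rabbit EV = {rabbit_ev:.1f} -> choose {best}")

# With these numbers, Stag only beats Rabbit once the credence that the other
# player hunts Stag exceeds 0.3; that threshold is why making support for
# coordination common knowledge matters.
```

The exact numbers do not matter; the point is that below some credence threshold, Rabbit is individually rational even though everyone prefers the all-Stag outcome, which is why publicly stating support for plan A does real work.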
To show that you actually care about reducing AI X-risk, state publicly that you would support coordination around not building dangerous ASI, that not building it is plan A, and that whatever you're doing inside the lab is either plan B (if plan A does not succeed), or building science and technology that you expect to be helpful if plan A is well-accomplished. The ~personal costs imposed by such a statement make it a credible signal of commitment[8].
An optional emotional dialogue on betrayal
(This section is about emotions. If you are cringing and running away after reading this sentence, this is not meant for you, and I’d encourage you to skip.)
I [Ishual / Samuel Buteau] have had many private discussions with friends that went essentially as below. If you recognize yourself, please know that you are far from alone, and I am pointing out a dynamic with an anonymized example, which is not about you personally.
Me: So, at least you should publicly state you’d rather we reached an international agreement so that the race could stop.
Friend: I don’t think I can do that. I don’t think you understand how much face I’d lose with my colleagues.
Me: This makes no logical sense. If you are trying to signal loyalty to EvilCorp (or LeastBadAI), calling for all players to be bound by what you ask EvilCorp to voluntarily do is strictly more loyal.
Friend: I understand that it is *logically* more loyal, but the vibes are all wrong, and my colleagues are not reasonable about this. They will just react very poorly if I say anything.
Me: It sounds to me like you don't understand the depth of the betrayal I feel here. I think that no matter how unreasonable your colleagues are, I am reasonably very upset that you won't even do this. It feels like defection. I don't think your tiny incremental improvements to safety at EvilCorp will matter, and you don't think my attempt at international cooperation will matter. But the difference is that you are shooting my hopes in the face, and I am accepting that you have to go work at EvilCorp and try your best. I am just asking you to stop shooting my hopes in the face! You are willing to accommodate your colleagues so much more than you are willing to accommodate me. Am I really asking for so much?
Friend: I think you are asking me to maybe get fired here.
Me: Do you know how fucked up it would be if they fired you over this?
Friend: Fucked up things happen lol (you should know that!)
Me: Yes, but if the culture is so utterly irredeemable internally that you are worried about getting fired over vibes despite logically being more on their side than if you just nag about voluntary burdens they should take on, … I don’t even know what to say. Don’t you think the world has a right to know? Don’t you think outsiders would care? Don’t you think maybe EvilCorp would have to not fire you? Don’t you think the impact you’d have on putting the world on a safer path would be bigger than what you’ll have from within this dysfunctional culture?
Friend: … I don’t know, man, let's talk about it in a few months.
Me: Look, I get that the incentives are not on my side here. I get it. I just want you to know that many people on the outside would have your back if you got fired. And maybe it is all a mirage that you’d be dispelling, and you’d have many people at EvilCorp at your side also.
[1] More generally, a ban on whatever sort of AI they expect to be pursued and to lead to human extinction.
[2] How to speak out publicly: Maybe say it in the comments? Maybe write your own post about it? Maybe say it on podcasts? Maybe, if someone says some high-profile version of the idea, stand behind them? If you don't do any of these, you are probably not speaking out publicly in our eyes (but you can reach out and we may include your statement in the comments). If your colleagues can't tell whether you'd prefer a ban to racing, you are not speaking out publicly. More precisely, we think you should speak out both publicly and legibly to outsiders.
[3] They should either take a public stance that plan A (coordinating not to build existentially dangerous AI) is significantly higher in their preference ordering than plan B (making the current race marginally less bad), or say separately that "plan A good" and "plan B bad".
[4] Or, more directly, a lack of negative social consequences for doing a very naughty thing.
[5] In the wild, this might instead take the form of people not actually changing their worldview, but severing their morality from their actions (unless the action is only seen by people who share the worldview).
[6] It is reasonable to doubt that people will really coordinate, but if you do not say that you will coordinate, you are making coordination harder. If not you, then who will enable coordination?
[7] Perhaps because this corporation contains a lot of people with incentives (monetary/hedonic) not to really get it, or not to really support you in group discussions, and few people trying to do what you are trying to do.
[8] Although we'd endorse you making this cost as low as possible. There is a consistent persona that might cut through some of the vibes and lose less respect from your lab-mates (themselves perhaps working on capabilities): be clear that you won't advocate internally for any voluntary measure that you wouldn't also publicly support imposing on all companies, that you'll race with them until the outside world decides to stop, and that you'll support stopping externally in the meantime. You are on their team, "if any lab must win, let it be us," but you think this is a mad race and you'd prefer that all the labs were stopped across the globe.