Someone asked:
“Why would having [the roles] be filled by someone in EA be worse than a non EA person? can you spell this out for me? I.e. are EA people more capable? would it be better to have less competent people in such roles? not clear to me that would be better”
Here was my response:
So I was thinking about this.
Considering this only as an individual decision can be limiting. Even 80k staff have acknowledged that sometimes you need a community to make progress on something.
For similar reasons, protests work better if there are multiple people showing up.
What would happen if 80k and other EA organisations stopped recommending positions at AGI labs and honestly pointed out that work at these labs has turned out to be bad – because the labs have defected on their end of the bargain and don't care enough about getting safety right?
It would make an entire community of people aware that we may need to actively start restricting this harmful work. Instead, what we've been seeing is EA orgs singing the praises of AGI lab leaders for years, while 80k still recommends that talented, idealistic people join AGI labs. I'd rather see less talented, sketchy-looking people join the AGI labs.
I would rather see everyone in AI Safety make clear to each other and to the public that we do not condone harmful automation races to the bottom. We do not condone work at these AGI labs, and we are no longer giving it our endorsement.
Their question was also responding to my concerns on how 80,000 Hours handpicks jobs at AGI labs.
Some of those advertised jobs don't even focus on safety – instead they look like policy lobbying roles or engineering support roles.
Nine months ago, I wrote this email to 80k staff:
Hi [x, y, z]
I noticed the job board lists positions at OpenAI and AnthropicAI under the AI Safety category:
Not sure whom to contact, so I wanted to share these concerns with each of you:
- Capability races
- OpenAI's push for scaling the size and applications of transformer-network-based models has led Google and others to copy and compete with them.
- Anthropic now seems on a similar trajectory.
- By default, these should not be organisations supported by AI safety advisers with a security mindset.
- No warning
- Job applicants are not warned of the risky past behaviour of OpenAI and Anthropic. Given that 80K markets to a broader audience, I would not be surprised if 50%+ are largely unaware of the history. The subjective impression given is that taking the role will help improve AI safety and policy work.
- At the top of the job board, positions are described as "Handpicked to help you tackle the world's most pressing problems with your career."
- If anything, "About this organisation" makes the companies look more comprehensively careful about safety than they have actually acted:
- "Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems."
- "OpenAI is an AI research and deployment company, with roles working on AI alignment & safety."
- It is understandable that people aspiring to AI safety & policy careers are not much aware, and they should therefore be warned.
- However, 80K staff should be tracking the harmful race dynamics and careless deployment of systems by OpenAI, and now Anthropic.
- The departure of OpenAI's safety researchers was widely known, and we have all been tracking the hype cycles around ChatGPT.
- Various core people in the AI Safety community have mentioned concerns about Anthropic.
- Oliver Habryka mentions this as part of the reasoning for shutting down the LightCone offices:
- I feel quite worried that the alignment plan of Anthropic currently basically boils down to "we are the good guys, and by doing a lot of capabilities research we will have a seat at the table when AI gets really dangerous, and then we will just be better/more-careful/more-reasonable than the existing people, and that will somehow make the difference between AI going well and going badly". That plan isn't inherently doomed, but man does it rely on trusting Anthropic's leadership, and I genuinely only have marginally better ability to distinguish the moral character of Anthropic's leadership from the moral character of FTX's leadership, and in the absence of that trust the only thing we are doing with Anthropic is adding another player to an AI arms race.
- More broadly, I think AI Alignment ideas/the EA community/the rationality community played a pretty substantial role in the founding of the three leading AGI labs (Deepmind, OpenAI, Anthropic), and man, I sure would feel better about a world where none of these would exist, though I also feel quite uncertain here. But it does sure feel like we had a quite large counterfactual effect on AI timelines.
- Not safety focussed
- Some jobs seem far removed from positions of researching (or advising on restricting) the increasing harms of AI-system scaling.
- For OpenAI:
- IT Engineer, Support: "The IT team supports Mac endpoints, their management tools, local network, and AV infrastructure"
- Software Engineer, Full-Stack: "to build and deploy powerful AI systems and products that can perform previously impossible tasks and achieve unprecedented levels of performance."
- For Anthropic:
- Technical Product Manager: "Rapidly prototype different products and services to learn how generative models can help solve real problems for users."
- Prompt Engineer and Librarian: "Discover, test, and document best practices for a wide range of tasks relevant to our customers."
- Align-washing
- Even if an accepted job applicant gets to be in a position of advising on and restricting harmful failure modes, how do you trade this off against:
- the potentially large marginal relative difference in skills of top engineering candidates you sent OpenAI's and Anthropic's way, and are accepted to do work for scaling their technology stack?
- how these R&D labs will use the alignment work to market the impression that they are safety-conscious, to:
- avoid harder safety mandates (eg. documenting their copyright-infringing data, not allowing API developers to deploy spaghetti code all over the place)?
- attract other talented idealistic engineers and researchers?
- and so on?
I'm confused and, to be honest, shocked that these positions are still listed for R&D labs heavily invested in scaling AI system capabilities (without commensurate care for the exponential increase in the number of security gaps and ways to break our complex society and supporting ecosystem that this opens up). I think this is pretty damn bad.
Preferably, we can handle this privately and not make it bigger. If you can come back on these concerns in the next two weeks, I would very much appreciate that.
If not, or not sufficiently addressed, I hope you understand that I will share these concerns in public.
Warm regards,
Remmelt
80k removed one of the positions I flagged: Software Engineer, Full-Stack, Human Data Team (reason given: it looked potentially more capabilities-focused than the original job posting that came into their system).
For the rest, little has changed:
Two others in AI Safety also discovered the questionable job listings. They are disappointed in 80k.
Feeling exasperated about this. Thinking of putting out another post just to discuss this issue.
[x-posted from EA forum]
Hi Remmelt,
Thanks for sharing your concerns, both with us privately and here on the forum. These are tricky issues and we expect people to disagree about how to weigh all the considerations — so it’s really good to have open conversations about them.
Ultimately, we disagree with you that it's net harmful to do technical safety research at AGI labs. In fact, we think it can be the best career step for some of our readers to work in labs, even in non-safety roles. That’s the core reason why we list these roles on our job board.
We argue for this position extensively in my article on the topic (and we only list roles consistent with the considerations in that article).
Some other things we’ve published on this topic in the last year or so:
Benjamin
Hi Benjamin - would be interested in your take on a couple of things:
1. By recommending people work at big labs, do you think this has a positive Halo Effect for the labs' brand? I.e. 80k is known for wanting people to do good in the world, so by recommending people invest their careers at a lab, then those positive brand associations get passed onto the lab (this is how most brand partnerships work).
2. If you think the answer to #1 is Yes, then do you believe the cost of this Halo Effect is outweighed by the benefit of having safety minded EA / Rationalist folk inside big labs?
[cross-posted replies from EA Forum]
Ben, it is very questionable that 80k is promoting non-safety roles at AGI labs as 'career steps'.
Consider that your model of this situation may be wrong (account for model error).
I did read that compilation of advice, and responded to that in an email (16 May 2023):
"Dear [a],
People will drop in and look at job profiles without reading your other materials on the website. I'd suggest writing a do-your-research cautionary line about OpenAI and Anthropic in the job descriptions themselves.
Also suggest reviewing whether to trust advice on whether to take jobs that contribute to capability research.
Totally up to you of course.
Warm regards,
Remmelt"
We argue for this position extensively in my article on the topic
This is what the article says:
"All that said, we think it’s crucial to take an enormous amount of care before working at an organisation that might be a huge force for harm. Overall, it’s complicated to assess whether it’s good to work at a leading AI lab — and it’ll vary from person to person, and role to role."
So you are saying that people are making a decision about working for an AGI lab that might be (or actually is) a huge force for harm. And whether it's good (or bad) to work at an AGI lab depends on the person – ie. people need to figure this out for themselves.
Yet you are openly advertising various jobs at AGI labs on the job board. People are clicking through and applying. Do you know how many read your article beforehand?
~ ~ ~
Even if they did read through the article, both the content and framing of the advice seem misguided. Notice what is emphasised in your considerations.
Here are the first sentences of each consideration section:
(ie. what readers are most likely to read, and what you presumably most want to convey).
~ ~ ~
After that, there is a new section titled "How can you mitigate the downsides of this option?"
This makes me wonder: does the benefit of having Safety-minded folk inside of big labs outweigh the cost of large orgs like 80k signalling that the work of big labs isn't evil (I believe it is).
Good question, but I want to keep this anonymous.
I can only say I heard it from one person who said they heard it from another person connected to people at DeepMind.
If anyone else has connections with safety researchers at DeepMind, please do ask them to check.
And post here if you can! Good to verify whether or not this claim is true.
It seems quite possible that current AI orgs are going to develop superintelligence, with or without the participation of AI safety people. It seems to me much better if AI safety people participate in that process.
What's the marginal impact? If you don't take that job, someone less qualified will. They'll be on average both less skilled and less motivated toward safety. The safety-washing will still be done, just marginally less effectively.
The other marginal change is that those orgs are now made up of people with less concern for safety.
Organizations (and any group of humans) have a sort of composite psychology. They have shared beliefs that change and tend to converge over time. If some of the org is persuasively saying "this will kill us all if we're not really careful", the end result is an org that believes that, much more than in the alternative where no one involved is making that argument persuasively.
Therefore I think it's highly net-positive to work at an AGI org (but probably better yet to get funding for safety research elsewhere).
I'd go further, and say probably net-positive to take capabilities jobs at major orgs. Again, you're doing work that someone else would do just about as well, but without your beliefs on safety. You are filling a slot with safety-minded beliefs that would otherwise be taken by someone without them.
There are two reasons that working in capabilities might be more important than working in the safety department of that same org. One is that safety people may not be privy to all of the capabilities development that org is doing. Capabilities people may have more opportunities to call out risks, both internally and externally (whistleblowing). Second, your opinions may be taken quite differently if you're not in the safety department, whose whole job and mindset is safety, but you're still concerned with safety. It's easy to dismiss an "AI safety person" talking about AI x-risks, and less easy to dismiss an AI engineer who's quite worried about safety.
As an aside, I think it's really important to distinguish "AI safety" from AGI x-risk. They overlap, but the AGI x-risk is the thing I think we should all be more worried about. Working on ways to make a deep network AI less likely to be racist is marginally helpful for x-risk, but not the same thing. So working on that sort of AI safety is already less impactful than working directly on AGI alignment, in my view.
Capabilities people may have more opportunities to call out risks, both internally and externally (whistleblowing).
I would like to see this. I am not yet aware of a researcher deciding to whistleblow on the AGI lab they work at.
If you are, please meet with an attorney in person first, and preferably get advice from an experienced whistleblower on preserving anonymity – I can put you in touch: remmelt.ellen[a|}protonmail{d07]com
There’s so much that could be disclosed that would help bring about injunctions against AGI labs.
Even knowing what copyrighted data is in the datasets would be a boon for lawsuits.
No one has done any whistleblowing yet because we are not in danger yet. Current gen networks simply are not existentially risky. When someone is risking the future of the entire human race, we'll see whistleblowers give up their jobs and risk their freedom and fortune to take action.
I'm not saying that's enough, but it's better than an org where people are carefully self-selected to not give a shit about safety.
When someone is risking the future of the entire human race, we'll see whistleblowers give up their jobs and risk their freedom and fortune to take action.
There are already AGI lab leaders that are risking the future of the entire human race.
Plenty of consensus to be found on that.
So why no whistleblowing?
There's nothing to blow the whistle on. Everyone knows that those labs are pursuing AGI.
We are not in direct danger yet, in all likelihood. I have short timelines, but there's almost no chance that any current work is at risk of growing smart enough to disempower humans. There's a difference between hitting the accelerator in the direction of a cliff, and holding it down as it gets close. Developing AGI internally is when we'll need and hopefully get whistleblowers.
Are you thinking of blowing the whistle on something in between work on AGI and getting close to actually achieving it?
Are you thinking of blowing the whistle on something in between work on AGI and getting close to actually achieving it?
Good question.
Yes, this is how I am thinking about it.
I don't want to wait until competing AI corporations become really good at automating work in profitable ways – also because by then their market and political power would be entrenched. I want society to be well aware, way before then, that the AI corporations are acting recklessly and should be restricted.
We need a bigger safety margin. Waiting until corporate machinery is able to operate autonomously would leave us almost no remaining safety margin.
There are already increasing harms, and a whistleblower can bring those harms to the surface. That in turn supports civil lawsuits, criminal investigations, and/or regulator actions.
Harms that fall roughly in these categories – from most directly traceable to least directly traceable:
For example:
Revealing illegal data laundering is actually the most direct, and would cause immediate uproar.
The rest is harder and more context-dependent. I don't think we're at the stage where environmental pollution is that notable (vs. the fossil fuel industry at large), and investigating it across AI hardware operation and production chains would take a lot of diligent research as an inside staff member.
Note:
Even if you are focussed on long-term risks, you can still whistleblow on egregious harms caused by these AI labs right now. Providing this evidence enables legal efforts to restrict these labs.
Whistleblowing is not going to solve the entire societal governance problem, but it will enable others to act on the information you provided.
It is much better than following along until we reach the edge of the cliff.
While I share your sentiment, I expect that the problem is far more complex than we think. Sure, corporations are made of people, and people believe (explicitly or implicitly) that their actions are not going to lead to the end of humanity. The next question, then, is why they believe this is the case. There are various attempts to answer this question, and different people take different approaches to reducing x-risk given their answer -- see how MIRI's and Conjecture's approaches differ, for example.
This is, in my opinion, a viable line of attack, and is far more productive than pure truth-seeking comms (which is what I believe MIRI is trying) or an aggressive narrative shifting and policy influencing strategy (which is what I believe Conjecture is trying).
I appreciate this comment.
Be careful though that we’re not just dealing with a group of people here.
We’re dealing with artificial structures (ie. corporations) that take in and fire human workers as they compete for profit. With the most power-hungry workers tending to find their way to the top of those hierarchical structures.
Be careful though that we’re not just dealing with a group of people here.
Yes, I am proposing a form of systemic analysis such that one is willing to look at multiple levels of the stack of abstractions that make up the world ending machine. This can involve aggressive reductionism, such that you can end up modeling sub-systems and their motivations within individuals (either archetypal ones or specific ones), and can involve game theoretic and co-ordination focused models of teams that make up individual frontier labs -- their incentives, their resource requirements, et cetera.
Most people focus on the latter, far fewer focus on the former, and I don't think anyone is even trying to do a full stack analysis of what is going on.
You can literally have a bunch of engineers and researchers believe that their company is contributing to AI extinction risk, yet still go with the flow.
They might even think they’re improving things at the margin. Or they have doubts, but all their colleagues seem to be going on as usual.
In this sense, we’re dealing with the problems of having that corporate command structure in place that takes in the loyal, and persuades them to do useful work (useful in the eyes of power-and-social-recognition-obsessed leadership).
Someone shared the joke: "Remember the Milgram experiment, where they found out that everybody but us would press the button?"
My response: Right! Expect AGI lab employees to follow instructions, because of…
Infohazards. Researching capability risks within an AI lab can inspire researchers hearing about your findings to build new capabilities.
Does research really work like this? That is, is only one person capable of coming across an idea? Usually any discovery has a lot of competitors who are fairly close. I doubt the small number of EA people choosing whether to work in safety will have any significant impact on capabilities.
If you’re smart and specialised in researching capability risks, it would not be that surprising if you came up with new feasible mechanisms that others were not aware of.
That’s my opinion on this.
IMO it is harmful on expectation for a technical safety researcher to work at DeepMind, OpenAI or Anthropic.
Four reasons:
OpenAI set up a company and partnered with Microsoft to release ChatGPT,
Anthropic pitched to investors they'd build a model 10 times more capable.
This is my view. I would want people to engage with the different arguments, and think for themselves what ensures that future AI systems are actually safe.
I heard via via that Google managers are forcing DeepMind safety researchers to shift some of their hours to developing Gemini for product-ready launch.
I cannot confirm whether that's correct.
For example, I was in contact with a safety researcher at an AGI lab who kindly offered to read my comprehensive outline on the AGI control problem, to consider whether to share it with colleagues. They also said they were low on energy. They suggested I remind them later, and I did, but they never got back to me. They're simply too busy, it seems.