The AGI Safety & Alignment Team (ASAT) at Google DeepMind (GDM) is hiring! Please apply to the Research Scientist and Research Engineer roles. Strong software engineers with some ML background should also apply (to the Research Engineer role). Our initial batch of hiring will focus more on engineers, but we expect to continue to use the applications we receive for future hiring this year, which will likely be more evenly split. Please do apply even if e.g. you’re only available in the latter half of this year.

What is ASAT?

ASAT is the primary team at GDM focused on technical approaches to severe harms from AI systems, having evolved out of the Alignment and Scalable Alignment teams. We’re organized around two themes: AGI Alignment (think amplified oversight and mechanistic interpretability) and Frontier Safety (think development and implementation of the Frontier Safety Framework). The leadership team is Anca Dragan, Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg as executive sponsor.

Why should you join?

I’d say there are three main ways in which work on the GDM safety team is especially impactful:

  1. GDM is one of the most likely places to develop AGI[1], and cares about AGI safety, so it is especially important to implement AGI safety mitigations at GDM.
  2. Developing and preparing mitigations enables Google to discuss them publicly in the Frontier Safety Framework (FSF), which in turn can help build norms and policy about what mitigations frontier developers should put in place to address AGI risks. For example, our updated FSF is the first policy to address deceptive alignment.
  3. Since we have access to frontier models and substantial amounts of compute, we can do safety research that would be hard to do anywhere other than a frontier lab.

Despite our small size relative to Google as a whole, our team is responsible for setting an AGI safety approach that applies at Google’s massive scale (via the FSF). This is a big deal – actions taken by Google will typically have greater policy impact than the same actions taken by smaller frontier labs.

By far our biggest resource bottleneck is people, so new hires should expect to make a significant difference to our impact.

GDM is also a great place to learn and upskill – we’re surrounded by incredibly strong researchers and engineers, both for safety and ML more broadly.

Also, while everyone says this, I really do think our team has a great culture. Team members know the point of the project they’re working on; any team member can raise an objection and they will be listened to. People are incredibly helpful and generous with their time. At least one person who joined us with 10+ years of industry experience finds it the best culture they've been in.

What will we do in the near future?

Half a year ago, we published an overview of our recent research. This should give you a decent sense of the type of work we plan to do in the future as well. The biggest change relative to that post is that we’re planning to work a lot on monitoring, particularly chain-of-thought monitoring, which we think of as a near-term example of AI control.

Here are a few concrete things I hope for the team to accomplish by the end of 2025, to give a sense of what you’d be contributing to:

  1. Publish a GDM roadmap for AGI safety that extends beyond the level of capabilities addressed by the FSF (though not all the way to superintelligence). We have already developed a draft roadmap internally that we use for research planning.
  2. Develop mitigations and produce a sketch of a mitigation-based safety case for misuse that can be quickly concretized once a dangerous capability actually arises.
  3. Develop more evaluations for instrumental reasoning (e.g. self-reasoning, stealth) and monitors for deceptive alignment (see the FSF), ideally exploring both black box methods and methods based on model internals. Produce a sketch of a monitoring-based safety case for deceptive alignment that can be quickly concretized once models gain sufficient capabilities at instrumental reasoning.
  4. Do a deep dive on externalized reasoning (similar to “CoT faithfulness”) as it relates to monitoring, particularly focusing on how it may break in the future and what can be done to avoid that.[2]
  5. Demonstrate that some flavor of debate outperforms strong baselines in a realistic, challenging task. One source of unrealism is allowed: methods may be restricted to use feedback from an artificially weak source to create a sandwiching setup.
  6. Develop and be ready to deploy mitigations based on model internals that outperform their behavioral equivalents (where the improvement may just be in latency / cost, rather than accuracy).

I doubt we’ll succeed at all of these. Perhaps I’d guess we’ll succeed at 4-5 of them in spirit (i.e. ignoring minor deviations from the letter of what we wrote).

We’ll also do a few other things not on this list. For example, we expect to improve our approach to preparing for the automation of ML R&D, but we don’t yet know what that will look like, so it was hard to write down as concrete an outcome as we did for the other items on the list. And of course there will be new things that we work on that I haven’t yet anticipated.

How do you prioritize across research topics?

Generally, we don’t assign people to work on particular topics. Instead, team members can choose what they work on, as long as they can convince me (Rohin) that their project has a decent theory of change, and they can find enough collaborators that the project will move forward at a reasonable pace. (This is somewhat less true on Frontier Safety, where there is somewhat more assignment of people to particular tasks.)

As a result, there isn’t a clean answer to “how do you prioritize”: prioritization depends on the expertise, skills, and views of individuals on the team, and is effectively an implicit aggregation of many views across the team about what work is impactful, which is hard to reify.

Nonetheless, I can say a bit about how I personally think about prioritization across high-level projects. As a completely made up number, I’d guess that my views drive roughly 50% of overall prioritization on the team (through a combination of formal authority, convincing team members of my views, and deference).

Roofshots. An important part of my view is that there’s a lot of “simple” or “obvious” work to be done that buys significant safety, where it is primarily important to foresee that the work is needed, and execute well on it. So rather than aiming for research breakthroughs (“moonshots”), I see our job as primarily about executing well at “roofshots”.

(Note that I view work on MONA and debate as a series of roofshots – I’m very much not saying “just do some evals and YOLO it”.)

I expect that if we consistently achieve roofshots, that will in aggregate go beyond what a moonshot would have achieved, in less time than it would take to produce a moonshot. This seems like the default way in which impressive progress happens in most fields (see e.g. Is Science Slowing Down?).

Comparative advantage. My general heuristic is that our research should take advantage of one or both of our two main comparative advantages:

  1. GDM integration: Areas where it is fairly clear that we will want the research to be integrated into GDM practice at some point. This doesn’t mean it has to be integrated now, but the work should at least be done with an eye towards integration in the future.
  2. Lab advantages: Research directions that leverage significant lab advantages, e.g. because they're very compute intensive, require access to the weights of the best models, benefit from confidential knowledge about the research frontier, etc.

I used to have the view that we should just work on whatever seemed most important and not worry too much about the factors above, since we hire some of the most talented people and can do a better job than most other groups. I still believe the latter part – for example, many have tried to explain why grokking happens, but I think our explanation is the best; similarly many investigated unsupervised knowledge discovery as an empirical AGI safety technique, and I think our paper provided the most decision-relevant evidence on the subject (except possibly the ELK report).

However, I’ve changed my mind on the overall view, because there’s quite a lot of important work to be done in the two buckets above, and other work doesn’t look massively more important, so we really do want to capture the gains from trade available by focusing on our comparative advantages.

Now, when someone on ASAT wants to do important work that doesn’t fall in one of the two buckets, I’m more likely to recommend an external collaboration or MATS mentoring. Around 10 team members do substantial external mentoring. Over the last year, they’ve supervised ~50 external researchers, producing ~25 papers.

FAQ

Q. Does GDM take AGI safety seriously?

Rather than having to take our word for it, we think there is significant public evidence.

DeepMind was founded with an AGI safety mission. Its leadership endorsed the importance of AGI safety at the time (see posts), and continues to do so (see the CAIS statement, a recent podcast, and discussion of the AI Action Summit).

(People sometimes suggest that frontier labs invest in AGI safety as a form of safety washing, with upsides like dissuading regulation or attracting EA talent and funding. This hypothesis fails to retrodict the history of DeepMind. DeepMind was founded in 2010, a time when AGI safety was basically just SIAI + FHI, and “effective altruism” hadn’t been coined yet. The founders were interested in AGI safety even then, when it was clearly bad for your prospects to be visibly associated with AGI safety.)

DeepMind has had an AGI safety team since 2016, and has continually supported the team in growing over time. ML researchers are not cheap, and nor is the compute that they use. I’m pretty unsure whether Open Philanthropy has spent more on technical AGI safety than Google has spent on its technical AGI safety team.[3]

I think the more relevant issues are things like “there are many stakeholders and not all of them take AGI safety seriously” or “there are constant pressures and distractions from more immediately pressing things, and so AGI safety is put on a backburner”. These are obviously true to at least some degree, and the question is more about quantitatively how rough the effects are.

One clear piece of evidence here is that Google (not just GDM) has published and updated the Frontier Safety Framework (FSF), with the first version preceding the Seoul AI Commitments. Google is not a startup – it’s not like we just got a quick approval from Demis and Shane, and voila, now the FSF could be published. We did a lot of stakeholder engagement. If GDM didn’t take AGI safety seriously, then (at least prior to the Seoul AI Commitments) the relevant stakeholders would have ignored us and the FSF would not have seen the light of day.

Q. Isn’t GDM incredibly bureaucratic, stifling all productivity?

While there is non-zero truth to this, I think this has been greatly overstated in the safety community. We published an overview of our work over ~1.5 years – you can judge for yourself how that compares to other labs. My sense is that, compared to the other AI labs, our productivity-per-person looks similar or better.[4] Personally, I like our work more, though since I have a lot of influence over what work we do, of course I would say that.

Don’t get me wrong – there is bureaucracy, and sometimes it tries to block things for silly reasons. If it’s important, we escalate to get the right decision instead. This is often salient because it is annoying, but it is not actually a major cost to our productivity, and doesn’t happen that often to any given researcher.

Beyond the annoyance, bureaucracy also adds significant serial time / delays, but that is not nearly as bad as a significant hit to productivity would be, since we can work on other projects in parallel.

Q. My worry is that the engineering infrastructure is bad.

This seems wrong to me. I think the engineering infrastructure is very good compared to realistic alternatives.

It’s true that, compared to my PhD, the iteration cycles at GDM are longer and the libraries used are more often broken. By far the biggest reason is that in my PhD I didn’t do research that involved massive amounts of compute. For low-compute research on tiny models that fit on a single GPU, yes, it would be faster to do the work using external infrastructure. To steal a common Google phrase, we don’t know how to count that low. Another way of saying this is that Google makes everything medium hard – both things that are normally easy and things that are normally impossible.

In cases where we are doing this kind of research, we do aim to use external infrastructure, at least for the early validation phase of a project to gain iteration speed benefits. But we also take this as another reason to focus on high-compute research – our comparative advantage at it is higher than you might guess at first.

I expect the “everything is always at least medium-hard” effect also applies at least somewhat to other labs’ infra. When you are parallelizing across multiple chips, the infra necessarily becomes more complicated and harder to use. When you are working with giant amounts of compute that form significant fractions of expenditure, choices will be made that sacrifice researcher time to achieve more efficient compute usage.

Since GDM reuses Google’s production tooling, there are some aspects that really don’t make sense for research. But GDM is investing in research tooling (and we can feel these gains). One particular advantage is that Google has teams for the entire stack all the way down to the hardware (TPUs), so for basically any difficulty you encounter there will be a team that can help. ASAT also has a small engineering team that supports infra for ASAT researchers in particular. 

(Incidentally, this is one of the subteams we’re hiring for! There’s a lot of room for ambitious creative problem solving to speed up alignment research building on one of the most sophisticated and large scale eng stacks in the world. Apply to the Research Engineer role.)

Also, I’ll again note that our productivity relative to other labs looks pretty good, so I feel like it would be quite surprising if GDM infra was a huge negative hit to productivity.

Q. My worry is that GDM safety doesn’t have enough access to compute.

None of our current projects are bottlenecked by compute, and I don’t expect that to change in the foreseeable future. It’s not completely unimportant – as is almost always true, more compute would help. However, we are much, much more people-constrained than compute-constrained.[5]

Q. I have a question not covered elsewhere.

Leave a comment on this post! Please don’t email us individually; we get too many of these and don’t have the capacity to reply to each one.

Apply now!

We will keep the application form open until at least 12pm PST Friday 28th February 2025. Please do apply even if your desired start date is quite far in the future, as we probably will not run another public hiring round this year. Most roles can be based in San Francisco, Mountain View, London, or maybe New York, with a hybrid work-from-office / work-from-home model.[6]

While we do expect these roles to be competitive, we have found that people often overestimate what we are looking for. In particular:

  • We do not expect you to have a PhD if you are applying for the Research Engineer or Software Engineer roles. Even for the Research Scientist role, it is fine if you don’t have a PhD if you can demonstrate comparable research skill.
  • We do not expect you to have read hundreds of blog posts and papers about AI alignment, or to have a research agenda that aims to fully solve AI alignment. We will look for understanding of the basic motivation for AI alignment, and the ability to reason conceptually about future AI systems that we haven’t yet built.
    • If we ask you, say, whether an assistive agent would gradient hack if it learned about its own training process, we’re looking to see how you go about thinking about a confusing and ill-specified question (which happens all the time in alignment research). We aren’t expecting you to give us the Correct Answer, and in fact there isn’t a correct answer; the question isn’t specified well enough for that. We aren’t even expecting you to know all the terms; it would be fine to ask what “gradient hacking” is.
  • You can read my career FAQ if you’re interested in more thoughts on what skills are important and how to develop them.

Go forth and apply!

  1. ^

    Or TAI, prepotent AI, TEDAI – just insert your preferred term here.

  2. ^

    I will count it as a success if we produce the evidence base, even if it then turns out that we actually can’t leverage this into impact on safety, unless it seems like we really should have predicted that outcome in advance.

  3. ^

    I briefly skimmed this list of Open Phil’s grants in AI. Note that by focusing on technical AGI safety, I’m excluding a lot of Open Philanthropy’s funding for AI more broadly, such as their two biggest grants in AI: $55m to CSET (AI governance) and $30m to OpenAI (based on the grant page, seems to me to be an investment in oversight and influence of OpenAI, rather than in technical AGI safety research). Similarly, for Google, I restricted it just to the operating costs for the team of people motivated by AGI safety in particular, and excluded lots of other things, such as other teams doing clearly relevant research (e.g. robustness, RLHF improvements, etc), other teams doing AI governance (e.g. Allan Dafoe’s team), support for external testers, and programs like the AI Safety Fund.

  4. ^

    This is hard to assess for a variety of reasons. As a very dumb baseline approach, over 2023-2024 GDM and Anthropic produced similar numbers of papers on AGI safety, while OpenAI produced fewer papers. Anthropic likely has a substantially larger safety team than GDM, while OpenAI’s team size has varied drastically but was probably similar to GDM team size before people started to leave. So the dumb baseline would suggest that GDM was overall more productive per person. Of course, it is important to also assess paper quality – I don’t think this changes the assessment much, but different people will have different takes here. And even beyond this, to assess productivity, you would ideally want to know about how much non-public work happens at all of the labs, which for obvious reasons is hard to estimate.

  5. ^

    More concretely, for 0 < X < 25, I expect an X% increase in people would be about 10x as impactful as an X% increase in compute. (For higher values of X, if the growth happens in a short enough time frame, the marginal hire can have negative value. You don’t want to grow too fast.)

  6. ^

    Zurich is on the job listing, but this only applies to Gemini Safety, not ASAT.

Comments (19)
Neel Nanda:

In my incredibly biased opinion, the GDM AGI safety team is great and an effective place to work on reducing AI x-risk, and I would love to get applications from people here.

2 years ago, you seemed quite optimistic about AGI Safety/Alignment and had a long timeline.
Have your views changed since then?
I understand that hiring will be necessary in any case.

Still pretty optimistic by the standards of the AGI safety field, somewhat shorter timelines than I reported in that post.

Neither of these really affect the work we do very much. I suppose if I were extremely pessimistic I would be doing something else, but even at a p(doom) of 50% I'd do basically the same things I'm doing now.

(And similarly individual team members have a wide variety of beliefs on both optimism and timelines. I actually don't know their beliefs on those topics very well because these beliefs are usually not that action-relevant for us.)

Rayk:

Hi Rohin, thank you for writing up this informative post! I’m working on my application now, and for the question about whether I prefer to work on Gemini or ASAT, I was wondering: are there different interview processes for them? There are areas in both that I find really interesting and think I could be a good fit for. Can my resume/letter be considered for either team through one application, or should I submit one for each?

There are different interview processes. ASAT is more research-driven while Gemini Safety is more focused on execution and implementation. If you really don't know which of the two teams would be a better fit, you can submit a separate application for each.

Thank you for the post! Are you able to share roughly what the interview/hiring process will be like? E.g. how many stages, duration, etc.

Since we have multiple roles, the interview process varies across candidates, but usually it would have around 3 stages that in total correspond to 4-8 hours of interviews.

jr:

Thanks again for sharing this opportunity Rohin! Do you know when applicants can expect to hear back about interviews?

We've got a lot of interest, so it's taking some time to go through applications. If you haven't heard back by the end of March, please ping me; hopefully it will be sooner than that.

The application requires us to pick either Gemini Safety or ASAT as a team. There are areas in both teams that I could see my research and professional background being relevant to – would it be possible for us to express interest in both, if we make it clear in the application which one we would prefer?

The answer to that question will determine which team will do the first review of your application. (We get enough applications that the first review costs quite a bit of time, so we don't want both teams to review all applications separately.)

You can still express interest in both teams (e.g. in the "Any other info" question), and the reviewer will take that into account and consider whether to move your application to the other team, but Gemini Safety reviewers aren't going to be as good at evaluating ASAT candidates, and vice versa, so you should choose the team that you think is a better fit for you.

Having such a clear FAQ and LLM or bot-proof questionnaire in the job application is unique! I was curious about the choice of prioritizing engineers to hire this cycle as opposed to core researchers (e.g. adopting principled frameworks to solve safety issues). This feels like putting safety research as an afterthought on evaluations of these models rather than coming up with principled methodologies for integrating safety into the models! I may be wrong in this interpretation, so hoping for the better to get through the HRs in the given job application:)

Our hiring this round is a small fraction of our overall team size, so this is really just correcting a minor imbalance, and shouldn't be taken as reflective of some big strategy. I'm guessing we'll go back to hiring a mix of the two around mid-2025.

jr:

Thank you for posting those here Rohin. Your work aligns with my values, passions, and strengths more than anything else I’m aware of, and I would love to be a part of it.

Would you be willing to help me identify whether and how I might be able to position myself as an attractive candidate for either of these roles? Or if that’s not a realistic immediate possibility, perhaps recommend an alternative role I might be a stronger candidate for which would allow me to support the same objectives?

I regret having to ask for help, especially without having established trust and credibility within this community already. However, if you’re feeling generous, I would really appreciate your insights. 

You can check out my career FAQ, as well as various other resources linked from there.

Roughly how much capability research is being done in Google vs how much AGI safety research?

More capability research than AGI safety research but idk what the ratio is and it's not something I can easily find out

Thanks for this post!

The deadline possibly requires clarification:

We will keep the application form open until at least 11:59pm AoE on Thursday, February 27.

In the job posting, you write:

Application deadline: 12pm PST Friday 28th February 2025

We'll leave it up until the later of those two (and probably somewhat beyond that, but that isn't guaranteed). I've edited the post.
