This research was completed for London AI Safety Research (LASR) Labs 2025 by Jennifer Za, Julija Bainiaskina, Nikita Ostrovsky and Tanush Chopra. The team was supervised by Victoria Krakovna (Google DeepMind). Find out more about the programme and express interest in upcoming iterations here. Introduction: Many proposals for controlling misaligned...
As AI models become more sophisticated, a key concern is the potential for “deceptive alignment” or “scheming”. This is the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to...
We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course offers a concise and accessible introduction to AI alignment, consisting of short recorded talks and exercises (75 minutes total) with an accompanying slide deck and exercise workbook. It...
After 7 years at Deep End (and 4 more years in other group houses before that), Janos and I have moved out to live near a school we like and some lovely parks. The life change is bittersweet - we will miss living with our friends, but also look forward...
Public discussions about catastrophic risks from general AI systems are often derailed by the word “intelligence”. People often have different definitions of intelligence, associate it with concepts like consciousness that are not relevant to AI risks, or dismiss the risks because intelligence is not well-defined. I would advocate...
Update: The original title "DeepMind alignment team's strategy" was poorly chosen. Some readers interpreted it as meaning that this was everything we had thought about or wanted to say about an "alignment plan", which is an unfortunate misunderstanding. We simply meant to share slides...
Power-seeking is a major source of risk from advanced AI and a key element of most threat models in alignment. Some theoretical results show that most reward functions incentivize reinforcement learning agents to take power-seeking actions. This is concerning, but does not immediately imply that the agents we train will...