Ramana Kumar

Dialogue on What It Means For Something to Have A Function/Purpose

Context for LW audience: Ramana, Steve and John regularly talk about stuff in the general cluster of agency, abstraction, optimization, compression, purpose, representation, etc. We decided to write down some of our discussion and post it here. This is a snapshot of us figuring stuff out together. Hooks from Ramana:...

Jul 15, 2024 · 41

Consent across power differentials

I'd like to put forward another description of a basic issue that's been around for a while. I don't know if there's been significant progress on a solution, and would be happy to be pointed to any such progress. I've opted to go for a relatively rough and quick post that...

Jul 9, 2024 · 52

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

A Sharp Left Turn (SLT) is a possible rapid increase in AI system capabilities (such as planning and world modeling). This post will outline our current understanding of the most promising plan for getting through an SLT and how it could fail (conditional on an SLT occurring). In a previous...

Nov 25, 2022 · 39

Threat Model Literature Review

TL;DR: This post provides a literature review of some threat models of how misaligned AI can lead to existential catastrophe. See our accompanying post for high-level discussion, a categorization and our consensus threat model. Where available we cribbed from the summary in the Alignment Newsletter. For other people's overviews of...

Nov 1, 2022 · 79

Clarifying AI X-risk

TL;DR: We give a threat model literature review, propose a categorization and describe a consensus threat model from some of DeepMind's AGI safety team. See our post for the detailed literature review. The DeepMind AGI Safety team has been working to understand the space of threat models for existential risk...

Nov 1, 2022 · 127

Autonomy as taking responsibility for reference maintenance

I think semantics – specifically, maintaining reference relationships – is a core component of intelligent behaviour. Consequently, I think a better understanding of semantics would enable a better understanding of what machine intelligence that is “trying to do the right thing” ought to look like and how to build it....

Aug 17, 2022 · 63

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

This is our current distillation of the sharp left turn threat model and an attempt to make it more concrete. We will discuss our understanding of the claims made in this threat model, and propose some mechanisms for how a sharp left turn could happen. This is a work in...

Aug 12, 2022 · 86

Ramana Kumar

Will Capabilities Generalise More?

Clarifying AI X-risk

Thoughts on Human Models

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
