Context for LW audience: Ramana, Steve and John regularly talk about stuff in the general cluster of agency, abstraction, optimization, compression, purpose, representation, etc. We decided to write down some of our discussion and post it here. This is a snapshot of us figuring stuff out together. Hooks from Ramana:...
I'd like to put forward another description of a basic issue that's been around for a while. I don't know if there's been significant progress on a solution, and would be happy to be pointed to any such progress. I've opted to go for a relatively rough and quick post that...
A Sharp Left Turn (SLT) is a possible rapid increase in AI system capabilities (such as planning and world modeling). This post will outline our current understanding of the most promising plan for getting through an SLT and how it could fail (conditional on an SLT occurring). In a previous...
TL;DR: This post provides a literature review of some threat models of how misaligned AI can lead to existential catastrophe. See our accompanying post for high-level discussion, a categorization and our consensus threat model. Where available we cribbed from the summary in the Alignment Newsletter. For other people's overviews of...
TL;DR: We give a threat model literature review, propose a categorization and describe a consensus threat model from some of DeepMind's AGI safety team. See our post for the detailed literature review. The DeepMind AGI Safety team has been working to understand the space of threat models for existential risk...
I think semantics – specifically, maintaining reference relationships – is a core component of intelligent behaviour. Consequently, I think a better understanding of semantics would enable a better understanding of what machine intelligence that is “trying to do the right thing” ought to look like and how to build it....
This is our current distillation of the sharp left turn threat model and an attempt to make it more concrete. We will discuss our understanding of the claims made in this threat model, and propose some mechanisms for how a sharp left turn could happen. This is a work in...