sbenthall — LessWrong

Reward Hacking from a Causal Perspective

Post 5 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, Post 3: Agency, and Post 4: Incentives. By Francis Rhys Ward, Tom Everitt, Sebastian Benthall, James Fox, Matt MacDermott, Milad Kazemi, Ryan Carey representing the Causal Incentives Working Group. Thanks also to Toby...

Jul 21, 2023 · 29

Incentives from a causal perspective

Post 4 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, and Post 3: Agency. By Tom Everitt, James Fox, Ryan Carey, Matt MacDermott, Sebastian Benthall, and Jon Richens, representing the Causal Incentives Working Group. Thanks also to Toby Shevlane and Aliya Ahmad. “Show...

Jul 10, 2023 · 27

Causality: A Brief Introduction

Post 2 of Towards Causal Foundations of Safe AGI, see also Post 1: Introduction. By Lewis Hammond, Tom Everitt, Jon Richens, Francis Rhys Ward, Ryan Carey, Sebastian Benthall, and James Fox, representing the Causal Incentives Working Group. Thanks also to Alexis Bellot, Toby Shevlane, and Aliya Ahmad. Causal models are...

Jun 20, 2023 · 49

Introduction to Towards Causal Foundations of Safe AGI

By Tom Everitt, Lewis Hammond, Rhys Ward, Ryan Carey, James Fox, Sebastian Benthall, Matt MacDermott and Shreshth Malik representing the Causal Incentives Working Group. Thanks also to Toby Shevlane, MH Tessler, Aliya Ahmad, Zac Kenton, Maria Loks-Thompson, and Alexis Bellot. Over the next few years, society, organisations, and individuals will...

Jun 12, 2023 · 74

Don't Fear the Reaper: Refuting Bostrom's Superintelligence Argument

I've put a preprint up on arXiv that this community might find relevant. It's an argument from over a year ago, so it may be dated. I haven't been keeping up with the field much since I wrote it, so I welcome any feedback, especially on where the crux of...

Mar 1, 2017 · 9

Autonomy, utility, and desire; against consequentialism in AI design

For the sake of argument, let's consider an agent to be autonomous if:
* It has sensors and actuators (important for an agent)
* It has an internal representation of its goals. I will call this internal representation its desires.
* It has some kind of internal planning function that...

Dec 3, 2014 · 7

more on predicting agents

Suppose you want to predict the behavior of an agent. I stand corrected. To make the prediction, as a predictor you need:
* observations of the agent
* the capacity to model the agent to a sufficient degree of accuracy

"Sufficient accuracy" here is a threshold on, for example, KL...

Nov 8, 2014 · 1