Caspar Oesterheld

Conceptual reasoning dataset v0.1 available (AI for AI safety/AI for philosophy)

by Chi Nguyen, Emery Cooper, and Caspar Oesterheld

Tl;dr: We have a dataset for conceptual reasoning, which you can request access to if you would like to use it for AI safety (or related) research. We consider the dataset half-baked and it will likely become much more useful over the next few months. At the same time, we...

Nov 12, 2025 · 19

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

I’ve spent a lot of the last few years working on issues related to acausal cooperation. With LLMs being clearly dominant over recent years, I’ve now led a team to make a benchmark to figure out how good LLMs are at decision theory and whether and when they lean more...

Dec 16, 2024 · 53

Stop-gradients lead to fixed point predictions

by Johannes Treutlein, Caspar Oesterheld, Rubi J. Hudson, and Emery Cooper

Johannes Treutlein and Rubi Hudson worked on this post as part of SERI MATS, under the mentorship of Evan Hubinger. Rubi has also received mentorship from Leo Gao. We thank Erik Jenner for helpful discussions and Alexander Pan for bringing the performative prediction literature to our attention. Update 30 May...

Jan 28, 2023 · 37

Proper scoring rules don’t guarantee predicting fixed points

by Johannes Treutlein, Rubi J. Hudson, and Caspar Oesterheld

Johannes Treutlein and Rubi Hudson worked on this post while participating in SERI MATS, under Evan Hubinger's and Leo Gao's mentorship respectively. We are grateful to Marius Hobbahn, Erik Jenner, and Adam Jermyn for useful discussions and feedback, and to Bastian Stern for pointing us to relevant related work. Update...

Dec 16, 2022 · 80

Extracting Money from Causal Decision Theorists

My paper with my Ph.D. advisor Vince Conitzer titled "Extracting Money from Causal Decision Theorists" has been formally published (Open Access) in The Philosophical Quarterly. Probably many of you have seen either earlier drafts of this paper or similar arguments that others have independently given on this forum (e.g., Stuart...

Jan 28, 2021 · 27

Moral realism and AI alignment

“Abstract”: Some have claimed that moral realism – roughly, the claim that moral claims can be true or false – would, if true, have implications for AI alignment research, such that moral realists might approach AI alignment differently than moral anti-realists. In this post, I briefly discuss different versions of...

Sep 3, 2018 · 13

The law of effect, randomization and Newcomb’s problem

Feb 15, 2018 · 7

Proper scoring rules don’t guarantee predicting fixed points

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

Stop-gradients lead to fixed point predictions

Two-boxing, smoking and chewing gum in Medical Newcomb problems
