This is a linkpost for https://ought.org/updates/2020-01-11-arguments
Is this the final update from Ought about their factored cognition experiments? (I can't seem to find anything more recent.) The reason I ask is that the experiments reported here do not seem very conclusive, and Ought mentioned plans for further experiments but does not appear to have published any more updates. Does anyone know what happened, and what that implies about the viability of factored-cognition-style alignment schemes?
Ought has written a detailed update and analysis of recent experiments on factored cognition. These are experiments with human participants and don't involve any machine learning. The goal is to learn about the viability of IDA, Debate, and related approaches to AI alignment. For background, see these prior LW posts: Ought: Why it Matters and How to Help and Factored Cognition presentation.
Here is the opening of the research update:
The rest of the post is at https://ought.org/updates/2020-01-11-arguments.