My research priorities for AI control

paulfchristiano

I've been thinking about what research projects I should work on, and I've posted my current view. Naturally, I think these are also good projects for other people to work on.

Brief summaries of the projects I find most promising:

Elaborating on apprenticeship learning. Imitating human behavior seems especially promising as a scalable approach to AI control, but there are many outstanding problems.
Efficiently using human feedback. The limited availability of human feedback may be a serious bottleneck for realistic approaches to AI control.
Explaining human judgments and disagreements. My preferred approach to AI control requires humans to understand AIs’ plans and beliefs. We don’t know how to solve the analogous problem for humans.
Designing feedback mechanisms for reinforcement learning. A grab bag of problems, united by a need for proxies of hard-to-optimize, implicit objectives.

The post briefly discusses where I am coming from, and links to a good deal more clarification. I'm always interested in additional thoughts and criticisms, since changing my views on these questions would directly influence what I spend my time on.

Minor naming feedback. You switched from calling something "supervised learning" to "reinforcement learning". The first images that come to my mind when I hear "reinforcement learning" are TD-Gammon and reward signals. So, when I read "reinforcement learning", I first think of a computer getting smarter through iterative navel-gazing, then think of a computer trying to wirehead itself, then stumble to the meaning I think you intend. I am a lay reader.
