Shaping safer goals
How can we move the needle on AI safety? In this sequence I think through some approaches that don't rely on precise specifications, but instead involve "shaping" our agents to think in safer ways and to have safer motivations. This is particularly relevant to the prospect of training AGIs in multi-agent (or other open-ended) environments.
Note that all of the techniques I propose here are speculative brainstorming; I'm not confident in any of them as research directions, although I'd be excited to see further exploration along these lines.