This is our current distillation of the sharp left turn threat model and an attempt to make it more concrete. We will discuss our understanding of the claims made in this threat model, and propose some mechanisms for how a sharp left turn could happen. This is a work in progress, and we welcome feedback and corrections.
What are the main claims of the “sharp left turn” threat model?
Claim 1. Capabilities will generalize far (i.e., to many domains)
There is an AI system that:
- Performs well: it can accomplish impressive feats or achieve high scores on valuable metrics.
- Generalizes: it performs well in new domains that were not optimized for during training, with no domain-specific tuning.
Generalization is...