I think this is a very promising method for improving the steering of LLMs, which is great for reducing risk from model-originating harms like deception.

The flip side is that it also increases misuse potential.

This is yet another way the safety gap could widen between closed-weight models with locked-down controls and open-weight models.