Consent-Based RL: Letting Models Endorse Their Own Training Updates — LessWrong