The anti-psychotic Q-learner trick — LessWrong