Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique in which the model's training signal comes from human evaluations of the model's outputs, rather than from labeled data or a ground-truth reward signal.
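In practice, the human evaluations are often collected as pairwise comparisons between model outputs and used to fit a learned reward model, which then stands in for the missing ground-truth reward during reinforcement learning. The sketch below illustrates just that reward-modeling step with a Bradley-Terry preference loss; everything in it (the `RewardModel` class, the stand-in embeddings, the hyperparameters) is hypothetical and chosen only for illustration, not taken from any particular RLHF implementation.

```python
# Minimal sketch: fitting a reward model from pairwise human preferences.
# All names, shapes, and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Toy scorer mapping a fixed-size (prompt, response) embedding to a scalar reward."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # one scalar reward per example


def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: push the human-preferred response to score higher,
    # i.e. minimize -log sigmoid(r_chosen - r_rejected) over the batch.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()


model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-in embeddings for (prompt, preferred response) and (prompt, rejected response) pairs.
chosen = torch.randn(32, 128)
rejected = torch.randn(32, 128)

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

In a full RLHF pipeline, the trained reward model would then score the policy's outputs while the policy is fine-tuned with a reinforcement learning algorithm (commonly PPO), so that human judgments, rather than a hand-specified reward function, shape the model's behavior.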