Would you be a better RLHF labeler than GPT-4?
Have you ever tried to label your own data? Have you ever tried to run an evaluation yourself on a system that you've built? It's difficult. We humans are imperfect. We are terrible at labeling. We are also expensive and difficult to scale. Let's say you wanted to create a...
The amusing thing about this story is that it's actually true. They really are getting RL systems to run automated experiments on humans, live. (You wouldn't believe the things people will tell you about what they're working on when you just ask!)