Would you be a better RLHF labeler than GPT-4?

kache

Mar 27, 2023

Have you ever tried to label your own data? Have you ever tried to run an evaluation yourself, on a system that you've built? It's difficult. We humans are imperfect. We are terrible at labeling. We are also expensive and difficult to scale.

Let's say you wanted to create a high-quality instruct dataset today. Which would generate better output: a high-resource model, or a farm of hired humans?

How many human RLHF labels are incorrect?

How many benchmarks are our machines now superhuman at?

If we're not bootstrapped, we will be soon.