
Scalable Oversight

• Applied to Gradient Routing: Masking Gradients to Localize Computation in Neural Networks by TurnTrout 12d ago
• Applied to Automated monitoring systems by hiki_t 20d ago
• Applied to Ways to think about alignment by Abhimanyu Pallavi Sudhir 2mo ago
• Applied to Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets by Abhimanyu Pallavi Sudhir 3mo ago
• Applied to AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization by DanielFilan 4mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery 4mo ago
• Applied to Scalable oversight as a quantitative rather than qualitative problem by Ruby 5mo ago
• Applied to On scalable oversight with weak LLMs judging strong LLMs by zac_kenton 5mo ago
• Applied to NYU Code Debates Update/Postmortem by David Rein 7mo ago
• Applied to Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Raemon 8mo ago
• Created by Raemon 8mo ago