LESSWRONG
Scalable Oversight
• Applied to Gradient Routing: Masking Gradients to Localize Computation in Neural Networks by TurnTrout 12d ago
• Applied to Automated monitoring systems by hiki_t 20d ago
• Applied to Ways to think about alignment by Abhimanyu Pallavi Sudhir 2mo ago
• Applied to Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets by Abhimanyu Pallavi Sudhir 3mo ago
• Applied to AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization by DanielFilan 4mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery 4mo ago
• Applied to Scalable oversight as a quantitative rather than qualitative problem by Ruby 5mo ago
• Applied to On scalable oversight with weak LLMs judging strong LLMs by zac_kenton 5mo ago
• Applied to NYU Code Debates Update/Postmortem by David Rein 7mo ago
• Applied to Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Raemon 8mo ago
• Created by Raemon at 8mo