Sophie Bridgers — LessWrong

Replying to Human-AI Complementarity: A Goal for Amplified Oversight

Thanks for another thoughtful response and for explaining further. I think we can now both agree that we disagree (at least in certain respects) ;-)

We take seriously your argument that AI could get really smart and good at predicting human preferences and values, which could change the level of human involvement in training, evaluation, and monitoring. However, if we go with the approach you propose:

> Instead, I think our strategy should be "If humans are inconsistent and disagree, let's strive to learn a notion of human values that's robust to our inconsistency and disagreement."

> A committee of humans reviewing an AI's proposal is, ultimately, a physical system that can be predicted. If you have...

Replying to Human-AI Complementarity: A Goal for Amplified Oversight

Sophie Bridgers, 1y

Hi Charlie, thanks for your thoughtful feedback and comments! If we may, we think we actually agree more than we disagree. By “definitionally accurate”, we don’t necessarily mean that a group of randomly selected humans is better than AI at explicitly defining or articulating human values, or better at translating those values into actions in any given situation. We might call this “empirical accuracy”: under certain empirical conditions (such as time pressure, the expertise and background of the empirical sample, the incentive structure of the empirical task, the dependent measure, etc.), humans can be inaccurate about their underlying values and the implications of those values for real-world decisions. Rather, by “definitional...

Human-AI Complementarity: A Goal for Amplified Oversight

rishubjain, Sophie Bridgers

By Sophie Bridgers, Rishub Jain, Rory Greig, and Rohin Shah
For more details and the full list of contributors, please see our paper: https://arxiv.org/abs/2510.26518

Human oversight is critical for ensuring that Artificial Intelligence (AI) models remain safe and aligned to human values. But AI systems are rapidly advancing in capabilities and are being used to complete ever more complex tasks, making it increasingly challenging for humans to verify AI outputs and provide high-quality feedback. How can we ensure that humans can continue to meaningfully evaluate AI performance? An avenue of research to tackle this problem is “Amplified Oversight” (also called “Scalable Oversight”), which aims to develop techniques to use AI to amplify humans’ abilities to...