Human-AI Complementarity: A Goal for Amplified Oversight
By Sophie Bridgers, Rishub Jain, Rory Greig, and Rohin Shah

For more details and the full list of contributors, please see our paper: https://arxiv.org/abs/2510.26518

Human oversight is critical for ensuring that Artificial Intelligence (AI) models remain safe and aligned to human values. But AI systems are rapidly advancing in capabilities and...
Thanks for another thoughtful response and for explaining further. I think we can now both agree that we disagree (at least in certain respects) ;-)
We take seriously your argument that AI could get really smart and good at predicting human preferences and values, which could change the level of human involvement in training, evaluation, and monitoring. However, if we go with the approach you propose:
> Instead, I think our strategy should be "If humans are inconsistent and disagree, let's strive to learn a notion of human values that's robust to our inconsistency and disagreement."
> A committee of humans reviewing an AI's proposal is, ultimately, a physical system that can be predicted. If you have...