Yoav Hollander — LessWrong

Coverage-driven alignment - What ‘Teaching Claude Why’ can borrow from AV verification

Cross-posted from The Foretellix CTO Blog. This is a full-text linkpost, following feedback that my previous piece was too brief as a stub. Summary: This post suggests that alignment training could benefit from coverage-driven verification. Anthropic recently reported that teaching Claude alignment rules (via pretraining-style next-token learning on alignment-related stories)...

The V&V method - A step towards safer AGI

Originally posted on my blog, 24 Jun 2025 - see also full PDF (26 pp). Abstract: The V&V method is a concrete, practical framework which can complement several alignment approaches. Instead of asking a nascent AGI to “do X,” we instruct it to design and rigorously verify a bounded “machine-for-X”....

Jun 24, 2025