LESSWRONG
Verification
• Applied to Compact Proofs of Model Performance via Mechanistic Interpretability by Jason Gross 5mo ago
• Applied to Formal verification, heuristic explanations and surprise accounting by Mo Putera 5mo ago
• Applied to Alignment with argument-networks and assessment-predictions by Tor Økland Barstad 2y ago
• Applied to Making it harder for an AGI to "trick" us, with STVs by Tor Økland Barstad 2y ago
• Created by Tor Økland Barstad at 2y