> Paul Christiano's incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world.
> Eliezer in Death With Dignity
Eliciting Latent Knowledge reads to me as an incredibly narrow slice of reasoning space, a hyperbolically branching philosophical rabbit hole of caveats.
For example, this paragraph on page 7 translates as:
If you can ask the AI whether it is telling the truth and it answers "no", then you know it is lying.
But how is trusting this answer different from simply trusting the AI not to deceive you in the first place?
A hundred pages of an elaborate system with competing actors playing games of causal diagrams, trying to solve for the worst case, is exciting ✨ precisely because it...
Shameless plug for my uncharitable critique, which I believe has a similar vibe: ELK shaving