Miss Aligned AI — LessWrong

Replying to Rant on Problem Factorization for Alignment

Shameless plug of my uncharitable criticism that I believe has a similar vibe: ELK shaving

> Paul Christiano's incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world.
> Eliezer in Death With Dignity

Eliciting Latent Knowledge reads to me as an incredibly narrow slice in reasoning space, a hyperbolically branching philosophical rabbit hole of caveats.

For example, this paragraph on page 7 translates as:

If you can ask whether the AI is telling the truth and it answers "no", then you know it is lying.

But how is trusting this answer different from just trusting AI to not deceive you as a whole?

A hundred pages of an elaborate system with competing actors playing games of causal diagrams trying to solve for the worst case is exciting ✨ precisely because it...