It Can't Be Mesa-Optimizers All The Way Down (Or Else It Can't Be Long-Term Supercoherence?)
Epistemic status: After a couple hours of arguing with myself, this still feels potentially important, but my thoughts are pretty raw here. Hello LessWrong! I’m an undergraduate student studying at the University of Wisconsin-Madison, and part of the new Wisconsin AI Safety Initiative. This will be my first “idea” post...