With the release of Rohin Shah and Eliezer Yudkowsky's conversation, the Late 2021 MIRI Conversations sequence is now complete.
This post is intended as a generalized comment section for discussing the whole sequence, now that it's finished. Feel free to:
- raise any topics that seem relevant
- signal-boost particular excerpts or comments that deserve more attention
- direct questions to participants
In particular, Eliezer Yudkowsky, Richard Ngo, Paul Christiano, Nate Soares, and Rohin Shah expressed active interest in receiving follow-up questions here. The Schelling time when they're likeliest to be answering questions is Wednesday March 2, though they may participate on other days too.
I wrote Consequentialism & Corrigibility shortly after and partly in response to the first (Ngo-Yudkowsky) discussion. If anyone has an argument or belief that the general architecture / approach I have in mind (see the “My corrigibility proposal sketch” section) is fundamentally doomed as a path to corrigibility and capability—as opposed to merely “reliant on solving lots of hard-but-not-necessarily-impossible open problems”—I'd be interested to hear it. Thanks in advance. :)
Am I correct in reading this post as heavily related to embedded agency? I may have misunderstood the general attitudes, but I thought of "future states" as "future to now," not "future to my action." It seems like you couldn't possibly create a thing that works on the latter, unless you intend it to set everything in motion and then terminate. In the embedded agency sequence, they point out that embedded agents don't have well-defined i/o channels. One way this shows up is that "action" is not a well-defined term, and is often not atomi...