With the release of Rohin Shah and Eliezer Yudkowsky's conversation, the Late 2021 MIRI Conversations sequence is now complete.
This post is intended as a generalized comment section for discussing the whole sequence, now that it's finished. Feel free to:
- raise any topics that seem relevant
- signal-boost particular excerpts or comments that deserve more attention
- direct questions to participants
In particular, Eliezer Yudkowsky, Richard Ngo, Paul Christiano, Nate Soares, and Rohin Shah expressed active interest in receiving follow-up questions here. The Schelling time when they're likeliest to be answering questions is Wednesday March 2, though they may participate on other days too.
I'm a little confused about what it hopes to accomplish. To start, I'm confused by your example of "preferences not about future states": 'the pizza shop employee is running around frantically, and I am laughing' is itself a future state.
More broadly, I'm not sure what mixing "paperclips" with "humans remain in control" accomplishes. On the one hand, I think if you can specify "humans remain in control" safely, you've already solved the alignment problem. On another, I wouldn't want that to seize the future: there are potentially much better futures where humans are not in control but are still alive/free/whatever (e.g. the Sophotechs in the Golden Oecumene are very much in control). On a third, I would very much prefer a 3-star 'paperclips' and 5-star 'humans in control' to a 5-star 'paperclips' and 3-star 'humans in control', even though both average 4 stars.
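To make the arithmetic in that last point concrete, here is a minimal sketch. The star ratings are the ones from the comparison above; the function names and the weights are purely hypothetical, chosen only to show that a plain average cannot tell the two outcomes apart while any aggregation that weights "humans in control" more heavily can.

```python
from statistics import mean

# Star ratings (1-5) for the two outcomes being compared.
outcome_a = {"paperclips": 3, "humans_in_control": 5}
outcome_b = {"paperclips": 5, "humans_in_control": 3}


def plain_average(outcome):
    """Unweighted mean of the star ratings."""
    return mean(outcome.values())


def weighted_average(outcome, weights):
    """Mean of the star ratings, weighted per component."""
    total_weight = sum(weights.values())
    return sum(outcome[name] * weights[name] for name in outcome) / total_weight


# Assumed weights, for illustration only: "humans in control" counts 3x.
weights = {"paperclips": 1, "humans_in_control": 3}

# The plain average is identical for both outcomes.
print(plain_average(outcome_a), plain_average(outcome_b))

# The weighted average ranks outcome_a above outcome_b.
print(weighted_average(outcome_a, weights), weighted_average(outcome_b, weights))
```

This is not a claim about how the proposal actually combines the two preferences, only an illustration of why "both average 4 stars" hides the asymmetry.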