You might find AI Safety Endgame Stories helpful - I wrote it last week to try to answer this exact question, covering a broad array of (mostly non-pivotal-act) success stories from technical and non-technical interventions.
Nate's "how various plans miss the hard bits of the alignment challenge" might also be helpful as it communicates the "dynamics of doom" that success stories have to fight against.
One thing I would love is to have a categorization of safety stories by claims about the world. E.g., what does successful intervention look like in worlds where one or more of the following claims hold:
These all seem like plausible worlds to me, and it would be great if we had more clarity about what worlds different interventions are optimizing for. Ideally we should have bets across all the plausible worlds in which intervention is tractable, and I think that's currently far from being true.
Thanks, I found your post very helpful, and I think this community would benefit from more posts like it.
I agree that we would need a clear categorization. Ideally, it would give us a way to explicitly quantify and make legible the claims of various proposals, e.g. "my proposal, under these assumptions about the world, may give us X years of time, changes the world in these ways, and interacts with proposals A, B, and C in these ways."
The lack of such a framework is perhaps one of the reasons why I feel the pivotal act framing is still necessary. It seems to me that...
TL;DR: Are there any works similar to Wei_Dai's AI Safety "Success Stories" that provide a framework for thinking about the landscape of possible success stories & pathways humanity will take to survive misaligned AI?
I've been trying to think of systematic ways of assessing non-technical proposals for improving humanity's odds of surviving misaligned AI.
Aside from the numerous frameworks for assessing technical alignment proposals, I haven't seen many resources on non-technical proposals that provide a concrete framework for thinking about the question: "What technological/geopolitical/societal pathway will our civilization most likely take (or should ideally take) in order to survive AI?"
Having such a framework seems pretty valuable, since it would let us think about the exact alignment pathway & context in which [proposals that want to help with alignment] would be effective.
For example, a pretty clear dimension along which people's opinions differ is the necessity of pivotal acts, i.e. "pivotal act vs. gradual steering" (kind of oversimplified). Here, any proposal's theory of impact will necessarily depend on its author's beliefs about (a) which position on the spectrum currently appears most likely by default, and (b) which position on the spectrum we should be aiming for.
I've seen a lot of similar frameworks for technical alignment proposals, but not much for the pathways our civilization will actually take to survive (Wei_Dai's post is close, but it is mostly about the form the AI will end up taking, without saying much about the pathways by which we'd arrive at that outcome).
Any resources I might be missing? (if there aren't any, I might write one)