After having chosen a utility function to maximize, how would it maximize it? I'm thinking that the search/planning process for finding good policies naturally introduces mesa-optimizers, regardless of everything that came before in PreDCA (detecting precursors and extrapolating their utility function).
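To make the worry concrete, here's a toy sketch of my own (not anything from PreDCA, and all the names and numbers are made up): an outer hill-climbing search over policies that only ever scores reward, where some candidate policies happen to contain their own inner argmax. The outer search has no way to distinguish them, so inner optimizers get selected whenever they fit the reward better.

```python
import random

random.seed(0)
N_STATES, N_ACTIONS = 8, 8

def reward(state, action):
    # True objective: reward peaks when the action hits a state-dependent target.
    target = (3 * state + 1) % N_ACTIONS
    return -abs(action - target)

def act(policy, state):
    if policy["kind"] == "table":
        # "Flat" policy: a memorized action per state, no inner search.
        return policy["table"][state]
    # Mesa-optimizer policy: runs its own argmax over an internal proxy
    # objective at decision time, instead of storing answers directly.
    return max(range(N_ACTIONS), key=lambda a: policy["proxy"](state, a))

def fitness(policy):
    # The outer search only ever sees this scalar; it cannot tell whether
    # the policy is a lookup table or an inner optimizer.
    return sum(reward(s, act(policy, s)) for s in range(N_STATES))

def random_policy():
    if random.random() < 0.5:
        return {"kind": "table",
                "table": [random.randrange(N_ACTIONS) for _ in range(N_STATES)]}
    # The inner proxy objective is sampled independently of the true reward;
    # some proxies match it, most don't.
    k = random.randrange(N_ACTIONS)
    return {"kind": "mesa",
            "proxy": lambda s, a, k=k: -abs(a - (k * s + 1) % N_ACTIONS)}

def mutate(policy):
    if policy["kind"] == "table" and random.random() < 0.7:
        table = list(policy["table"])
        table[random.randrange(N_STATES)] = random.randrange(N_ACTIONS)
        return {"kind": "table", "table": table}
    return random_policy()  # otherwise jump to a fresh random policy

# Imperfect outer search: greedy hill climbing on fitness alone.
best = random_policy()
for _ in range(500):
    cand = mutate(best)
    if fitness(cand) > fitness(best):
        best = cand

print("winning policy kind:", best["kind"], "| fitness:", fitness(best))
```

On many seeds the winner is a "mesa" policy whose inner proxy happens to match the true target rule: it was selected purely for the reward it got, not for having that objective, which is the basic shape of the concern.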
It seems like the AI risk mitigation solutions you've listed aren't mutually exclusive; in fact, we'll likely have to use a combination of them to succeed. While I agree that it would be ideal for us to end up with a FAS, the pathway toward that outcome would likely involve "sponge coordination" and "pivotal acts" as mechanisms by which our civilization can buy some time before FAS.
A possible scenario in a world where FAS takes some time (in chronological order):
It seems like the exact model the AI will adopt is kinda confounding my picture when I try to imagine what an "existentially secure" world looks like. I'm currently thinking there are two possible existentially secure worlds:
The obvious one is where all human dependence is removed from setting/modifying the AI's value system (fully value-aligned, like CEV); this world would look much more unipolar.
The alternative is for the well-intentioned-and-coordinated group to use a corrigible AI that is aligned with its human instructor. To me, whether this scenario lo...
Is it even possible for a non-pivotal act to ever achieve existential security? Even if we maxed out AI lab communication and had awesome interpretability, that wouldn't help in the long run, given that the minimum resources required to build a misaligned AGI will probably keep dropping.
Thanks, I found your post very helpful, and I think this community would benefit from more posts like it.
I agree that we would need a clear categorization. Ideally, it would give us a way to explicitly quantify and make legible the claims of the various proposals, e.g. "my proposal, under these assumptions about the world, may give us X years of time, changes the world in these ways, and interacts with proposals A, B, and C in these ways."
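As a rough illustration, such a claim could even be written down as a structured record so that proposals become directly comparable. This is a minimal sketch of my own; every field name and value here is hypothetical, just to show the shape:

```python
from dataclasses import dataclass

# A sketch of an explicit, comparable claim format for a risk-mitigation
# proposal. All field names and example values are hypothetical.
@dataclass
class ProposalClaim:
    name: str
    assumptions: list[str]        # world-model assumptions the claim depends on
    time_bought_years: tuple[float, float]  # (low, high) estimate of "X years"
    world_changes: list[str]      # ways the proposal changes the strategic landscape
    interactions: dict[str, str]  # other proposal -> how they combine or conflict

example = ProposalClaim(
    name="Lab coordination agreement",
    assumptions=["frontier labs can verify each other's training runs"],
    time_bought_years=(2.0, 5.0),
    world_changes=["slows the largest training runs"],
    interactions={"interpretability": "complementary: makes verification easier"},
)
```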
The lack of such a categorization is perhaps one of the reasons why I feel the pivotal act framing is still necessary. It seems to me that...
In my model, this isn't a capabilities failure, because there are demons in imperfect search; what you would get out of a heuristic-search-to-approximate-the-best-policy wouldn't only be ...