I'll have to think through this post more carefully later, but there's some recent work on approximate abstractions between causal models that I expect you'd be extremely interested in (if you aren't already aware): https://arxiv.org/abs/2207.08603
There are quite a few interesting dynamics in the space of possible values that become extremely relevant in worlds where 'perfect inner alignment' is impossible/incoherent/unstable.
In those worlds, it's important to develop forms of weak alignment, where successive systems might not be unboundedly corrigible but do still have semi-cooperative interactions (and transitions of power).
Yeah, intertemporal trust and coordination become hugely important. Lots of 'scalable alignment' strategies are relevant: recursively delegating yourself tasks, or summarizing your progress so far. An inhuman level of flexibility would also help: instantly grieving your old circumstances, then adapting to the new ones.
Can you be confident that your past self knew what they were doing when they dropped you in this situation? Or that your future selves will develop things the way you expect them to? You could choose to deliberately and repeatedly lie to yourse...
Multiscale agency, self-misalignment, and ecological basins of attraction? This sounds really excellent and targets a lot of the conceptual holes I worry about in existing approaches. I look forward to the work that comes out of this!!
I was reminded of a couple different resources you may or may not already be aware of.
For 'vertical' game theory, check out Jules Hedges' work on open/compositional games. https://arxiv.org/search/cs?searchtype=author&query=Hedges%2C+J
For aggregative alignment, there's an interesting literature on the topology of social c...
I suspect you'd enjoy The Dawn Of Everything, an anarchist-tinged anthropological survey of the different nonlinear paths stateless societies and state formation have taken. Or, well, it discusses a wide range of related topics, with lots of creativity and decent enough rigor. I haven't finished yet.
I do agree that states can be seen as a game-theoretic trap, though. Once you have some centralized social violence or institutional monopoly on power, for a huge range of goals the easiest way to achieve them becomes "get the state/king/local bigwig on your si...
The claim that scissor statements are dangerous is itself a scissor statement: I think it's obviously false, and will fight you over it. Social interaction is not that brittle. It is important to notice the key ruptures between people's values/beliefs. Disagreements do matter, in ways that sometimes rightly prevent cooperation.
World population is ~2^33, so 33 independent scissor statements would set you frothing in total war of everyone against everyone. Except people are able to fluidly navigate much, much higher levels of difference and complexity than t...
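To make the arithmetic behind that claim explicit, here's a minimal sketch (assuming a round ~8 billion population figure): each independent binary disagreement doubles the number of distinguishable opinion-profiles, so the number of statements needed to give every person a unique profile is just the base-2 log of the population.

```python
import math

# Rough world population (an assumed round figure, not an exact count).
population = 8_000_000_000

# Each independent "scissor" statement splits people into two camps.
# k independent binary splits distinguish at most 2**k distinct profiles,
# so the number of splits needed to individuate everyone is:
splits_needed = math.ceil(math.log2(population))
print(splits_needed)  # → 33
```

Which is the point of the comment: a few dozen binary ruptures are information-theoretically enough to isolate every individual, yet in practice people navigate far more dimensions of disagreement than that without total war.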
I expect you already know this, but, the role of activists is not the same as the role of experts, and that's okay. You will never know everything relevant to the situation you're hoping to intervene in. Even if you did, institutions ignore their own environmental experts all the time. Usually, you aren't there as some sort of policy consultant, you're there to pressure their interests into alignment with yours. Even if you have zero clue what other constraints they are balancing, it can still be reasonable to loudly voice your problems; you are yourself o...
Update from almost 3 years in the future: this stream of work has continued developing in a few different directions, both on the conceptual foundations and in some initial attempts to apply these tools to AI. Two recent works I was especially excited by (and their bibliographies): 'Towards a Grounded Theory of Causation for Embodied AI' (https://arxiv.org/abs/2206.13973, and here's an excellent talk by the author, https://youtu.be/5mZhcXhbciE), and 'Faithful, Interpretable Model Explanations via Causal Abstraction' (https://ai.stanford.edu/blog/causal-abstraction/).