A voice for the unknown human who hides within the binary.
I’d like to offer some data points without much justification. I hope they might spur some thought/discussion without needing to be taken on faith:
Here is my take:
Value is a function of the entire state space, and can't be neatly decomposed as a sum of subgames.
Rather (dually), value on ("quotient") subgames must be confluent with the total value on the joint game.
E.g., there's an "enjoying the restaurant food" game and a "making your spouse happy" game, but the joint game of "enjoying a restaurant with your spouse" has more moves available, and more value terms that don't show up in either subgame, like "be a committed couple".
"Confluence" here means that once you forget whatever you need to forget in order to zoom in on the "enjoying the restaurant food" subgame, your value judgements of "enjoying the restaurant food" and of "enjoying a restaurant with your spouse, ignoring everything except food" agree.
The individual subgames aren't "closed" and never were: their value only makes sense in a larger context, because the primitives used to define that value refer to the larger context. From the perspective of the larger game, no value is "destroyed"; it only appears that way when projecting into the subgames, which were only ever virtual.
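A toy sketch of the restaurant example, in case it helps make this concrete. Everything in it is my own assumption for illustration: the two-state food/spouse grid, the made-up numbers, the names (`joint_value`, `interaction`, `forget_spouse`, `is_confluent`), and especially the choice to model "forgetting" as averaging over the ignored variable. Other forgetting maps are possible, and the argument doesn't depend on this one.

```python
FOOD = (0, 1)    # 0 = bad meal, 1 = good meal
SPOUSE = (0, 1)  # 0 = unhappy spouse, 1 = happy spouse


def joint_value(food, spouse):
    # Made-up numbers. The product term stands in for "be a committed couple":
    # it only exists in the joint game and in neither subgame.
    return 2 * food + 3 * spouse + 4 * food * spouse


def interaction(value):
    # Mixed difference on the 2x2 grid. It is zero exactly when
    # value(f, s) can be written as a(f) + b(s), i.e. as a sum of subgames.
    return value(1, 1) - value(1, 0) - value(0, 1) + value(0, 0)


def forget_spouse(value):
    # Assumed forgetting map: average the joint value over spouse states.
    return {f: sum(value(f, s) for s in SPOUSE) / len(SPOUSE) for f in FOOD}


def is_confluent(food_value, value):
    # The subgame value must agree with the joint value seen through the map.
    return food_value == forget_spouse(value)


# Food scored as if the subgame were closed, with nothing else in the world.
closed_food_value = {0: 0, 1: 2}
# Food scored as a quotient of the joint game.
quotient_food_value = forget_spouse(joint_value)
```

With these numbers `interaction(joint_value)` is nonzero, so no split into a food term plus a spouse term reproduces the joint value. `closed_food_value` fails `is_confluent`, while `quotient_food_value` passes by construction, which is the sense in which the subgame's value is inherited from the larger game rather than defined on its own.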
Thanks for making this map 🙏
I expect this is a rare moment of clarity because maintaining updates takes a lot of effort and is now subject to optimization pressure.
Also, imo most of the "good" alignment work in terms of eventual impact is being done outside the alignment label (e.g. as differential geometry or control theory) and will be merged in later once the connection is recognized. This will probably become more true over time.