Related:
- Contra "Strong Coherence"
- why assume AGIs will optimize for fixed goals
- Why The Focus on Expected Utility Maximisers?
Background and Core Concepts
I operationalised "strong coherence" as:
Informally: a system has immutable terminal goals.
Semi-formally: a system's decision making is well described as an approximation of argmax over actions (or higher level mappings thereof) to maximise the expected value of a single fixed utility function over states.
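In symbols (my own rendering of the semi-formal statement above, not a quote from the earlier post), with $\mathcal{A}$ the action set, $P$ the system's model of state transitions, and $U$ the single fixed utility function over states:

$$a^* \approx \underset{a \in \mathcal{A}}{\operatorname{argmax}} \; \mathbb{E}_{s' \sim P(\cdot \mid s,\, a)}\big[U(s')\big]$$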
And contended that humans, animals (and learning-based agents more generally?) seem to instead have values ("contextual influences on decision making").
The shard theory account of value formation in learning-based agents is something like:
- Value shards are learned computational/cognitive heuristics causally downstream of similar historical reinforcement events
- Value shards activate more strongly in contexts similar to those where they were historically reinforced
And I think this hypothesis of how values form in intelligent systems could be generalised out of an RL context to arbitrary constructive optimisation processes[1]. The generalisation may be something like:
Decision making in intelligent systems is best described as "executing computations/cognition that historically correlated with higher performance on the objective functions a system was selected for performance on"[2].
This seems to be an importantly different type of decision making from expected utility maximisation[3]. For succinctness, I'd refer to systems of the above type as "systems with malleable values".
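To make the contrast concrete, here is a deliberately toy sketch (my own illustration; the agents, shards, actions and numbers are all invented, and nothing below is taken from shard theory proper or from the earlier post): the first agent runs an argmax over a single fixed utility function, while the second blends the heuristics of whichever "shards" the current context activates.

```python
import numpy as np

# Toy contrast between the two decision procedures discussed above.
# Everything here is invented for illustration.

def eu_maximiser(actions, utility, transition, state):
    """'Strong coherence': argmax over actions of the expected value of a
    single fixed utility function over successor states."""
    def expected_utility(action):
        next_states, probs = transition(state, action)
        return sum(p * utility(s) for s, p in zip(next_states, probs))
    return max(actions, key=expected_utility)


class Shard:
    """A toy 'value shard': an action-scoring heuristic plus the contexts in
    which it was historically reinforced."""
    def __init__(self, name, reinforced_contexts, action_scores):
        self.name = name
        self.reinforced_contexts = np.atleast_2d(np.asarray(reinforced_contexts, float))
        self.action_scores = action_scores  # dict: action -> heuristic score

    def activation(self, context):
        # Activates more strongly in contexts similar (by cosine similarity)
        # to those where it was historically reinforced.
        context = np.asarray(context, float)
        sims = self.reinforced_contexts @ context / (
            np.linalg.norm(self.reinforced_contexts, axis=1) * np.linalg.norm(context) + 1e-9
        )
        return float(np.clip(sims.max(), 0.0, 1.0))


def malleable_values_agent(actions, shards, context):
    """'Malleable values': no single fixed utility function; instead, blend the
    heuristics of whichever shards the current context activates."""
    totals = {a: 0.0 for a in actions}
    for shard in shards:
        weight = shard.activation(context)
        for action in actions:
            totals[action] += weight * shard.action_scores.get(action, 0.0)
    return max(actions, key=totals.get)


if __name__ == "__main__":
    actions = ["help", "hoard"]
    shards = [
        Shard("care", reinforced_contexts=[[1.0, 0.0]], action_scores={"help": 1.0}),
        Shard("acquire", reinforced_contexts=[[0.0, 1.0]], action_scores={"hoard": 1.0}),
    ]
    # The same shard-based agent makes different choices as the context shifts.
    print(malleable_values_agent(actions, shards, [0.9, 0.1]))  # -> help
    print(malleable_values_agent(actions, shards, [0.1, 0.9]))  # -> hoard

    # Fixed-utility counterpart: the same argmax regardless of context.
    utility = lambda s: {"world_helped": 1.0, "world_hoarded": 0.0}[s]
    transition = lambda s, a: ([f"world_{a}ed"], [1.0])
    print(eu_maximiser(actions, utility, transition, state="start"))  # -> help
```

The point of the sketch is just the structural difference: the second agent has no single fixed $U$; which heuristics dominate its choice depends on the context it finds itself in, so its decision making shifts with context rather than tracking one immutable objective.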
The Argument
In my earlier post I speculated that "strong coherence is anti-natural". To operationalise that speculation:
- Premise 1: The generalised account of value formation is broadly accurate
  - At least, intelligent systems in the real world form "contextually activated cognitive heuristics that influence decision making" as opposed to "immutable terminal goals"
  - Humans can program algorithms with immutable terminal goals in simplified virtual environments, but we don't actually know how to construct sophisticated intelligent systems via design; we can only construct them as the product of search-like optimisation processes[4]
  - And intelligent systems constructed by search-like optimisation processes form malleable values instead of immutable terminal goals
  - I.e. real world intelligent systems form malleable values
- Premise 2: Systems with malleable values do not self-modify to have immutable terminal goals
  - Would you take a pill that would make you an expected utility maximiser[3]? I most emphatically would not.
  - If you accept the complexity of value and fragility of value theses, then self-modifying to become strongly coherent destroys most of what the current you values.
  - For systems with malleable values, becoming "strongly coherent" is grossly suboptimal by their current values
  - A similar argument might extend to whether such systems would construct expected utility maximisers, were they given the option to
- Conclusion 1: Intelligent systems in the real world do not converge towards strong coherence
  - Strong coherence is not the limit of effective agency in the real world
  - Idealised agency does not look like "(immutable) terminal goals" or "expected utility maximisation"
- Conclusion 2: "Strong coherence" does not naturally manifest in sophisticated real world intelligent systems
  - Sophisticated intelligent systems in the real world are the product of search-like optimisation processes
  - Such optimisation processes do not produce intelligent systems that are strongly coherent
  - And those systems do not converge towards becoming strongly coherent as they are subjected to more selection pressure, "scaled up", or otherwise amplified
1. ^ E.g.:
   * Stochastic gradient descent
   * Natural selection/other evolutionary processes
2. ^
3. ^ Of a single fixed utility function over states.
4. ^ E.g. I'm under the impression that humans can't explicitly design an algorithm to achieve AlexNet accuracy on the ImageNet dataset. I think the self-supervised learning that underlies neocortical cognition is a much harder learning task. I believe that learning is the only way to create capable intelligent systems that operate in the real world given our laws of physics.
Sorry, I guess I didn't make the connection to your post clear. I substantially agree with you that utility functions over agent-states aren't rich enough to model real behavior. (Except, maybe, at a very abstract level, a la predictive processing? (which I don't understand well enough to make the connection precise)).
Utility functions over world-states -- which is what I thought you meant by 'states' at first -- are in some sense richer, but I still think inadequate.
And I agree that utility functions over agent histories are too flexible.
I was sort of jumping off to a different way to look at value, one which might have some of the desirable coherence of the utility-function-over-states framing, but without its rigidity.
And this way is something like viewing 'what you value' or 'what is good' as something abstract, something to be inferred from the many partial glimpses of it we have in the form of our extant values.
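One toy way that "inferring value from partial glimpses" could be cashed out (entirely my own construction, not something from the comment or the post): treat "what is good" as a latent vector, treat each extant value as a noisy, partial observation of it, and infer the latent from the glimpses.

```python
import numpy as np

def infer_latent_value(glimpses, noise_scales):
    """Treat each extant value as a noisy 'glimpse' of a latent value vector;
    under a flat prior and Gaussian noise, the posterior mean is just a
    precision-weighted average of the glimpses."""
    glimpses = np.asarray(glimpses, dtype=float)
    precisions = 1.0 / np.square(np.asarray(noise_scales, dtype=float))
    return (precisions[:, None] * glimpses).sum(axis=0) / precisions.sum()

# Three partial, differently reliable glimpses of the same underlying value.
glimpses = [[1.0, 0.2], [0.8, 0.4], [1.4, 0.1]]
noise_scales = [0.5, 1.0, 0.25]
print(infer_latent_value(glimpses, noise_scales))
```

The specific model is beside the point; it just illustrates the shift from "values are the objective" to "extant values are evidence about something more abstract".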