In the Pointers Problem, John Wentworth points out that human values are a function of latent variables in the world models of humans.
My question is about a specific kind of latent variable, one that's restricted to be causally downstream of what we can observe. Suppose we take a world model in the form of a causal network and split the variables into ones that are upstream of observable variables (in the sense that there's some directed path going from the variable to something we can observe) and ones that aren't. Say that the variables that have no causal impact on what is observed are "latent" for the purposes of this post. In other words, latent variables are in some sense "epiphenomenal". This definition of "latent" is narrower than the usual one, but I think the distinction between causally relevant and causally irrelevant hidden variables is quite important, and I'll only be focusing on the latter for this question.
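To make the split concrete, here's a minimal sketch of it in code, assuming the causal network is a DAG given as a list of directed edges; the function name `split_variables` and the toy graph are just illustrative.

```python
from collections import defaultdict, deque

def split_variables(edges, observables):
    """Partition the nodes of a causal DAG into causally relevant variables
    (some directed path reaches an observable) and "latent" ones in the
    post's sense (no causal impact on anything we can observe)."""
    # Build the reversed graph: child -> set of parents.
    parents = defaultdict(set)
    nodes = set(observables)
    for a, b in edges:
        parents[b].add(a)
        nodes.update((a, b))

    # Walk backwards from the observables; everything reached is upstream
    # of (or is) something observable, hence causally relevant.
    relevant = set(observables)
    queue = deque(observables)
    while queue:
        node = queue.popleft()
        for p in parents[node]:
            if p not in relevant:
                relevant.add(p)
                queue.append(p)

    latent = nodes - relevant  # "epiphenomenal" variables
    return relevant, latent

# Toy example: x0 -> x1 -> x2 are observable; z0 and z1 hang off of them
# with no arrows back into the x's.
edges = [("x0", "x1"), ("x1", "x2"),
         ("x0", "z0"), ("z0", "z1"), ("x1", "z1")]
relevant, latent = split_variables(edges, observables={"x0", "x1", "x2"})
print(sorted(relevant))  # ['x0', 'x1', 'x2']
print(sorted(latent))    # ['z0', 'z1']
```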
In principle, we can always unroll any latent variable model into a path-dependent model with no hidden variables. For example, if we have an (inverted) hidden Markov model with one observable state $x_t$ and one hidden state $z_t$ (subscripts denote time), we can draw a causal graph like this for the "true model" of the world (not the human's world model, but the "correct model" which characterizes what happens to the variables the human can observe):
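Written out as a factorization (with $x_t$ the observable state and $z_t$ the hidden one, matching the graph), the structure is roughly

$$p(x_{0:T}, z_{0:T}) \;=\; \underbrace{p(x_0)\prod_{t=1}^{T} p(x_t \mid x_{t-1})}_{\text{observable dynamics}}\;\cdot\;\underbrace{p(z_0 \mid x_0)\prod_{t=1}^{T} p(z_t \mid z_{t-1}, x_t)}_{\text{latent variables}}$$

with every arrow into a $z_t$ coming from the observable chain or from earlier $z$'s, and no arrow from any $z$ back into any $x$.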
Here the $z_t$ are causally irrelevant latent variables: they have no impact on the state of the world, but for some reason or another humans care about what they are. For example, if a sufficiently high-capacity model renders "pain" a causally obsolete concept, then pain would qualify as a latent variable in the context of this model.
The latent variable $z_t$ at time $t$ depends directly on both $z_{t-1}$ and $x_t$, so to accurately figure out the probability distribution of $z_t$ we need to know the whole trajectory of the world from the initial time: $x_0, x_1, \ldots, x_t$.
We can imagine, however, that even if human values depend on latent variables, these variables don't feed back into each other. In this case, how much we value some state of the world would just be a function of that state of the world itself: we'd only need to know $x_t$ to figure out what $z_t$ is. This naturally raises the question I ask in the title: empirically, what do we know about the role of path-dependence in human values?
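To make the contrast explicit in the notation above: in the path-dependent case the best we can do is marginalize over the latent history,

$$p(z_t \mid x_{0:t}) \;=\; \sum_{z_{0:t-1}} p(z_0 \mid x_0) \prod_{s=1}^{t} p(z_s \mid z_{s-1}, x_s),$$

which genuinely depends on the whole trajectory $x_{0:t}$, whereas dropping the $z_{t-1} \to z_t$ arrows collapses this to $p(z_t \mid x_{0:t}) = p(z_t \mid x_t)$, a function of the current state alone.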
I think path-dependence comes up often in how humans handle the problem of identity. For example, if it were possible to clone a person perfectly and then remove the original from existence through whatever means, then even if the resulting states of the world were identical, humans whose mental models contain different trajectories of how we got there could evaluate questions of identity in the present differently. Whether I'm me or not depends on more information than my current physical state, or even the world's current physical state.
This looks like it's important for purposes of alignment, because there's a natural sense in which path-dependence is an undesirable property to have in your model of the world. If an AI doesn't have path-dependence as an internal concept, it could be simpler for it to learn the strategy "trick the people who believe in path-dependence into thinking the history that got us here was good" rather than "actually try to optimize for whatever their values are".
With all that said, I'm interested in what other people think about this question. To what extent are human values path-dependent, and to what extent do you think they should be path-dependent? Both general thoughts & comments and concrete examples of situations where humans care about path-dependence are welcome.
I think you misunderstood my graph - the way I drew it was intentional, not a mistake. Probably I wasn't explicit enough about how I was splitting the variables and what I do is somewhat different from what johnswentworth does, so let me explain.
Some latent variables could have causal explanatory power, but I'm focusing on ones that don't seem to have any such power because they are the ones human values depend on most strongly. For example, anything to do with qualia is not going to have any causal arrows going from it to what we can observe, but nevertheless we make inferences about people's internal state of mind from what we externally observe of their behavior.
As for my questions about path-dependence, I think your responses don't address the question I meant to ask. For example,
This is not a property of path-dependence in the sense I'm talking about, because for me anything that has causal explanatory power goes into the state $x_t$. This would include whether there actually is an apple in your house or not, even if your current sensory inputs show no evidence of an apple.
EDIT: I notice now that there's a central question here about the extent to which the latent variables human values are defined over are causally relevant vs causally irrelevant. I assumed that states of mind wouldn't be relevant, but actually they could be causally relevant in the world model of the human even if they wouldn't be in the "true model", whatever that means.
I think in this case I still want to say that human values are path-dependent. This is because I care more about whether the values end up being path-dependent in the "true model" than in the human's (imperfect) world model, because a sufficiently powerful AGI would pick up the true model and then try to map its states to the latent variables that the human seems to care about. In other words, for the AGI the latent variables could end up being causally irrelevant, even if for the human they aren't. I've edited the post to reflect this.