You know the drill: if it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one.
3. Open Threads should start on Monday, and end on Sunday.
4. Open Threads should be posted in Discussion, and not Main.
Here's a comment that I posted in a discussion on Eliezer's FB wall a few days back but which didn't receive much of a response there; maybe it'll prompt more discussion here:
--
So this reminds me, I've been thinking for a while that VNM utility might be a hopelessly flawed framework for thinking about human value, but I've had difficulties putting this intuition in words. I'm also pretty unfamiliar with the existing literature around VNM utility, so maybe there is already a standard answer to the problem that I've been thinking about. If so, I'd appreciate a pointer to it. But the theory described in the linked paper seems (based on a quick skim) like it's roughly in the same direction as my thoughts, so maybe there's something to them.
Here's my stab at describing what I've been thinking: VNM utility implicitly assumes an agent with "self-contained" preferences that it is trying to maximize the satisfaction of. By self-contained, I mean that the preferences are not a function of the environment, though they can and do take inputs from the environment. So an agent could certainly have a preference that made him, e.g., want to acquire more money if he had less than $5000, and made him indifferent to money once he had more than that. But this preference would be conceptualized as something internal to the agent, and essentially unchanging.
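To make the contrast concrete, here's a minimal toy sketch of what I mean by a self-contained preference. The function and the $5000 cutoff are just my own illustration, not anything from the paper:

```python
# A toy "self-contained" preference: the utility depends only on the outcome
# handed to the agent, never on the agent's history or environment.

def self_contained_utility(wealth: float) -> float:
    """Cares about acquiring money below $5000, indifferent above that."""
    return min(wealth, 5000.0)

# Whatever situation the agent is dropped into, this function stays the same;
# the environment only supplies the argument, not the shape of the function.
print(self_contained_utility(3000))  # 3000.0
print(self_contained_utility(9000))  # 5000.0
```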
That doesn't seem to be how human preferences actually work. For example, suppose that John Doe is currently indifferent between whether to study in college A or college B, so he flips a coin to choose. Unbeknownst to him, if he goes to college A he'll end up doing things together with guy A until they fall in love and get monogamously married; if he goes to college B he'll end up doing things with gal B until they fall in love and get monogamously married. It doesn't seem sensible to ask which choice better satisfies his romantic preferences as they are at the time of the coin flip. Rather, the preference for either person develops as a result of their shared life-histories, and both are equally good in terms of intrinsic preference towards someone (though of course one of them could be better or worse at helping John satisfy some other set of preferences).
More generally, rather than having stable goal-oriented preferences, it feels like we acquire different goals as a result of being in different environments: these goals may persist for an extended time, or be entirely transient and vanish as soon as we've left the environment.
As another example, my preference for "what do I want to do with my life" feels like it has changed at least three times today alone: I started the morning with a fiction-writing inspiration that had carried over from the previous day, so I wished that I could spend my life being a fiction writer; then I read some e-mails on a mailing list devoted to educational games and was reminded of how neat such a career might be; and now this post made me think of how interesting and valuable all the FAI philosophy stuff is, and right now I feel like I'd want to just do that. I don't think that I have any stable preference with regard to this question: rather, I could be happy in any career path as long as there were enough influences in my environment that continued to push me towards that career.
It's as Brian Tomasik wrote at http://reducing-suffering.blogspot.fi/2010/04/salience-and-motivation.html :
If this is the case, then it feels like trying to maximize preference satisfaction is an incoherent idea in the first place. If I'm put in environment A, I will have one set of goals; if I'm put in environment B, I will have another set of goals. There might not be any way of constructing a coherent utility function that would let us compare the utility we obtain from being put in environment A versus environment B, since our goals and preferences can be completely path- and environment-dependent. Extrapolated meta-preferences don't seem to solve this either, because there seems to be no reason to assume that they'd be any more stable or self-contained.
I don't know what we could use in place of VNM utility, though. At least it feels like the alternative formalism should include the agent's environment/life history in determining its preferences.
What I think is happening is that we're allowed to think of humans as having VNM utility functions (see also my discussion with Stuart Armstrong), but the utility function is not constant over time (since we're not introspective, recursively self-modifying AIs that can keep their utility functions stable).
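To gesture at what that might look like, here's another toy sketch, purely my own illustration: the "most recent salient influence" rule below is a made-up stand-in for whatever salience mechanism actually drives this.

```python
# The agent still has a utility function at any given moment, but which
# function that is depends on its life history so far, so there is no
# single fixed function to hand to an optimizer up front.

from typing import Callable, List

def utility_given_history(history: List[str]) -> Callable[[str], float]:
    """Return the agent's current utility function, shaped by its history."""
    # Hypothetical rule: whatever career the environment has made salient
    # most recently is the one the agent currently cares about.
    salient = history[-1] if history else "nothing in particular"
    return lambda career: 1.0 if career == salient else 0.0

u_morning = utility_given_history(["fiction writing"])
u_evening = utility_given_history(["fiction writing", "educational games", "FAI philosophy"])

print(u_morning("fiction writing"))  # 1.0
print(u_evening("fiction writing"))  # 0.0 (same option, but the current utility function has changed)
```

The point of the sketch is just that the thing being maximized at any moment can be well-defined, while there's still no single environment-independent function whose satisfaction you could maximize in advance.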