loqi comments on Post Your Utility Function - Less Wrong
Human utility functions are relative, contextual, and include semi-independent positive-negative axes. You can't model all that crap with one number.
The study of affective synchrony shows that humans have simultaneously active positive and negative affect systems. At extreme activation levels in either system, the other is shut down, but the rest of the time they can support or oppose each other. (And when they oppose each other, we experience conflict and indecision.)
Meanwhile, the activation of these systems is influenced by current state/context/priming, as well as the envisioned future. So unless your attempt at modeling a utility function includes terms for all these things, you're sunk.
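To make the two-axis point concrete, here's a toy sketch in Python. Everything in it (the threshold, the conflict band, the field names) is my own illustrative assumption, not drawn from the affect literature:

```python
# Toy model of semi-independent positive/negative affect axes.
# Thresholds and names are invented for illustration only.

from dataclasses import dataclass

SHUTDOWN_THRESHOLD = 0.9  # assumed: extreme activation suppresses the other axis


@dataclass
class AffectState:
    positive: float  # activation in [0, 1]
    negative: float  # activation in [0, 1]

    def effective(self) -> tuple[float, float]:
        """Apply the 'extremes shut down the other system' rule."""
        pos, neg = self.positive, self.negative
        if pos >= SHUTDOWN_THRESHOLD:
            neg = 0.0
        elif neg >= SHUTDOWN_THRESHOLD:
            pos = 0.0
        return pos, neg

    def in_conflict(self) -> bool:
        """Both systems moderately active at once: felt as indecision."""
        pos, neg = self.effective()
        return pos > 0.3 and neg > 0.3


print(AffectState(0.6, 0.5).in_conflict())  # True: opposed systems, conflict
print(AffectState(0.95, 0.5).effective())   # (0.95, 0.0): extreme shuts the other down
```

Note that there is no single scalar anywhere in this state: collapsing (positive, negative) to `positive - negative` would erase exactly the conflict/indecision information.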
(Personally, this is where I think the idea of CEV has its biggest challenge: I know of no theoretical reason why humans must have convergent or consistent utility functions as individuals, let alone as a species.)
It's been a while since I looked at CEV, but I thought the "coherent" part was meant to account for this. It assumes we have some relatively widespread, fairly unambiguous preferences, which may be easier to see in the light of that tired old example, paperclipping the light cone. If CEV outputs a null utility function, that would seem to imply that human preferences are completely symmetrically distributed, which seems hard to believe.
If by "null utility function", you mean one that says, "don't DO anything", then do note that it would not require that we all have balanced preferences, depending on how you do the combination.
A global utility function that creates more pleasure for me by creating pain for you would probably not be very useful. Heck, a function that creates pleasure for me by creating pain for me might not be useful. Pain and pleasure are not readily subtractable from each other on real human hardware, and when forces outside one's control require that subtraction anyway, an additional disutility is incurred.
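A sketch of what "not readily subtractable" might mean in practice. The numbers and the tradeoff penalty are made up for illustration; this is not a claim about actual human hardware:

```python
# Treating (pleasure, pain) as a pair instead of a single scalar.
# The tradeoff penalty is an invented assumption, used only to model
# the extra disutility of a subtraction imposed from outside.

def naive_scalar(pleasure: float, pain: float) -> float:
    """The one-number model: pretend pain simply subtracts."""
    return pleasure - pain


def paired_value(pleasure: float, pain: float,
                 imposed: bool = False,
                 tradeoff_penalty: float = 0.5) -> float:
    """Keep both axes, and charge extra when the tradeoff is forced
    by something outside the person's control."""
    value = pleasure - pain
    if imposed and pleasure > 0 and pain > 0:
        value -= tradeoff_penalty * min(pleasure, pain)
    return value


# Same naive total, very different evaluations:
print(naive_scalar(5.0, 4.0))                # 1.0
print(paired_value(5.0, 4.0, imposed=True))  # -1.0: worse than the naive model says
print(paired_value(1.0, 0.0))                # 1.0: mild pleasure, no pain at all
```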
These things being the case, a truly "Friendly" AI might well decide to limit itself to squashing unfriendly AIs and otherwise refusing to meddle in human affairs.
I wouldn't be particularly surprised by this outcome.