army1987 comments on We Don't Have a Utility Function - Less Wrong
You do have a utility function (though it may be stochastic). You just don't know what it is. "Utility function" means the same thing as "decision function"; it just has different connotations. Something determines how you act; that something is your utility function, even if it can be described only as a physics problem plus random numbers generated by your free will and adjustments made by God. (God must be encapsulated in an oracle function.) We call it a utility function to clue people in to our purposes and to the literature we're going to draw on for our analysis. If we wished to regard a thing as deterministic rather than as an agent with free will, we would call its decision function a probability density function instead of a utility function.
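One way to read that claim is that all that's being asserted is a (possibly stochastic) mapping from situations to actions. Here is a toy sketch in Python; the situations, actions, and weights are invented for illustration and are not anything the comment specifies:

```python
import random

def decision_function(situation):
    """Toy stochastic decision function: maps a situation to an action.

    The situations, actions, and weights are made up for illustration;
    a real human's "decision function" would be a physics problem over
    a brain's worth of synapses, not a three-entry dictionary.
    """
    # Deterministic part: preferences that depend (crudely) on the situation.
    if "hungry" in situation:
        weights = {"eat": 0.7, "sleep": 0.2, "post_a_comment": 0.1}
    else:
        weights = {"eat": 0.1, "sleep": 0.3, "post_a_comment": 0.6}
    # Stochastic part: sample an action in proportion to its weight.
    actions = list(weights)
    return random.choices(actions, weights=[weights[a] for a in actions], k=1)[0]

print(decision_function("hungry, late at night"))
```

Whether you call this a "utility function," a "policy," or a "probability distribution over actions" changes the connotation, not the behavior being described.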
If you truly have terminal values, they are mainly described by a large matrix of synaptic connections and weights.
When you say "I don't have a utility function" or "I don't have terminal values", you are mostly complaining that approximations are only approximations. You are thinking about some approximation of your utility function or your terminal values, expressed in language or logic, using symbols that conveniently but inaccurately cluster all possible sense-experience vectors into categories, and logical operations that throw away all information but the symbols (and perhaps some statistics, such as a probability or typicality for each symbol).
When we use the words "utility function", the level of abstraction we should use to describe it, and hence its accuracy, depend on the purpose we have in mind. What's incoherent is talking about "my utility function" absent any such purpose. It's just like asking, "What is the length of the coast of England?"
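For concreteness, here is the standard coastline-paradox illustration behind that analogy, sketched in Python: the same curve yields different "lengths" depending on how coarse a ruler you measure it with, just as "my utility function" yields different answers depending on the level of abstraction your purpose calls for. The choice of the Koch curve and the particular strides are illustrative, not anything from the comment:

```python
import math

def koch(depth):
    """Vertices of a Koch curve on the unit interval after `depth` refinements."""
    pts = [0 + 0j, 1 + 0j]
    rot = complex(math.cos(math.pi / 3), math.sin(math.pi / 3))  # rotate 60 degrees
    for _ in range(depth):
        refined = []
        for a, b in zip(pts, pts[1:]):
            d = (b - a) / 3
            refined += [a, a + d, a + d + d * rot, a + 2 * d]
        refined.append(pts[-1])
        pts = refined
    return pts

def measured_length(pts, stride):
    """Path length when we only look at every `stride`-th vertex (a coarser ruler)."""
    coarse = pts[::stride]
    return sum(abs(b - a) for a, b in zip(coarse, coarse[1:]))

pts = koch(6)  # 4**6 + 1 = 4097 vertices
for stride in (1024, 256, 64, 16, 4, 1):
    print(f"every {stride:4d}th vertex -> length ~ {measured_length(pts, stride):.2f}")
```

The finer the ruler, the longer the measured coast; the question "how long is it, really?" has no answer until you say what the measurement is for.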
Whether you have terminal values is a more complicated question, for uninteresting reasons such as quantum mechanical considerations. The short answer is probably this: any level of abstraction that is simple enough for you to think about is too simple to capture values that are guaranteed not to change.
Underneath both these questions is the tricky question, "Which me is me?" Are you asking about the utility function enacted by the set of SNPs in your DNA, by your body, or by your conscious mind? These are not the same utility functions. (Whether your conscious mind has a utility function is a tricky question because we would have to separate actions controlled by your conscious mind from actions your body takes not controlled by your conscious mind. If consciousness is epiphenomenal, your mind does not have a useful utility function.)
One common use of terminal values on LW is to try to divine a set of terminal values for humans that can be used to guide an AI. So a specific, meaningful, useful question would be, "Can I discover and describe my terminal values in enough detail that I can be confident that an AI, controlled by these values, will enact the coherent extrapolated volition of these values?" ("Coherent extrapolated volition" may be meaningless, but that's a separate issue.) I believe the answer is no, which is one reason why I don't support MIRI's efforts toward FAI.
Eliezer spent a lot of time years ago explaining in detail why giving an AI goals like "Make humans happy" is problematic, and began to search for the appropriate level of description of goals/values. Unfortunately he didn't pursue this to its conclusion. He chose to focus on errors caused by drift from the original utility function, or by logics that fail to achieve rationality, to the exclusion of changes caused by the inevitable inexactness of any representation of a utility function, the random component of the original utility function, and the tricky ontological questions that crop up when you ask, "Whose utility function?"
What? Not having terminal values means that either you don't care about anything at all, or that “the recursive chain of valuableness” is infinitely deep. Neither of these seems likely to me.
I think there's a third possibility: values have a circular, strange-loop structure.