kjmiller comments on Morality is not about willpower - Less Wrong

9 Post author: PhilGoetz 08 October 2011 01:33AM


Comments (144)


Comment author: TimFreeman 09 October 2011 05:20:16AM *  5 points [-]

The Utility Theory folks showed that behavior of an agent can be captured by a numerical utility function iff the agent's preferences conform to certain axioms, and Allais and others have shown that human behavior emphatically does not.

A person's behavior can always be understood as optimizing a utility function; it's just that if they are irrational (as in the Allais paradox) the utility functions start to look ridiculously complex. If all else fails, a utility function can be used that has a strong dependency on time in whatever way is required to match the observed behavior of the subject. "The subject had a strong preference for sneezing at 3:15:03pm October 8, 2011."

From the point of view of someone who wants to get FAI to work, the important question is, if the FAI does obey the axioms required by utility theory, and you don't obey those axioms for any simple utility function, are you better off if:

  • the FAI ascribes to you some mixture of possible complex utility functions and helps you to achieve that, or

  • the FAI uses a better explanation of your behavior, perhaps one of those alternative theories listed in the wikipedia article, and helps you to achieve some component of that explanation?

I don't understand the alternative theories well enough to know if the latter option even makes sense.

Comment author: kjmiller 09 October 2011 06:47:34PM *  0 points [-]

Seems to me we've got a gen-u-ine semantic misunderstanding on our hands here, Tim :)

My understanding of these ideas is mostly taken from reinforcement learning theory in AI (a la Sutton & Barto 1998). In general, an agent is characterized by a policy pi that gives the probability that the agent will take a particular action in a particular state, P = pi(s,a). In the most general case, pi can also depend on time, and is typically quite complicated, though usually not complex ;).
Any computable agent operating over any possible state and action space can be represented by some function pi, though typically folks in this field deal in Markov Decision Processes since they're computationally tractable. More on that in the book, or in a longer post if folks are interested. It seems to me that when you say "utility function", you're thinking of something a lot like pi. If I'm wrong about that, please let me know.
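
To make pi concrete, here is a minimal sketch of a policy stored as a lookup table. The states, actions, and probabilities are all made up for illustration, and the helper names (action_probability, sample_action) are hypothetical; nothing here comes from Sutton & Barto.

```python
import random

# pi[s][a] = probability that the agent takes action a in state s.
# All states, actions, and numbers are hypothetical.
pi = {
    "hungry": {"eat": 0.75, "wait": 0.25},
    "fed": {"eat": 0.0, "wait": 1.0},
}


def action_probability(s, a):
    """Return P = pi(s, a); actions not listed for s get probability 0."""
    return pi[s].get(a, 0.0)


def sample_action(s, rng):
    """Draw one action in state s according to the policy."""
    actions = list(pi[s])
    weights = [pi[s][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(action_probability("hungry", "eat"))
    print(sample_action("hungry", rng))
```

Note that the table needs one entry per (state, action) pair, which is why a fully general pi gets unwieldy fast.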

When folks in the RL field talk about "utility functions", generally they've got something a little different in mind. Some agents, but not all of them, determine their actions entirely using a time-invariant scalar function U(s) over the state space. U takes in future states of the world and outputs the reward that the agent can expect to receive upon reaching that state (loosely "how much the agent likes s"). Since each action in general leads to a range of different future states with different probabilities, you can use U(s) to get an expected utility U'(a,s):

U'(a,s) = sum_{s'} p(s,a,s') * U(s'),

where s is the state you're in, a is the action you take, s' ranges over the possible future states, and p(s,a,s') is the probability that action a taken in state s will lead to state s'. Once your agent has a U', some simple decision rule over that is enough to determine the agent's policy. There are a bunch of cool things about agents that do this, one of which (not the most important) is that their behavior is much easier to predict. This is because behavior is determined entirely by U, a function over just the state space, whereas pi is a function over the product of the state and action spaces. From a limited sample of behavior, you can get a good estimate of U(s), and use this to predict future behavior, including in regions of state and action space that you've never actually observed. If your agent doesn't use this cool U(s) scheme, the only general way to learn pi is to actually watch the thing behave in every possible region of action and state space. This I think is why von Neumann was so interested in specifying exactly when an agent could and could not be treated as a utility-maximizer.
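
As a rough sketch of the above, here is the expected-utility sum plus one possible "simple decision rule" (pick the action with the highest U'). The states, actions, utilities, and transition probabilities are invented toy numbers, and the greedy rule is just one choice among many, assumed here for illustration.

```python
from typing import Dict, Tuple

State = str
Action = str

# U(s): how much the agent likes each state (hypothetical numbers).
U: Dict[State, float] = {"hungry": -1.0, "fed": 2.0, "sick": -5.0}

# p[(s, a)][s'] = probability that action a taken in state s leads to s'.
# Hypothetical transition model, invented for illustration.
p: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("hungry", "eat_fresh"): {"fed": 0.75, "sick": 0.25},
    ("hungry", "eat_old"): {"fed": 0.5, "sick": 0.5},
    ("hungry", "wait"): {"hungry": 1.0},
}


def expected_utility(s: State, a: Action) -> float:
    """U'(a, s) = sum over s' of p(s, a, s') * U(s')."""
    return sum(prob * U[s_next] for s_next, prob in p[(s, a)].items())


def greedy_action(s: State) -> Action:
    """One simple decision rule: take the action that maximizes U'."""
    actions = [a for (state, a) in p if state == s]
    return max(actions, key=lambda a: expected_utility(s, a))


if __name__ == "__main__":
    for a in ("eat_fresh", "eat_old", "wait"):
        print(a, expected_utility("hungry", a))
    print("chosen:", greedy_action("hungry"))
```

The point is that U has only three entries here, yet together with p it pins down what the agent does for every action, including ones you never watched it take.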

Hopefully that makes some sense, and doesn't just look like an incomprehensible jargon-filled snow job. If folks are interested in this stuff I can write a longer article about it that'll (hopefully) be a lot more clear.

Comment author: TimFreeman 10 October 2011 12:21:23AM *  1 point [-]

"Some agents, but not all of them, determine their actions entirely using a time-invariant scalar function U(s) over the state space."

If we're talking about ascribing utility functions to humans, then the state space is the universe, right? (That is, the same universe the astronomers talk about.) In that case, the state space contains clocks, so there's no problem with having a time-dependent utility function, since the time is already present in the domain of the utility function.

Thus, I don't see the semantic misunderstanding -- human behavior is consistent with at least one utility function even in the formalism you have in mind.

(Maybe the state space is the part of the universe outside of the decision-making apparatus of the subject. No matter, that state space contains clocks too.)
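
A toy sketch of that point, under the assumption that the state can be modeled as a small record with a clock reading in it. The WorldState type and its fields are hypothetical; the sneeze time is the example from earlier in the thread. U itself never changes, yet it rewards an event at one specific moment, because the moment is part of s.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WorldState:
    """Hypothetical, drastically simplified state of the universe."""
    clock: datetime   # what the clocks in the world read
    sneezing: bool    # whether the subject is sneezing


# The example from upthread: 3:15:03pm October 8, 2011.
SNEEZE_TIME = datetime(2011, 10, 8, 15, 15, 3)


def U(s: WorldState) -> float:
    """A time-invariant function of state alone that still 'depends on time',
    since the clock reading is inside the state."""
    if s.sneezing and s.clock == SNEEZE_TIME:
        return 1.0
    return 0.0


if __name__ == "__main__":
    print(U(WorldState(SNEEZE_TIME, True)))
    print(U(WorldState(datetime(2011, 10, 8, 15, 15, 4), True)))
```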

The interesting question here for me is whether any of those alternatives to having a utility function mentioned in the Allais paradox Wikipedia article are actually useful if you're trying to help the subject get what they want. Can someone give me a clue how to raise the level of discourse enough so it's possible to talk about that, instead of wading through trivialities? PM'ing me would be fine if you have a suggestion here but don't want it to generate responses that will be more trivialities to wade through.