First, the background:

Humans obviously have a variety of heuristics and biases that lead to non-optimal behavior.  But can this behavior truly not be described by a utility function?

Well, the easiest way to show that preferences aren't described by a utility function is to exhibit a cycle.  For example, if I prefer A to B, B to C, and C to A, that's a cycle - if all three are available I'll never settle on a choice, and if each switch is an increase in utility, my utility blows up to infinity!  Well, really, it simply becomes undefined.
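To make the money-pump point concrete, here's a minimal Python sketch (my own toy example, not one of Eliezer's): an agent with the cyclic preferences above will pay a small fee for every "upgrade", so a trader can walk it around the cycle forever and drain its money without ever leaving it better off.

```python
# Money-pump sketch: cyclic preferences A > B > C > A, plus a small fee per swap.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (better, worse) pairs
fee = 0.01                                      # price the agent pays per trade

def accepts_trade(current, offered):
    """The agent trades whenever it strictly prefers the offered item."""
    return (offered, current) in prefers

holding, money = "C", 10.0
for _ in range(30):
    # The trader simply offers whichever item the agent currently prefers.
    offer = next(x for x in "ABC" if accepts_trade(holding, x))
    holding, money = offer, money - fee

print(holding, round(money, 2))  # ends back at C, but poorer: C 9.7
```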

Do we have real-world examples of cycles in utility comparisons?  Sure.  For a cycle of size 2, Eliezer cites people's odd behavior with regard to money and probabilities of money.  However, the money-pumps he cites are rather inefficient.  Almost any decision that seems "arbitrary" to us can be translated into a cycle.  For example, anchoring means that people assign a higher value to a toaster when a more expensive toaster is sitting next to it.  But most people, if asked, would certainly assign only a tiny value to adding or removing options they know they won't buy.  Chain those two judgments together and we get the result that they must value the toaster more than they value the very same toaster.  The conjunction fallacy can be made into a cycle by the same reasoning if you ask people to bet on two things happening together and then ask them to bet on the same things happening separately.

So at the very least, not all humans have utility functions, which means that the human brain doesn't automatically give us a utility function to use - if we want one, we have to sculpt it ad-hoc out of intuitions using our general reasoning, and like most human things it probably won't be the best ever.

 

So, what practical implications does this have, aside from "people are weird?"

Well, I can think of two interesting things.  First, there are the implications for utilitarian ethics.  If utility functions are arbitrary not just on a person-to-person basis, but even within a single person, then choosing between options using utilitarian ethics requires stronger, more universal moral arguments.  The introspective "I feel like X, therefore my utility function must include that" is now a weak argument, even to yourself!  The claim that "a utility function of universe-states exists" loses none of its consequences, though, like alerting you that something is wrong when you encounter a cycle in your preferences, or, of course, supporting consequentialism.

 

Interesting thing two: application in AI design.  The first argument goes something like "well, if it works for humans, why wouldn't it work for AIs?"  The first answer, of course, is "because an AI that had a loop would get stuck in it."  But the first answer is sketchy, because *humans* don't get stuck in their cycles.  We do interesting things like:

  •  meta-level aversion to infinite loops.
  •  resolving equivalencies/cycles with "arbitrary" associations.
  •  not actually "looping" when we have a cycle, but doing something else that resembles utilitarianism much less.

So replacing a huge utility function with a huge-but-probably-much-smaller set of ad-hoc rules could actually work for AIs if we copy over the right things from human cognitive structure.  Would it be possible to make it Friendly or some equivalent?  Well, my first answer is "I don't see why not."  It seems about as possible as doing it for the utility-maximizing AIs.  I can think of a few avenues that, if profitable, would make it even simpler (the simplest being "include the Three Laws"), but it could plausibly be unsolvable as well.
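As a toy illustration of the contrast (the option names and rules below are entirely hypothetical, just there to show the shape of the idea), a "lots-of-rules" chooser might look like an ordered list of filters with an arbitrary tie-break at the end, rather than an argmax over a utility function:

```python
def utility_agent(options, utility):
    """Classic design: pick the option with the highest utility score."""
    return max(options, key=utility)

def rule_agent(options, rules):
    """'Lots-of-rules' design: apply ordered ad-hoc rules, each narrowing the
    candidate set; an arbitrary but deterministic tie-break (mirroring the human
    trick of resolving cycles with arbitrary associations) ends any would-be loop."""
    candidates = list(options)
    for rule in rules:
        filtered = [o for o in candidates if rule(o)]
        if filtered:
            candidates = filtered
    return candidates[0]  # arbitrary tie-break

# Example usage with made-up options and rules:
options = ["toaster_basic", "toaster_fancy", "no_purchase"]
rules = [
    lambda o: o != "no_purchase",   # "do something rather than nothing"
    lambda o: "fancy" not in o,     # "avoid the anchoring bait"
]
print(rule_agent(options, rules))   # -> "toaster_basic"
```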

The second argument goes something like "It may very well be better to do it with a list of guidelines than with a utility function."  For example, Eliezer makes the convincing argument that fun is not a destination, but a path.  What part of that makes sense from a utilitarian perspective?  It's a very human, very lots-of-rules way of understanding things.  So why not try to make an AI that can intuitively understand what it's like to have fun?  Hell, why not make an AI that can have fun for the same reason humans can?  Wouldn't that be more... well... fun?

This second argument may be full of it.  But it sounds good, eh?  Another reason the lots-of-rules approach may beat out the utility-function approach is the ease of critical self-improvement.  A utility-function approach is highly correlated with trying to idealize actions, which makes writing good code tricky - and writing good code is a ridonkulously hard problem to optimize.  But a lots-of-rules approach intuitively seems like it could critically self-improve with greater ease - it seems like lots of rules will be needed to make an AI a good computer programmer anyhow, and under that assumption the lots-of-rules approach would be better prepared to deal with ad-hoc rules.  Is this assumption false?  Can you write a good programmer elegantly?  Hell if I know.  It just feels like if you could, we would have done it.

 

Basically, utility functions are guaranteed to have a large set of nice properties.  Humans, however, are made of ad-hoc rules; if we want a nice property, we just add the rule "have this property!"  This limits how well we can translate even our own desires into moral guidelines, especially near "strange" areas of our psychology such as comparison cycles.  But it also proves that there is a (possibly terrible) alternative to utility functions that intelligences can run on.  I've tried to make it plausible that the lots-of-rules approach could even be better than the utility-function approach - worth a bit of consideration before you start your next AI project.

 

Edit note: Removed apparently disliked sentence.  Corrected myself after totally forgetting Arrow's theorem.

Edit 2, not that this will ever be seen: This thought has obviously been thought before, but maybe not followed this far into hypothetical-land: http://lesswrong.com/lw/l0/adaptationexecuters_not_fitnessmaximizers/

30 comments

If you'll pardon a digression, I got hung up on the first line of this post. Please note that the following has nothing to do with the rest of the content (i.e. I'm not saying whether this was or wasn't a good post otherwise).

Had to write this down somewhere.

This reads like you're making an excuse, or asking a pardon, for posting this, and I don't see why you would do so. If you think it's appropriate for LW discussion, post it with pride. If you don't, put it elsewhere. If you're not sure whether it's appropriate or not, it would be clearer to ask that explicitly so you could get an explicit response.

The reason it caught me is that you're stating one thing which is untrue (that you're compelled to write it) and implying another (that if you're going to write it, it must be here). These fit a mental pattern I'm very accustomed to but have mostly trained myself out of, after being quite successfully convinced that almost all language of compulsion ("should" "must" etc.) can be rephrased more precisely, and that doing so is beneficial towards staying aware of one's own agency and thus taking responsibility for one's decisions (e.g. "I can't go, I need to do homework" -> "I'm choosing to prioritize homework over going" or, as happens to be the case, "I should go to bed" -> "I'm choosing to poke around Less Wrong before I go to bed, and I'm willing to take responsibility for the consequences of that choice").

I hope this doesn't come off as me jumping down your throat for something trivial. It just seemed like an opportunity to be less wrong which you might not have noticed. Plus, it's good news. You're not being compelled! :)

[-][anonymous]

One of the most useful (but hard-to-internalize) things I've learned is that it's good to think of your actions as choices. To frame things as "I'm deciding what I value most right now and I'm going to do that." as opposed to "I ought to do this, but I'm too weak to do as I ought." It's a radical difference in mindset.

Indeed. It makes me feel much more in control and much more capable.

It was simply an explanation for a post that turned out longer than I expected. I suppose I felt I should "excuse" that, except not anything so strong. Would you like it removed? Apparently 11 people agree with you, so... poof! Gone!

My use of "had," however, implies little to nothing about my agency if you interpret my sentences using your human sentence-interpreting abilities rather than relying on literalism. It's like the large numbers of languages that contain double negatives for emphasis, rather than as positives. E.g. "You ain't nothin'."

And of course, given determinism of any sort, my statement may even have been literally accurate :D

My use of "had," however, implies little to nothing about my agency if you interpret my sentences using your human sentence-interpreting abilities rather than relying on literalism.

Er? I disagree. I don't actually see what else it could mean. :) And the point wasn't to get you to remove it, just to suggest that you think about what it means. True enough re: determinism, but I don't actually think that matters. :P

How could it not matter? :)

Some other things it could mean (it meant some of these, but not all): I really wanted to; I knew that if I didn't write it I would likely regret it; I did it in order to maintain consistency with my previous actions and declarations; I felt the duty to do it in order to share my thoughts. And so on and so forth.

I agree insofar as it's good to examine thoughts, but examining habits of language quickly becomes silly, as illustrated by the double negative example. Do people who use double negatives for emphasis really think that -1*-1 = -2? Better to evaluate language as communication rather than "thought-stuff."

How could it not matter? :)

I don't find that assuming predestiny has any effect on my actions.

Do people who use double negatives for emphasis really think that -1*-1 = -2?

No, because in their language, the double negative doesn't imply that. In colloquial English, "I had to do x" does imply "I feel compelled to do x." It doesn't imply that some irresistible outside force really was pushing on you, just that you felt some sort of compulsion. And indeed, two of your examples suggest that you feel pressure to behave consistently and to share your thoughts.

Better to evaluate language as communication rather than "thought-stuff."

What an odd distinction. What is language for, if not communicating thoughts? I wasn't claiming that the words you chose said something other than what you clearly meant--but if I had been, wouldn't that be important? Language is the most precise tool you've got for getting thoughts outside your own head. Seems to me that it's worth ensuring that you use it as accurately as you can.

What I was actually claiming is that you were using a language pattern which I've found it beneficial to stop using. Using it doesn't mean that you really believe you don't have agency, but not using it might help you internalize that you do. The original comment wasn't a complaint that you were doing something wrong, just a suggestion that you might benefit from doing something differently.

I don't find that assuming predestiny has any effect on my actions.

"Assuming predestiny" is itself an action, for one thing. It totally messes up the moral weights we assign to actions, for another.

In colloquial English, "I had to do x" does imply "I feel compelled to do x."[...] And indeed, two of your examples suggest that you feel pressure to behave consistently and to share your thoughts.

Hm, perhaps I am drawing lines differently than you. You said in your first post that my use of "had" was specifically untrue. And yet it's fine to say that it's true if I felt even "some pressure" to do it. You probably noticed this oddness too, since you reiterate:

What I was actually claiming is that you were using a language pattern which I've found it beneficial to stop using.

Oh, okay. Well, to be frank: tough. I take perhaps a little too much responsibility already :D

Better to evaluate language as communication rather than "thought-stuff."

What an odd distinction. What is language for, if not communicating thoughts?

You just made the distinction too :) For example, some people claim that double negatives (including "ain't no") are "wrong." As if they were a false theorem, or a wrong thought. But when they are seen as a mode of communicating thoughts, it's clear that the only requirement is that it works.

A closely related argument is over whether or not using language truly changes the way you think. There are some subtle and interesting ways that it does, but in general, Orwell was wrong. (Here, have a link.)

Now, if you were claiming that using "had" was bad communication, I'd be more inclined to listen. In fact I already have implicitly listened, since I removed that sentence.

Speaking of communication, we seem to be interested in having different conversations from each other in this thread; let's not.

[-][anonymous]

In general I've found that making your preferences more explicit leads to a more satisfactory outcome on reflection. But I'm not sure how I should structure my time so as to balance the benefits of preference clarification against the costs of not acting on my current approximation of my preferences. Right now I'm using the heuristic of periodically doing preference clarification, with longer periods between higher-order clarifications.

This issue also seems related to the question of when to go meta. I tried to get a discussion about that going here.

I didn't follow this entirely, and I'd like to. Can you elaborate on why you need to balance the clarification with acting on the preference? I don't see why they're opposed.

[-][anonymous]

Can you elaborate on why you need to balance the clarification with acting on the preference? I don't see why they're opposed.

I guess I was thinking they're opposed in the way that any two possible actions are opposed; at any given time one could be higher value than the other. But perhaps you're thinking that one can do them at the same time? In fact we might want to think of acting on preference as clarifying preference, in which case I don't really know what to do. I'm not sure if we should do that yet, since I'm unclear as to whether we're trying to discover preference or determine it.

Well, they're opposed in that each of them takes some of your finite time and they can't generally be combined, yes. But this is also true of sleeping and eating, and I don't have trouble finding time for both. My remarks about taking responsibility for preferences weren't really about when to choose to say them, but about phrasing them explicitly when I was going to say them anyway. I suppose you could call that discovering the preference, but I think of it more as observing--or, closer, as not letting myself get away with denying them.

[-]Jack

For a more elegant demonstration I think we have to turn to politics, where, at some point in 2008, a majority of U.S. voters simultaneously preferred Clinton to Obama, Obama to McCain and McCain to Clinton. In the entire space of sets of candidates, it seems plausible that there's a large number I (insert "even me!" here) would have cyclic preferences over, if I didn't explicitly try to construct a function - which implies that if I did construct a function it would be "wrong."

You're talking here about Condorcet cycles, and they aren't evidence that people have irrational preferences.

I don't think this is quite right. The problem seems to be that the OP confusingly conflates the following:

  • Condorcet cycles, which are cyclic preferences on the part of the entire electorate -- a phenomenon which can sometimes emerge from the aggregated, transitive preferences of the individuals making up the electorate. As you note, that isn't evidence that anyone is irrational. (A minimal sketch of how this happens follows this list.)

  • Cyclic preferences on the part of individuals, which do actually seem to be kind of irrational.
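To make the first point concrete, here is a small sketch (the bloc sizes are invented; only the candidates come from the quoted passage) in which three blocs of voters, each with perfectly transitive rankings, produce a cyclic majority preference:

```python
from itertools import permutations

# Each bloc ranks the candidates best-to-worst; bloc sizes are hypothetical.
blocs = [
    (34, ["Clinton", "Obama", "McCain"]),
    (33, ["Obama", "McCain", "Clinton"]),
    (33, ["McCain", "Clinton", "Obama"]),
]

def majority_prefers(a, b):
    """True if a majority of voters rank candidate a above candidate b."""
    votes_for_a = sum(n for n, ranking in blocs
                      if ranking.index(a) < ranking.index(b))
    total = sum(n for n, _ in blocs)
    return votes_for_a > total / 2

for a, b in permutations(["Clinton", "Obama", "McCain"], 2):
    if majority_prefers(a, b):
        print(f"majority prefers {a} to {b}")
# Prints the cycle: Clinton > Obama, Obama > McCain, McCain > Clinton,
# even though every individual voter's ranking is transitive.
```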

Oh, whoops, forgot you could construct those out of linear rankings! Arrow's theorem or something, right?

Drat!

[-]Jack

Right. Look at the part I quoted.

The part you quoted seems to be talking about two different things at once. There's a description of a Condorcet cycle ("a majority of U. S. voters simultaneously preferred ...") followed by a suggestion that most individuals have internal preferences that are cyclic in many cases ("it seems plausible that there's a large number ... I would have cyclic preferences over ...").

So, I think the post was confusing on this point -- and possibly either sloppy or confused on the distinction between those two things -- yet I think the OP intends to be talking about something other than Condorcet cycles, which are exclusively a group phenomenon. Insofar as the OP describes Condorcet cycles in the first part of the quoted passage, I think that's a mistake but not the intended message of the post.

[-]Jack

Sigh. Now we have to go into the details :-) Yes, the OP intends to be talking about something other than Condorcet cycles, which is why I quoted the part where he was talking about them and told him that 'here', i.e. in the passage I quoted, he was talking about Condorcet cycles, and that they are not evidence for his thesis that people have irrational preferences.

The OP isn't conflating two things. I presume the poster didn't realize there was another explanation for cyclical group preferences other than individuals having cyclical preferences. It doesn't mean the post is wrong overall, just that the piece of evidence I quoted isn't support.

Insofar as the OP describes Condorcet cycles in the first part of the quoted passage, I think that's a mistake but not the intended message of the post.

The quoted passage consists of describing a Condorcet cycle and concluding something about individual preferences from it. I was correcting this. I haven't said anything about the intended message of the post.

One way to construct a (sometimes irrational) agent is to assign the agent's decision making to a committee of perfectly rational agents - each with its own utility function.

Whether the decision making is done by a voting scheme or by casting lots to pick the current committee chairman, the decision making will be occasionally irrational, and hence not suitable to be described by a utility function.

However, if the agents' individual utility functions are common knowledge, then Nash bargaining may provide a way for the committee to combine their divergent preferences into a single, harmonious, rational composite utility function.
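As a rough sketch of what that combination could look like (the options, utilities, and disagreement payoffs below are all made up for illustration), the committee could pick the outcome that maximizes the product of each member's gain over its disagreement payoff, which is the standard Nash bargaining rule:

```python
options = ["A", "B", "C"]

# Each committee member's utility for each option (hypothetical numbers).
utilities = {
    "member_1": {"A": 5.0, "B": 3.0, "C": 1.0},
    "member_2": {"A": 1.0, "B": 3.5, "C": 5.0},
}
disagreement = {"member_1": 0.5, "member_2": 0.5}  # payoff if bargaining breaks down

def nash_product(option):
    """Composite score: product of each member's gain over its disagreement point."""
    product = 1.0
    for member, u in utilities.items():
        gain = u[option] - disagreement[member]
        if gain <= 0:               # options worse than disagreement are ruled out
            return float("-inf")
        product *= gain
    return product

print(max(options, key=nash_product))  # -> "B", the balanced compromise
```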

Hm, that's an interesting thought.  Of course, we're not talking about a council of a few voices in your head here.  Voting theory results, which totally slipped my mind when writing the post, tell you that the number of voters with linear rankings needed grows something like log(N), where N is the number of options you want to rank arbitrarily.

Our N is huge, so we may actually be describing the same things - I'm just calling them "rules," and you're calling them "agents," essentially - although my thinking isn't using the voting framework, so the actual implications are slightly different.

This post is all wrong. You can, in fact, closely model the actions of any computable agent using a utility function.

This has been previously explained here - and elsewhere

Well, one way to weakly model the actions is to assign the trivial utility function - one which makes the agent indifferent among all outcomes of actions. Any set of actions would be consistent with this utility function.

If you want the actions to actually be uniquely determined by the utility function, then you can do it in the way you propose - by adding additional payoffs associated with actions, rather than outcomes. I think this is "cheating", in some sense, even though I recognize that in order to model a rational agent practicing a deontological form of ethics, you need to postulate payoffs attached to actions rather than outcomes.

However, short of using one of these two "tricks", it is not true that any agent can be modeled by a utility function. Only rational agents can be so modeled. "Rational" agents are characterized as Bayesians that adhere to the axioms of transitivity of preference and the "sure thing principle".

Utilities are necessarily associated with actions. That is the point of using them. Agents consider their possible actions and assign utilities to them in order to decide which one to take. It is surely not a "trick" - it is a totally standard practice.

Outcomes are not known at the time of the decision. At best, the agent has a fancy simulation - which is just a type of computer program. If some computer programs are forbidden, while others are allowed, then what is permitted and what is not should be laid down. I am happy not to have to face that unpleasant-looking task.

I'm sorry, Tim. I cannot even begin to take this seriously. Please consult any economics, game theory, or decision theory text. Chapter 1 of Myerson is just one of many possible sources.

You will learn that utilities are derived from preferences over outcomes, portfolios, or market baskets, and that from this data, one constructs expected utilities over actions so as to guide choice of actions.

I don't know whether you are trolling here, or sharing your own original research, or are simply confused by something you may have read about "revealed preferences". In any case, please provide a respectable reference if you wish to continue this discussion.

Uh, try here:

In economics, utility is a measure of relative satisfaction. Given this measure, one may speak meaningfully of increasing or decreasing utility, and thereby explain economic behavior in terms of attempts to increase one's utility. Utility is often modeled to be affected by consumption of various goods and services, possession of wealth and spending of leisure time.

That is what "utliity" means - and not what you mistakenly said. Consequently, if an agent has preferences for its own immediate actions, so be it - that is absolutely permitted.

I meant a respectable reference which supported your position. It sure seems to me that that quotation supported me, rather than you. So it is quite likely that we are not understanding each other. And given results so far, probably not worth it that we try.

My reference looks OK to me. It supports my position just fine - and you don't seem to disagree with it. However, here is another similar one:

Utility: Economist-speak for a good thing; a measure of satisfaction. (See also WELFARE.) Underlying most economic theory is the assumption that people do things because doing so gives them utility. People want as much utility as they can get.

It is true that some economic definitions of utility attach utility to "goods and services" - or to "humans" - e.g.:

utility - Economics - the capacity of a commodity or a service to satisfy some human want.

However such definitions are simply inadequate for use in decision theory applications.

Note that utility is NOT defined as being associated with states of the external world, things in the future - or anything like that. It is simply a measure of an agent's satisfaction.

If you really do think that satisfaction is tied to outcomes, then I think you could benefit from some more exposure to Buddhism - which teaches that satisfaction lies within. The hypothesis that it is solely a function of the state of the external world is just wrong.

You could also say that humans have utility functions, but they can change quickly over time because of trivial things. Which, I admit, would be near-indistinguishable from not having utility functions at all (in the long-term), but saying that you have a utility function and a set of preferences at one instant in time seems true enough to allow for decision theory analysis.

You could also say that humans have utility functions, but they can change quickly over time because of trivial things.

This seems like it would make the statement "humans have utility functions" devoid of value, as you point out. A single-valued utility function must be curl-less (i.e. have no cycles) to be a sensible utility function - but humans seem to demonstrably have cycles. It's not just a "grass is greener" phenomenon where, once I choose John McCain, I immediately regret it and wish I had picked Clinton (i.e. a quick change over time), but that the utility function is not single-valued - the reason people prefer McCain to Clinton is different from the reason they prefer Clinton to Obama. If you asked someone "If you had to pick one candidate from all three, who would you pick?" they would be stumped - i.e. the decision theory analysis would hang.
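One way to see why the analysis hangs (my own sketch; the commenter doesn't give an algorithm): assigning a single number to every option amounts to topologically sorting the "prefers" graph, and that sort cannot finish once a cycle appears.

```python
def rank_by_preference(options, prefers):
    """Try to order options so every preferred option comes first.
    Returns the ranking, or None if the preferences contain a cycle."""
    ranking = []
    remaining = set(options)
    while remaining:
        # An option is "undominated" if nothing still unranked beats it.
        undominated = [o for o in remaining
                       if not any((other, o) in prefers for other in remaining)]
        if not undominated:
            return None          # every remaining option is beaten: a cycle
        ranking.extend(undominated)
        remaining -= set(undominated)
    return ranking

acyclic = {("McCain", "Clinton"), ("Clinton", "Obama")}
cyclic = acyclic | {("Obama", "McCain")}

print(rank_by_preference(["Clinton", "Obama", "McCain"], acyclic))  # a usable ranking
print(rank_by_preference(["Clinton", "Obama", "McCain"], cyclic))   # None: the analysis hangs
```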

Now, generally what people do in these situations is tease out the causes of the preferences and try to weight them - but it's obviously possible to get another cycle going there.

(Also check out Perplexed's comment below.)