When I read the beginning of this post I asked myself, "if people don't have utility functions, why haven't LWers gotten rich by constructing Dutch books against people?"
I answered myself, "in practice, most people will probably ignore clever-looking bets because they'll suspect that they're being tricked. One way to avoid Dutch books is to avoid bets in general."
A model is not terribly useful if it does not do a better job of prediction than alternative models. (Micro)economics does quite a good job of predicting human behaviour based on a very simple model of predictable rationality. It is not clear to me that this model offers a better approach to making meaningful predictions about real-world human behaviour. I've only skimmed the article, but it appears the evidence is limited to rather artificial lab tests. That's better than nothing, but I'm skeptical that this model's real-world predictive power justifies its c...
Research like this seems very hopeful to me. It breaks down into a nice component describing what people actually want and a lot of other components describing shifts of attention and noise. If anything, that seems too optimistic compared to, say, prospect theory, in which the basic units of motivation are shifts from a baseline and there's no objective baseline or obvious way to translate shift valuations into fixed-level valuations.
There's a gap between the general applicability of utility functions in theory, and their general inapplicability in practice. Indeed, there's a general gap between theory and practice.
I would argue that this gap is a reason to do FAI research in a practical way - writing code, building devices, performing experiments. Dismissing gritty practicality as "too risky" or "not relevant yet" (which is what I hear SIAI doing) seems to lead to becoming a group without experience and skill at executing practical tasks.
Disclaimer: I'm aware that ...
Are you questioning that we can model human behavior using a utility function (i.e. microeconomics), or that we can model human values using a utility function? Or both? The former is important if you're trying to predict what a human would do; the latter is important if you're trying to figure out what humans should do - or what you want an AGI to do.
Utility functions are a good model to use if we're talking about designing an AI. We want an AI to be predictable, to have stable preferences, and do what we want.
Why would these desirable features be the result? It reads to me as if you're saying that this is a solution to the Friendly AI problem. Surely not?
There are many alternatives to expected utility if you want to model actual humans. For example, Kahneman and Tversky's prospect theory. The Wikipedia page for Expected utility hypothesis contains many useful links.
Question: do people think this post was too long? In the beginning, I thought that it would be a good idea to give a rough overview of DFT to give an idea of some of the ways by which pure utility functions could be made more reflective of actual human behavior. Near the end, though, I was starting to wonder if it would've been better to just sum it up in, say, three paragraphs.
What's the risk in using a more static view of utility or preference in computing CEV?
My initial thought: fine, some people will be less pleased at various points in the future than they would have been. But a single dominant FAI effectively determining our future is already a compromise from what people would most prefer.
Curiously, these drawbacks appear to have a common theme; they all concern, one way or another, temporal aspects of decision making.
Ainslie and Powers are certainly two who've taken up this question; Ainslie from the perspective of discounted prediction, and Powers from the perspective of correcting time-averaged perceptions.
I think both are required to fully understand human decision-making. Powers fills in the gap left by Ainslie's vague notion of "appetites", while Ainslie fills in for the lack of any sort of foresight or prediction in Powers' ...
It seems simple to convert any computable agent-based input-transform-output model into a utility-based model - provided you are allowed utility functions with Turing complete languages.
Simply wrap the I/O of the non-utility model, assign the (possibly compound) action the agent would actually take in each timestep a utility of 1 and all other actions a utility of 0, and then take the highest-utility action in each timestep.
That neatly converts almost any practical agent model into a utility-based model.
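To make the construction concrete, here's a rough sketch in Python. The policy function and the action names are made up purely for illustration; any computable input-to-action model could be dropped in.

```python
def make_utility_function(policy):
    """Utility 1 for whatever the wrapped model would do, 0 for everything else."""
    def utility(observation, action):
        return 1 if action == policy(observation) else 0
    return utility

def utility_maximiser(utility, observation, possible_actions):
    """Pick a highest-utility action for this timestep."""
    return max(possible_actions, key=lambda action: utility(observation, action))

# Example with a toy non-utility policy:
legacy_policy = lambda obs: "duck" if obs == "incoming" else "wait"
u = make_utility_function(legacy_policy)
print(utility_maximiser(u, "incoming", ["duck", "wait", "run"]))  # prints "duck"
```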
So: there is nothing "wrong" with utility-based models. A good job too - they are economics 101.
There's a lot of discussion on this site that seems to be assuming (implicitly or explicitly) that it's meaningful to talk about the utility functions of individual humans. I would like to question this assumption.
To clarify: I don't question that you could, in principle, model a human's preferences by building an insanely complex utility function. But there are infinitely many methods by which you could model a human's preferences. The question is which model is the most useful, and which models have the fewest underlying assumptions that will lead your intuitions astray.
Utility functions are a good model to use if we're talking about designing an AI. We want an AI to be predictable, to have stable preferences, and to do what we want. They are also a good tool for building agents that are immune to Dutch book tricks. Utility functions are a bad model for agents that don't meet these criteria.
To quote Van Gelder (1995):
One model that attempts to capture actual human decision making better is called decision field theory. (I'm no expert on this theory, having encountered it two days ago, so I can't vouch for how good it actually is. Still, even if it's flawed, it's useful for getting us to think about human preferences in what seems to be a more realistic way.) Here's a brief summary of how it's built up from traditional utility theory, based on Busemeyer & Townsend (1993). See the article for the mathematical details, fuller justifications, and the different failures of classical rationality that the different stages explain.
Stage 1: Deterministic Subjective Expected Utility (SEU) theory. Basically classical utility theory. Suppose you can choose between two different alternatives, A and B. If you choose A, there is a payoff of 200 utilons with probability S1, and a payoff of -200 utilons with probability S2. If you choose B, the payoffs are -500 utilons with probability S1 and +500 utilons with probability S2. You'll choose A if the expected utility of A, S1 * 200 + S2 * -200, is higher than the expected utility of B, S1 * -500 + S2 * 500, and B otherwise.
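To make stage 1 concrete, here's a toy calculation with the payoffs above; the probabilities 0.6 and 0.4 are placeholder values I picked, not numbers from the paper.

```python
def expected_utility(probs, payoffs):
    return sum(p * u for p, u in zip(probs, payoffs))

s1, s2 = 0.6, 0.4                                 # hypothetical subjective probabilities
seu_a = expected_utility((s1, s2), (200, -200))   # 0.6*200 + 0.4*(-200) = 40
seu_b = expected_utility((s1, s2), (-500, 500))   # 0.6*(-500) + 0.4*500 = -100
print("A" if seu_a > seu_b else "B")              # "A" for these probabilities
```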
Stage 2: Random SEU theory. In stage 1, we assumed that the probabilities S1 and S2 stay constant across many trials. Now, we assume that sometimes the decision maker might focus on S1, producing a preference for action A. On other trials, the decision maker might focus on S2, producing a preference for action B. According to random SEU theory, the attention weight for variable Si is a continuous random variable, which can change from trial to trial because of attentional fluctuations. Thus, the SEU for each action is also a random variable, called the valence of an action. Deterministic SEU is a special case of random SEU, one where the trial-by-trial fluctuation of valence is zero.
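A rough sketch of what stage 2 adds; the Beta distribution and its parameters are my own illustrative choice, not part of the theory.

```python
import random

def sample_valences():
    w1 = random.betavariate(6, 4)   # noisy attention weight for S1, mean ~0.6 (arbitrary choice)
    w2 = 1 - w1                     # remaining attention goes to S2
    return w1 * 200 + w2 * -200, w1 * -500 + w2 * 500   # (valence of A, valence of B)

# The same decision maker can prefer A on some trials and B on others.
trials = [sample_valences() for _ in range(1000)]
print(sum(a > b for a, b in trials) / 1000)   # fraction of trials on which A wins
```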
Stage 3: Sequential SEU theory. In stage 2, we assumed that one's decision was based on just one sample of a valence difference on any trial. Now, we allow a sequence of one or more samples to be accumulated during the deliberation period of a trial. The attention of the decision maker shifts between different anticipated payoffs, accumulating weight for the different actions. Once the weight of one of the actions reaches some critical threshold, that action is chosen. Random SEU theory is a special case of sequential SEU theory, where the number of samples per trial is one.
Consider a scenario where you're trying to make a very difficult, but very important decision. In that case, your inhibitory threshold for any of the actions is very high, so you spend a lot of time considering the different consequences of the decision before finally arriving at the (hopefully) correct choice. For less important decisions, your inhibitory threshold is much lower, so you pick one of the choices without giving it too much thought.
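Here's a rough sketch of the stage 3 accumulation process; the drift and noise values are arbitrary, chosen only to show how the threshold trades off speed against accuracy.

```python
import random

def deliberate(threshold, mean_diff=0.1, noise=1.0):
    preference, samples = 0.0, 0
    while abs(preference) < threshold:
        preference += random.gauss(mean_diff, noise)   # one sampled valence difference (A minus B)
        samples += 1
    return ("A" if preference > 0 else "B"), samples

print(deliberate(threshold=2))    # unimportant decision: fast, but more error-prone
print(deliberate(threshold=20))   # important decision: slow, but more likely to pick A
```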
Stage 4: Random Walk SEU theory. In stage 3, we assumed that we begin to consider each decision from a neutral point, without any of the actions being the preferred one. Now, we allow prior knowledge or experiences to bias the initial state. The decision maker may recall previous preference states, which are influenced in the direction of the mean difference. Sequential SEU theory is a special case of random walk theory, where the initial bias is zero.
Under this model, decisions favoring the status quo tend to be chosen more frequently under a short time limit (low threshold), but a superior decision is more likely to be chosen as the threshold grows. Also, if previous outcomes have already biased decision A very strongly over B, then the mean time to choose A will be short while the mean time to choose B will be long.
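Extending the same sketch with stage 4's initial bias; the numbers are again made up for illustration.

```python
import random

def deliberate(threshold, initial_bias=0.0, mean_diff=0.1, noise=1.0):
    preference, samples = initial_bias, 0
    while abs(preference) < threshold:
        preference += random.gauss(mean_diff, noise)
        samples += 1
    return ("A" if preference > 0 else "B"), samples

print(deliberate(threshold=10))                    # neutral starting point
print(deliberate(threshold=10, initial_bias=8.0))  # strong prior bias: usually a quick "A"
```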
Stage 5: Linear System SEU theory. In stage 4, we assumed that previous experiences all contribute equally. Now, we allow the impact of a valence difference to vary depending on whether it occurred early or late (a primacy or recency effect). Each previous experience is given a weight determined by a growth-decay rate parameter. Random walk SEU theory is a special case of linear system SEU theory, where the growth-decay rate is set to zero.
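One simple way to express stage 5's growth-decay weighting is to let the accumulated preference leak a little on each step, producing a recency effect; the exponential-decay form below is my own simplification of the linear-system formulation.

```python
import random

def deliberate(threshold, decay=0.0, mean_diff=0.1, noise=1.0, initial_bias=0.0):
    preference = initial_bias
    while abs(preference) < threshold:
        # decay = 0 keeps all past samples at full weight (the stage 4 random walk);
        # decay > 0 makes older samples leak away, so recent evidence dominates.
        preference = (1 - decay) * preference + random.gauss(mean_diff, noise)
    return "A" if preference > 0 else "B"

print(deliberate(threshold=5, decay=0.0))    # equal weighting, as in stage 4
print(deliberate(threshold=5, decay=0.05))   # recency-weighted: early samples fade
```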
Stage 6: Approach-Avoidance Theory. In stage 5, we assumed that, for example, the average amount of attention given to the payoff (+500) only depended on event S2. Now, we allow the average weight to be affected by another variable, called the goal gradient. The basic idea is that the attractiveness of a reward or the aversiveness of a punishment is a decreasing function of distance from the point of commitment to an action. If there is little or no possibility of taking an action, its consequences are ignored; as the possibility of taking an action increases, the attention to its consequences increases as well. Linear system theory is a special case of approach-avoidance theory, where the goal gradient parameter is zero.
There are two different goal gradients, one for gains and rewards and one for losses and punishments. Empirical research suggests that the gradient for rewards tends to be flatter than that for punishments. One of the original features of approach-avoidance theory was the distinction between rewards and punishments, closely corresponding to the distinction between positively and negatively framed outcomes made by more recent decision theorists.
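A sketch of what a goal gradient might look like, with the flatter-for-rewards, steeper-for-punishments shape mentioned above; the exponential form and the steepness values are purely illustrative, not the paper's parameterization.

```python
import math

def goal_gradient(distance, steepness):
    """Weight of an anticipated consequence; 1.0 right at the point of commitment."""
    return math.exp(-steepness * distance)

reward_steepness, punishment_steepness = 0.2, 0.6   # flatter for gains, steeper for losses

for distance in (0.0, 1.0, 3.0):
    print(distance,
          round(goal_gradient(distance, reward_steepness), 2),
          round(goal_gradient(distance, punishment_steepness), 2))
# Far from commitment the punishment is discounted much more heavily than the reward;
# right at the point of commitment both are felt at full strength.
```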
Stage 7: Decision Field Theory. In stage 6, we assumed that the time taken to process each sampling is the same. Now, we allow this to change by introducing into the theory a time unit h, representing the amount of time it takes to retrieve and process one pair of anticipated consequences before shifting attention to another pair of consequences. If h is allowed to approach zero in the limit, the preference state evolves in an approximately continuous manner over time. Approach-avoidance is a spe... you get the picture.
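Finally, a sketch of stage 7's time unit h; treating the noise with diffusion-style scaling is my own shorthand for the continuous limit, not the paper's exact formulation.

```python
import random

def trajectory(h, total_time=10.0, drift=0.1, noise=1.0):
    state, t, path = 0.0, 0.0, [0.0]
    while t < total_time:
        state += drift * h + random.gauss(0, noise) * h ** 0.5   # smaller h, smaller steps
        path.append(state)
        t += h
    return path

print(len(trajectory(h=1.0)))    # a handful of coarse jumps
print(len(trajectory(h=0.01)))   # an approximately continuous evolution
```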
Now, you could argue that all of the steps above are just artifacts of being a bounded agent without enough computational resources to calculate all the utilities precisely. And you'd be right. And maybe it's meaningful to talk about the "utility function of humanity" as the outcome that occurs when a CEV-like entity calculates what we'd decide if we could collapse Decision Field Theory back into Deterministic SEU Theory. Or maybe you just say that all of this is low-level mechanical stuff that gets included in the "probability of outcome" computation of classical decision theory. But which approach do you think gives us more useful conceptual tools in talking about modern-day humans?
You'll also note that even DFT (or at least the version of it summarized in a 1993 article) assumes that the payoffs themselves do not change over time. Attentional considerations might lead us to attach a low value to some outcome, but if we were to actually end up in that outcome, we'd always value it the same amount. This we know to be untrue. There's probably some even better way of looking at human decision making, one which I suspect might be very different from classical decision theory.
So be extra careful when you try to apply the concept of a utility function to human beings.