A Crash Course in the Neuroscience of Human Motivation

lukeprog

[PDF of this article updated Aug. 23, 2011]

Whenever I write a new article for Less Wrong, I'm pulled in two opposite directions.

One force pulls me toward writing short, exciting posts with lots of brain candy and just one main point. Eliezer has done that kind of thing very well many times: see Making Beliefs Pay Rent, Hindsight Devalues Science, Probability is in the Mind, Taboo Your Words, Mind Projection Fallacy, Guessing the Teacher's Password, Hold Off on Proposing Solutions, Applause Lights, Dissolving the Question, and many more.

Another force pulls me toward writing long, factually dense posts that fill in as many of the pieces of a particular argument in one fell swoop as possible. This is largely because I want to write about the cutting edge of human knowledge but I keep realizing that the inferential gap is larger than I had anticipated, and I want to fill in that inferential gap quickly so I can get to the cutting edge.

For example, I had to draw on dozens of Eliezer's posts just to say I was heading toward my metaethics sequence. I've also published 21 new posts (many of them quite long and heavily researched) written specifically because I need to refer to them in my metaethics sequence.¹ I tried to make these posts interesting and useful on their own, but my primary motivation for writing them was that I need them for my metaethics sequence.

And now I've written only four posts² in my metaethics sequence and already the inferential gap to my next post in that sequence is huge again. :(

So I'd like to try an experiment. I won't do it often, but I want to try it at least once. Instead of writing 20 more short posts between now and the next post in my metaethics sequence, I'll attempt to fill in a big chunk of the inferential gap to my next metaethics post in one fell swoop by writing a long tutorial post (à la Eliezer's tutorials on Bayes' Theorem and technical explanation).³

So if you're not up for a 20-page tutorial on human motivation, this post isn't for you, but I hope you're glad I bothered to write it for the sake of others. If you are in the mood for a 20-page tutorial on human motivation, please proceed.

Who knows what I want to do? Who knows what anyone wants to do? How can you be sure about something like that? Isn’t it all a question of brain chemistry, signals going back and forth, electrical energy in the cortex? How do you know whether something is really what you want to do or just some kind of nerve impulse in the brain. Some minor little activity takes place somewhere in this unimportant place in one of the brain hemispheres and suddenly I want to go to Montana or I don’t want to go to Montana.

- Don DeLillo, White Noise

Preface

How do we value things, and choose between options? Philosophers, economists, and psychologists have long tried to answer these questions. But human behavior continues to defy our most subtle models of it, and the algorithms producing our behavior have remained hidden in a black box.

But now, neuroscientists are directly measuring the neurons whose firing rates encode value and produce our choices. We know a lot more about the neuroscience of human motivation than you might think. Now we can peer directly into the black box of human motivation, and begin (dimly) to read our own source code.

The neuroscience of human motivation has implications for philosophy of mind and action, for scientific self-help, and for metaethics and Friendly AI. (We don't really know what we want, and looking directly at the algorithms that produce human wanting might help in solving this mystery.)

So, I wrote a crash course in the neuroscience of human motivation.

The purpose of this document is not to argue for any of the conclusions presented within it. That would require not a long blog post but instead a couple of 500-page books: say, Foundations of Neuroeconomic Analysis and Handbook of Reward and Decision Making (my two greatest sources for this post).⁴

Instead, I merely want to summarize the current mainstream scientific picture on the neuroscience of human motivation, explain some of the concepts it uses, and tell a few stories about how our current picture of human motivation developed.

As you read this, I hope that many questions and objections will come to mind, because it's not the full story. That's why I went to the trouble of linking to PDFs of almost all my sources (see References): so you can check the full data and the full arguments yourself if you like.

This document is long. You may prefer to read it in sections.

Folk Psychology

There are these things called 'humans' on planet Earth. They undergo metabolism and cell growth. They produce waste. They maintain homeostasis. They reproduce. They move. They communicate. Sometimes they have pillow fights.

Some of these human processes are 'automatic', like cell growth and breathing. Other processes are 'intentional' or 'willed', like moving and communicating and having pillow fights. We call these latter processes intentional actions, or simply actions. Sometimes we're not sure where to draw the line between automatic processes and actions, but this should become clearer as we learn more. In the meantime, we ask...

How can we explain human actions?

One popular explanation is 'folk psychology.' Folk psychology posits that we humans have beliefs and desires, and that we are motivated to do what we believe will fulfill our desires.

I desire to eat a cookie. I believe I can fulfill that desire if I walk to the kitchen and put one of the cookies there into my mouth. So I am motivated to walk to the kitchen and put a cookie in my mouth.

Of course there are complications. For example, I have multiple desires. Suppose I desire to eat a cookie and believe there are cookies in the kitchen. But I also desire to remain sitting comfortably in the living room. Can I satisfy both desires? I also believe that if I nicely ask my friend in the kitchen to bring me a cookie, she will. So I ask her to bring me a cookie, and soon I am eating it without having to leave the comfy living room sofa. We still explain my behavior with constructs like 'beliefs' and 'desires', but we consider more than one of each to do so.

Most of us use folk psychology every day to successfully predict human behavior. I believe that my friend desires to do nice things for me on occasion if they're not too much trouble, and I believe that my friend, once I tell her I want a cookie, will believe she can be nice to me without much trouble if she brings me a cookie from the kitchen. So, I predict that my friend will bring me a cookie when I ask her. So I ask her, and behold! My prediction was correct. I am happily eating a cookie on the sofa.

But folk psychology (FP) faces some problems.⁵ Consider its context in history:

The presumed domain of FP used to be much larger than it is now. In primitive cultures, the behavior of most of the elements of nature were understood in intentional terms. The wind could know anger, the moon jealousy, the river generosity… These were not metaphors… the animistic approach to nature has dominated our history, and it is only in the last two or three thousand years that we have restricted FP’s literal application to the domain of the higher animals.

[Even still,] the FP of the Greeks is essentially the FP we use today… This is a very long period of stagnation and infertility for any theory to display, especially when faced with such an enormous backlog of anomalies and mysteries in its own explanatory domain… To use Imre Lakatos’ terms, FP is a stagnant or degenerating research program, and has been for millennia.

Consider also its prospects for inter-theoretic reduction:

If we approach homo sapiens from the perspective of natural history and the physical sciences, we can tell a coherent story of its constitution, development, and behavioral capacities which encompasses particle physics, atomic and molecular theory, organic chemistry, evolutionary theory, biology, physiology, and materialistic neuroscience. The story, though still radically incomplete, is already extremely powerful, outperforming FP at many points even in its own domain. And it is deliberately… coherent with the rest of our developing world picture. In short, the greatest theoretical synthesis in [history] is currently in our hands…

But FP is no part of this growing synthesis. Its intentional categories stand magnificently alone, without visible prospect of reduction to that larger corpus. A successful reduction cannot be ruled out, in my view, but FP’s explanatory impotence and long stagnation inspire little faith that its categories will find themselves neatly reflected in the framework of neuroscience. On the contrary, one is reminded of how alchemy must have looked as elemental chemistry was taking form, how Aristotelean cosmology must have looked as classical mechanics was being articulated, or how the vitalist conception of life must have looked as organic chemistry marched forward.

Finally, consider the problem of habit. I sit at my computer and want to type my name, 'Luke.' However, I have just used a special program to switch the function of the keys labeled L and P so that they will input the other character instead (so that I can play a prank on my friend, who will be using my computer shortly). I believe that typing the key labeled L will input P instead, but nevertheless when I type my name my fingers fall into their familiar habit and I end up typing my name as 'Puke.' My act of typing was intentional, and yet I didn't do what I believed would fulfill my desire to type my name.

Folk psychology faces both successes and failures in explaining human action. Hopefully we can do better.

Neoclassical Economics

Folk psychology was updated and quantified by neoclassical economics. To summarize:

One [assumption of] neoclassical economics is "rationality," in which individuals are said to choose alternatives that maximize expected utilities. In particular, the neoclassical view is that individuals rank all possible alternatives according to how much satisfaction they will bring and then choose the alternative that [they expect] will bring the most satisfaction or utility...⁶

Let's review this notion of maximizing expected utility. Suppose I can choose one of two boxes sitting before me, red and blue. There is a 10% chance the red box contains a million dollars, and a 90% chance it contains nothing. As for the blue box, I am certain it contains $10,000. The 'expected value' of choosing the red box is (0.1 × $1,000,000) + (0.9 × $0), which is equal to $100,000. The expected value of choosing the blue box is (1 × $10,000), or $10,000. An agent that chose whatever had the highest expected value would choose the red box, which has 10 times the expected value of the blue box ($100,000 vs. $10,000).
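The arithmetic above is easy to check in a few lines of Python. This is a minimal sketch; representing a gamble as a list of (probability, payoff) pairs is our own choice for illustration, not anything taken from the sources.

```python
def expected_value(gamble):
    """gamble: list of (probability, dollar amount) pairs."""
    return sum(p * amount for p, amount in gamble)


red = [(0.1, 1_000_000), (0.9, 0)]   # 10% chance of $1,000,000, otherwise nothing
blue = [(1.0, 10_000)]               # a certain $10,000

print(expected_value(red))    # 100000.0
print(expected_value(blue))   # 10000.0
```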

But humans don't value things only according to their dollar value. A million dollars might have 10 times the objective value of $100,000, but it might have less than 10 times the subjective value of $100,000 because after $100,000 you only care a little how much more wealthy you are.

Or, you might be risk averse. You might prefer a sure thing to something that is uncertain. So a 10% chance of a million dollars might be worth less — in subjective value — than a 100% chance of $10,000. If you are risk averse you might choose the blue box because it has higher expected subjective value even though it has lower expected objective value.

We call objective value simply 'value'. We call subjective value 'utility.'

Neoclassical economics quantifies folk psychology by measuring the strength of belief with probability and by measuring the strength of desire with utility. It then says that humans act so as to maximize expected utility, a measure that combines the utility of a particular thing with your subjective probability of getting it.⁷
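To see how a concave utility function can reverse the choice between the boxes, here is a sketch that assumes, purely for illustration, that utility is the logarithm of total wealth and that you start with a hypothetical $10,000. Neither assumption comes from the neoclassical literature discussed here; they just make "each extra dollar matters less" concrete.

```python
import math

BASELINE_WEALTH = 10_000  # hypothetical: money you already have before choosing


def log_utility(prize):
    """A hypothetical concave utility function: log of total wealth."""
    return math.log(BASELINE_WEALTH + prize)


def expected(gamble, f=lambda x: x):
    """Probability-weighted average of f(outcome); f defaults to the raw dollar value."""
    return sum(p * f(x) for p, x in gamble)


red = [(0.1, 1_000_000), (0.9, 0)]
blue = [(1.0, 10_000)]

print("expected utility of red: ", round(expected(red, log_utility), 3))
print("expected utility of blue:", round(expected(blue, log_utility), 3))
```

Under these made-up assumptions the blue box has the higher expected utility even though the red box has ten times the expected value. With a much larger starting wealth (say $50,000) the ordering flips back to red, which is one way to see that utility depends on the person doing the choosing.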

This neoclassical model of human behavior has faced many challenges, and is regularly revised in the face of new evidence.⁸ For example, Loewenstein (1987) found that if students were asked to place a value on the opportunity to kiss a celebrity of their choice 1-5 days in the future, they placed the highest value on a kiss in 3 days. This didn't fit any existing neoclassical models of utility, but it was explained when Caplin & Leahy (2001) incorporated "anticipatory feeling" into the neoclassical model: the students got some utility from anticipating the kiss with the celebrity (but also, as usual, discounted the utility of a reward the further away it was in the future), and this is why they didn't want the kiss right away.
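The shape of that explanation can be illustrated with a toy calculation. This is not Caplin & Leahy's actual model; it is a hypothetical sketch assuming a per-day discount factor `delta` and an anticipation strength `alpha`, both chosen arbitrarily. Each day of waiting adds some anticipatory pleasure, while the kiss itself is discounted more the longer it is delayed, and the two effects trade off.

```python
def kiss_utility(delay_days, reward=1.0, delta=0.8, alpha=0.8):
    """Hypothetical toy model: discounted utility of the kiss plus anticipation.

    delta: per-day discount factor; alpha: strength of anticipatory feeling.
    Both values are made up for illustration."""
    total = delta ** delay_days * reward          # the kiss itself, discounted
    for day in range(delay_days):                 # each day spent waiting...
        days_left = delay_days - day
        # ...brings anticipatory pleasure that is stronger when the kiss is near,
        # and is itself discounted by how far in the future that day lies
        total += delta ** day * alpha * delta ** days_left * reward
    return total


for d in range(1, 6):
    print(d, "days:", round(kiss_utility(d), 4))
best = max(range(1, 6), key=kiss_utility)
print("preferred delay:", best, "days")
```

With these particular numbers the total peaks at a 3-day delay. Setting `alpha=0.0` (no anticipatory feeling) makes the shortest delay best, as a plain discounting model would predict.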

Keep in mind that economists don't argue that we actually compute the expected utility of each option before us and then choose the best one, but that we always act "as if" we were doing that.⁹

But sometimes we don't even act "as if" we are obeying the axioms of neoclassical economics. For example, the independence axiom of expected utility theory says that if you prefer an apple over an orange, then you must prefer Gamble A (72% chance you get an apple, otherwise you get a cat) over Gamble B (72% chance you get an orange, otherwise you get a cat). But Allais (1953) found that subjects do violate this basic assumption under some conditions.
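Under expected utility theory the common "cat" branch has to cancel out, which is why the axiom follows. The sketch below checks this numerically for many randomly drawn utility assignments; the random utilities are hypothetical stand-ins for anyone's preferences.

```python
import random


def expected_utility(gamble, u):
    """gamble: list of (probability, outcome); u: dict mapping outcome -> utility."""
    return sum(p * u[outcome] for p, outcome in gamble)


gamble_a = [(0.72, "apple"), (0.28, "cat")]
gamble_b = [(0.72, "orange"), (0.28, "cat")]

rng = random.Random(0)
for _ in range(10_000):
    # draw an arbitrary (hypothetical) utility for each outcome
    u = {o: rng.uniform(-10.0, 10.0) for o in ("apple", "orange", "cat")}
    diff = expected_utility(gamble_a, u) - expected_utility(gamble_b, u)
    # The cat term is identical in both gambles and cancels, so the
    # comparison depends only on apple versus orange.
    assert abs(diff - 0.72 * (u["apple"] - u["orange"])) < 1e-9
print("EU(A) - EU(B) == 0.72 * (u(apple) - u(orange)) in every trial")
```

Because the cancellation holds no matter which utilities are assigned, a pattern of choices that violates the axiom cannot come from maximizing expected utility with any utility function.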

Such violations of the basic axioms of neoclassical economics led to the development of behavioral economics and theories like Kahneman and Tversky's (1979) prospect theory,¹⁰ which transcends some assumptions of the neoclassical model. But these new theories don't fit the data perfectly, either.¹¹

The models of human motivation we've surveyed so far are conceptually related to decision theory (beliefs and desires, or probabilities and utilities), so I'll call them 'decision-theoretic models' of human motivation. We'll discuss decision-theoretic models again when we finally get to the topic of neuroscience, but for now I want to discuss a different approach to motivation.

Behaviorism and Reinforcement Learning

While neoclassical economists formulated expected utility theory, behaviorist psychologists developed a different set of explanations for human action. Though behaviorists were wrong when they said that science can't talk about mental activity or mental states, you can charitably think of behaviorists as playing a game of Rationalist's Taboo with constructs of folk psychology like "want" or "fear" in order to get at phenomena more appropriate for quantification in technical explanation. Also, the behaviorist approach led to 'reinforcement learning', an important concept in the neuroscience of human motivation.

Before I explain reinforcement learning, let's recall operant conditioning:

Stick a pigeon in a box with a lever and some associated machinery (a "Skinner box"). The pigeon wanders around, does various things, and eventually hits the lever. Delicious sugar water squirts out. The pigeon continues wandering about and eventually hits the lever again. Another squirt of delicious sugar water. Eventually it percolates into its tiny pigeon brain that maybe pushing this lever makes sugar water squirt out. It starts pushing the lever more and more, each push continuing to convince it that yes, this is a good idea.

Consider a second, less lucky pigeon. It, too, wanders about in a box and eventually finds a lever. It pushes the lever and gets an electric shock. Eh, maybe it was a fluke. It pushes the lever again and gets another electric shock. It starts thinking "Maybe I should stop pressing that lever." The pigeon continues wandering about the box doing anything and everything other than pushing the shock lever.

The basic concept of operant conditioning is that an animal will repeat behaviors that give it reward, but avoid behaviors that give it punishment.
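As a toy illustration (not a model from the behaviorist literature), the sketch below gives each pigeon a single "probability of pressing the lever" and nudges it up after sugar water and down after a shock. The starting probability, learning rate, and number of trials are arbitrary assumptions.

```python
import random


def condition(outcome, trials=200, p_press=0.1, lr=0.2, seed=0):
    """Toy pigeon in a Skinner box.

    outcome: +1 if pressing the lever gives sugar water, -1 if it gives a shock.
    Returns the pigeon's final probability of pressing the lever."""
    rng = random.Random(seed)
    for _ in range(trials):
        if rng.random() < p_press:               # the pigeon happens to hit the lever
            if outcome > 0:
                p_press += lr * (1.0 - p_press)  # reward: move toward always pressing
            else:
                p_press -= lr * p_press          # punishment: move toward never pressing
    return p_press


print("lucky pigeon:  ", round(condition(+1), 3))
print("unlucky pigeon:", round(condition(-1), 3))
```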

Behaviorism died in the wake of cognitive psychology, but its approach to motivation turned out to be very useful in the field of artificial intelligence, where it is called reinforcement learning:

Reinforcement learning is learning what to do — how to map situations to actions — so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation and, through that, all subsequent rewards. These two characteristics — trial-and-error search and delayed reward — are the two most important distinguishing features of reinforcement learning.

To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploitation nor exploration can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best.¹²
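The exploration/exploitation trade-off is often illustrated with a "multi-armed bandit": several levers with unknown average payoffs. Below is a minimal epsilon-greedy sketch, one standard way to balance the two (not the only one): with probability `epsilon` the agent pulls a random lever (explore), otherwise the lever with the best estimated payoff so far (exploit). The payoff means and parameters are made up.

```python
import random


def run_bandit(true_means, epsilon, steps=5000, seed=0):
    """Epsilon-greedy agent on a k-armed bandit with Gaussian rewards.

    Returns (estimates, counts, average_reward)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # running sample-average reward of each lever
    counts = [0] * k        # how often each lever has been pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=estimates.__getitem__)    # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total / steps


estimates, counts, avg = run_bandit([0.2, 0.5, 1.0], epsilon=0.1)
print("pulls per lever:", counts)
print("average reward:", round(avg, 3))
```

Setting `epsilon=0` makes the agent purely exploitative; it can then lock onto whichever lever happened to look good early on and never discover the better one.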

In addition to the agent and its environment, there are four major components of a reinforcement learning system:

...a policy, a reward function, a value function, and, optionally, a model of the environment.

A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states...

A reward function defines the goal in a reinforcement learning problem. Roughly speaking, it maps perceived states (or state-action pairs) of the environment to a single number, a reward, indicating the intrinsic desirability of the state. A reinforcement-learning agent's sole objective is to maximize the total reward it receives in the long run. ...[A reward function may] be used as a basis for changing the policy. For example, if an action selected by the policy is followed by low reward, then the policy may be changed to select some other action in that situation in the future...

Whereas a reward function indicates what is good in an immediate sense, a value function specifies what is good in the long run. Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future starting from that state. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states. For example, a state might always yield a low immediate reward, but still have a high value because it is regularly followed by other states that yield high rewards. Or the reverse could be true...

Rewards are in a sense primary, whereas values, as predictions of rewards, are secondary. Without rewards there could be no values, and the only purpose of estimating values is to achieve more reward. Nevertheless, it is values with which we are most concerned when making and evaluating decisions. Action choices are made on the basis of value judgments. We seek actions that bring about states of highest value, not highest reward, because these actions obtain for us the greatest amount of reward over the long run...

...The fourth and final element of some reinforcement learning systems is a model of the environment. This is something that mimics the behavior of the environment. For example, given a state and action, the model might predict the resultant next state and next reward. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.
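To make the four components concrete, here is a hypothetical two-step world in which one action pays a small immediate reward and the other pays nothing now but leads to a big reward later. In this sketch the value function is computed exactly by searching the model (simple planning) rather than learned from experience; all state names, rewards, and the discount factor `GAMMA` are invented for illustration.

```python
GAMMA = 0.9  # hypothetical discount factor for future rewards

# Reward function: (state, action) -> immediate reward.
REWARD = {
    ("start", "take_candy"): 1.0,
    ("start", "open_gate"): 0.0,
    ("gate", "walk_on"): 10.0,
    ("candy", "walk_on"): 0.0,
}

# Transitions known to the agent's model of the environment.
NEXT_STATE = {
    ("start", "take_candy"): "candy",
    ("start", "open_gate"): "gate",
    ("gate", "walk_on"): "treasure",
    ("candy", "walk_on"): "end",
}


def model(state, action):
    """Model of the environment: predict (next_state, reward) without acting."""
    return NEXT_STATE[(state, action)], REWARD[(state, action)]


def actions(state):
    return [a for (s, a) in NEXT_STATE if s == state]


def lookahead(state, action):
    """Reward now plus the discounted value of where the model says we will land."""
    nxt, reward = model(state, action)
    return reward + GAMMA * value(nxt)


def value(state):
    """Value function: best long-run discounted reward from state (0 if terminal)."""
    acts = actions(state)
    return max(lookahead(state, a) for a in acts) if acts else 0.0


def policy(state):
    """Policy: map a state to the action whose predicted outcome has the highest value."""
    return max(actions(state), key=lambda a: lookahead(state, a))


print(policy("start"))             # open_gate
print(value("gate"), value("candy"))
```

Opening the gate earns less immediate reward than taking the candy, but the gate state has the higher value, so a policy that plans with the model chooses it. This is the "low reward, high value" case described above.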

Want an example? Here is how a reinforcement learning agent would learn to play Tic-Tac-Toe:

First we set up a table of numbers, one for each possible state of the game. Each number will be the latest estimate of the probability of our winning from that state. We treat this estimate as the state's current value, and the whole table is the learned value function. State A has higher value than state B, or is considered 'better' than state B, if the current estimate of the probability of our winning from A is higher than it is from B. Assuming we always play Xs, then for all states with three Xs in a row the probability of winning is 1, because we have already won. Similarly, for all states with three Os in a row... the correct probability is 0, as we cannot win from them. We set the initial values of all the other states, the nonterminals, to 0.5, representing an informed guess that we have a 50% chance of winning.

Now we play many games against the opponent. To select our moves we examine the states that would result from each of our possible moves (one for each blank space on the board) and look up their current values in the table. Most of the time we move greedily, selecting the move that leads to the state with greatest value, that is, with the highest estimated probability of winning. Occasionally, however, we select randomly from one of the other moves instead; these are called exploratory moves because they cause us to experience states that we might otherwise never see.

A sequence of Tic-Tac-Toe moves might look like this:¹³

Solid lines are the moves our reinforcement learning agent made, and dotted lines are moves it considered but did not make. The second move was an exploratory move: it was taken even though another sibling move, that leading to e*, was ranked higher.

While playing, the agent changes the values assigned to the states it finds itself in. To improve its estimates concerning the probability of winning from various states, it 'backs up' the value of the state after each greedy move to the state before the move (as suggested by the arrows). What this means is that the value of the earlier state is adjusted to be closer to the value of the later state.

If we let s denote the state before the greedy move, and s' the state after, then the update to the estimated value of s, denoted V(s), can be written:

V(s) <- V(s) + α[V(s') - V(s)]

where α is a small positive fraction called the step-size parameter, which influences the rate of learning. The update rule is an example of a temporal difference learning method, so called because its changes are based on a difference... between [value] estimates at two different times.

...if the step-size parameter is reduced properly over time, this method converges, for any [unchanging] opponent, to the true probabilities of winning from each state given optimal play by the agent.

And that's how a simple version of temporal difference (TD) reinforcement learning works.
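For readers who want to run it, here is a compact sketch of the learner described above, playing X against an opponent that moves uniformly at random. A few details are our assumptions rather than part of the quoted description: exploratory moves are drawn uniformly from all legal moves, a full board with no three-in-a-row is valued 0 (we cannot win from it), and when the opponent wins, the agent's last position is also backed up toward that loss so that losses are learned. As a worked instance of the update, with α = 0.1 a state valued 0.5 that is followed by a state valued 1.0 moves to 0.5 + 0.1 × (1.0 − 0.5) = 0.55.

```python
import random

# The eight winning lines on a 3x3 board, squares indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def place(board, i, mark):
    """Return a new board string with mark placed on square i."""
    return board[:i] + mark + board[i + 1:]


class TDAgent:
    """Plays X; the table V maps board strings to an estimated P(win)."""

    def __init__(self, alpha=0.1, epsilon=0.1, seed=0):
        self.V = {}
        self.alpha = alpha        # step-size parameter
        self.epsilon = epsilon    # fraction of exploratory moves (assumed value)
        self.rng = random.Random(seed)

    def value(self, board):
        if board not in self.V:
            w = winner(board)
            if w == "X":
                self.V[board] = 1.0   # already won
            elif w == "O" or " " not in board:
                self.V[board] = 0.0   # lost, or full board: cannot win
            else:
                self.V[board] = 0.5   # initial guess for nonterminal states
        return self.V[board]

    def choose(self, board):
        """Return (next_board, was_greedy)."""
        options = [place(board, i, "X") for i, c in enumerate(board) if c == " "]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options), False    # exploratory move
        best = max(self.value(s) for s in options)
        greedy = [s for s in options if self.value(s) == best]
        return self.rng.choice(greedy), True          # greedy move, ties broken randomly

    def backup(self, s, s_next):
        """TD update: V(s) <- V(s) + alpha * (V(s') - V(s))."""
        v = self.value(s)
        self.V[s] = v + self.alpha * (self.value(s_next) - v)


def play_game(agent, opponent_rng, learn=True):
    """One game: agent (X, moves first) vs uniformly random O. Returns 'X', 'O', or None."""
    board, prev = " " * 9, None
    while True:
        board, greedy = agent.choose(board)
        if learn and greedy and prev is not None:
            agent.backup(prev, board)       # back up only after greedy moves
        prev = board
        if winner(board) or " " not in board:
            return winner(board)
        empty = [i for i, c in enumerate(board) if c == " "]
        board = place(board, opponent_rng.choice(empty), "O")
        if winner(board):
            if learn:
                agent.backup(prev, board)   # assumption: also learn from losses
            return "O"


agent, opponent_rng = TDAgent(), random.Random(1)
for _ in range(20000):
    play_game(agent, opponent_rng)

agent.epsilon = 0.0  # evaluate with purely greedy play, no learning
results = [play_game(agent, opponent_rng, learn=False) for _ in range(1000)]
win_rate = results.count("X") / len(results)
print(f"win rate vs. random opponent: {win_rate:.2f}")
```

Because the value table only stores positions the agent actually visits, the learned values are specific to this random opponent; against a stronger opponent the same code would learn different values.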

Reinforcement Learning and Decision Theory

You may have noticed a key advantage of reinforcement learning: an agent using it can be 'dumber' than a decision-theoretic agent. It can just start with guesses ("What the hell; let's try 50%!") for the value of various states, and then it learns their true values by running through many, many trials.

But what if you don't have many trials to run through, and you need to make an important decision right now?

Then you have to be smart. You need to have a good model of the world and use decision theory to choose the action with the highest expected utility.

This is what rationality (being good at building correct models of the world) is especially good for:

For some tasks, the world provides rich, inexpensive empirical feedback. In these tasks you hardly need reasoning. Just try the task many ways... and take care to notice what is and isn’t giving you results.

Thus, if you want to learn to sculpt, [studying rationality] is a bad way to go about it. Better to find some clay and a hands-on sculpting course. The situation is similar for small talk, cooking, selling, programming, and many other useful skills.

Unfortunately, most of us also have goals for which we can obtain no such ready success/failure data. For example, if you want to know whether cryonics is a good buy, you can’t just try buying it and not-buying it and see which works better. If you miss your first bet, you’re out for good.

Reinforcement learning can be a good strategy if you have time to learn from many trials. If you've only got one shot at a problem, you'd better build up a really accurate model of the world first and then try to maximize expected utility.

Now, back to our story.

It turns out that reinforcement learning seems to underlie many of our mental processes. (More on this later.)

The lesson Yvain drew from this discovery was:

Reinforcement learning is evolution writ small; behaviors propagate or die out based on their consequences to reinforcement in a mind, just as mutations propagate or die out based on their consequences to reproduction in an organism. In the behaviorist model, our mind is not an agent, but a flourishing ecosystem of behaviors both physical and mental, all scrabbling for supremacy and mutating into more effective versions of themselves.

Just as evolving organisms are adaptation-executors and not fitness-maximizers, so minds are behavior-executors and not utility-maximizers.

But things are a bit more complicated than that, as we'll now see.

The Turn to the Brain

I hesitate to say that men will ever have the means of measuring directly the feelings of the human heart. It is from the quantitative effects of the feelings that we must estimate their comparative amounts.

William Jevons (1871)

It turns out that Jevons was wrong. Modern neuroscience allows us to peer into the black box of the human value system and measure directly "the feelings of the human heart."¹⁴

We'll begin with the experiments of Wolfram Schultz. Schultz recorded the activity of single dopamine neurons in monkeys who sat in front of a water spout. At irregular intervals, a speaker played a tone and a drop of water fell from the spout.¹⁵ The monkeys' dopamine neurons normally fired at their baseline rate, but responded with a burst of activity when water was delivered. Over time, though, the neurons responded less and less to the water and more and more to the tone.

But if Schultz delivered water without first giving the tone, then the dopamine neurons responded with a burst of activity again. And if he played the tone and didn't provide water, the neurons reduced their firing rates below the baseline. The neurons weren't responding to the water itself but to a difference between expected reward and actual reward — a reward prediction error (RPE).

Two other researchers, Read Montague and Peter Dayan, noticed that these patterns of neuronal activity were exactly predicted by TD reinforcement learning theory from computer science.¹⁶ In particular, the RPE observed in neurons appeared to play the same role in monkey learning as the difference between value estimates at two different times did in TD reinforcement learning theory.
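The correspondence can be seen in a toy TD model of Schultz's task. This sketch rests on simplifications we assume for illustration: each trial is reduced to two moments (tone onset and water delivery), there is no discounting, and because tones arrive at irregular intervals the prediction just before the tone is held at 0. The TD error δ = r + V(next) − V(current) is computed at each moment.

```python
def simulate_tone_water(trials=50, alpha=0.2, reward=1.0):
    """Toy TD model of the tone -> water experiment (hypothetical parameters)."""
    v_before = 0.0   # prediction just before an unpredictable tone (assumed fixed at 0)
    v_tone = 0.0     # learned prediction of water once the tone has sounded
    history = []     # (TD error at tone, TD error at water) for each trial
    for _ in range(trials):
        delta_tone = 0.0 + v_tone - v_before   # no reward at the tone itself
        delta_water = reward + 0.0 - v_tone    # trial ends after the water: V(next) = 0
        v_tone += alpha * delta_water          # TD update of the tone's value
        history.append((delta_tone, delta_water))
    return v_tone, history


v_tone, history = simulate_tone_water()
print("trial 1:  error at tone %.3f, at water %.3f" % history[0])
print("trial 50: error at tone %.3f, at water %.3f" % history[-1])
print("tone, then no water: error %.3f" % (0.0 - v_tone))   # negative: below baseline
print("water with no tone:  error %.3f" % (1.0 - 0.0))       # positive burst again
```

Early on, the error occurs at the water; after learning, it has moved to the tone. Omitting the water produces a negative error (firing below baseline), and unsignaled water produces a positive one, matching the three observations described above.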

Since then, researchers have done many more single-neuron recording studies to test particular versions of TD reinforcement learning and revise the theory until it predicts more and more behavior while also predicting novel experimental discoveries.

Caplin & Dean¹⁷ provided another way to test the hypothesis that dopamine neurons encoded RPE in a TD-class model. They showed that all existing RPE-models could be reduced to three axiomatic statements. If a system violated one of these axioms, it could not be an RPE system. Later, Caplin et al. (2010) tested the axioms on actual brain activity to see if they held up. They did. This is another reason why so many scientists working in this field believe the current 'dopamine hypothesis' — that dopamine neurons encode RPE in a TD-class reinforcement learning system in the brain.

TD-class reinforcement learning works in computers by updating numbers that represent the values of states. How does reinforcement learning work when using nerve cells?

Hebbian Learning

By Hebbian learning, of course. "Cells that fire together, wire together."

Imagine a neural pathway (in one of Pavlov's dogs) that connects the neural circuits that sense the ringing of a bell to the neural circuits for salivation. This is a weak connection at first, which is why the bell doesn't initially elicit salivation.

Also imagine a third neuron that connects the salivation circuit to a circuit that detects food. This is a strong connection, and that's why food does elicit salivation right away:¹⁸

Donald Hebb proposed:

When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, a growth process or metabolic change takes place in one or both cells such that A's efficacy, as one of the cells firing B, is increased.¹⁹

In short, whenever two connected cells are active at the same time, the synapses connecting them are strengthened.

Consider Pavlov's experiment. At first, the Bell cell will fire whenever bells ring, but probably not when the salivation cells happen to be active. So, the connection between the Bell cell and the Salivation cell remains weak. But then, Pavlov intervenes and causes the Bell cell and the Salivation cell to fire at the same time by ringing the bell and presenting food at the same time (the Food detector cell already has a strong connection to the Salivation cell). Whenever the Bell cell and the Salivation cell happen to fire at the same time, the synapse between them is strengthened. Once the connection is strong enough, the Bell cell can cause the Salivation cell to fire on its own, just like the Food detector cell can.
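Here is a toy version of that story in code, assuming a simple rule in which a synapse's weight grows by `eta` whenever its input cell and output cell are active together. The weights, learning rate, and firing threshold are arbitrary illustrative values, not physiological ones.

```python
def hebbian_pavlov(pairings, eta=0.1, threshold=0.5):
    """Toy Hebbian account of Pavlov's experiment.

    Returns (bell_weight, salivates_to_bell_alone)."""
    w_food = 1.0   # strong, innate Food detector -> Salivation synapse
    w_bell = 0.1   # weak Bell -> Salivation synapse
    for _ in range(pairings):
        bell, food = 1.0, 1.0                     # ring the bell and present food together
        drive = w_bell * bell + w_food * food
        salivation = 1.0 if drive >= threshold else 0.0
        w_bell += eta * bell * salivation         # Hebb: co-active cells wire together
    salivates = w_bell * 1.0 >= threshold         # test: ring the bell with no food
    return w_bell, salivates


print(hebbian_pavlov(0))    # weak synapse: the bell alone does nothing
print(hebbian_pavlov(10))   # strengthened synapse: the bell alone triggers salivation
```

This naive rule lets weights grow without bound; practical models of Hebbian learning usually add some form of decay or normalization.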

It was a fine theory, but the mechanism itself wasn't observed until Bliss & Lomo (1973) found it at work in the rabbit hippocampus. Today, we know how some forms of Hebb's mechanism work at the molecular level.²⁰

Later, Wickens (1993) proposed a similar mechanism called the three-factor rule, according to which some synapses are strengthened whenever presynaptic and postsynaptic activity occur in the presence of dopamine. These same synapses might be weakened when such activity occurs in the absence of dopamine. Later studies confirmed this hypothesis.²¹

Suppose a monkey receives an unexpected reward and encodes a large positive RPE. Glimcher explains:

The TD model tells us that under these conditions we want to increment the value attributed to all actions or sensations that have just occurred. Under these conditions, we know that the dopamine neurons release dopamine throughout the frontocortical-basal ganglia loops, and do so in a highly homogenous manner. That means we can think of any neuron equipped with dopamine receptors as 'primed' for synaptic strengthening. When this happens, any segment of the frontocortical-basal ganglia loop that is already active will have its synapses strengthened.²²
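A minimal sketch of a three-factor update, under one common simplifying assumption: the dopamine signal is treated as a single signed number (positive for a burst, negative for a dip below baseline) that is broadcast identically to every synapse, and a synapse changes only if its pre- and postsynaptic cells were both active. The function name and numbers are hypothetical.

```python
def three_factor_update(weights, pre, post, dopamine, eta=0.1):
    """One step of a toy three-factor rule.

    weights, pre, post: lists with one entry per synapse (activity in [0, 1]).
    dopamine: one signed number broadcast to every synapse (assumed here to
    equal the reward prediction error)."""
    return [w + eta * x * y * dopamine for w, x, y in zip(weights, pre, post)]


weights = [0.5, 0.5, 0.5]
pre = [1.0, 1.0, 0.0]    # synapse 3's input cell was silent
post = [1.0, 0.0, 1.0]   # synapse 2's target cell was silent

after_reward = three_factor_update(weights, pre, post, dopamine=+1.0)
after_omission = three_factor_update(weights, pre, post, dopamine=-1.0)
print(after_reward)      # only synapse 1 (pre and post both active) is strengthened
print(after_omission)    # only synapse 1 is weakened
```

Because the dopamine term is the same everywhere, it is the local coincidence of pre- and postsynaptic activity that decides which synapses change, which is the point of Glimcher's description.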

We will return to the dopamine system later, but for now let us back up and pursue the neoclassical economic path into the brain.

Expected Utility in Neurons

Ever since Friedman (1953), economists have insisted that humans only behave as if they are utility maximizers, not that they actually compute expected utility and try to maximize it.

It was a surprise, then, when neuroscientists stumbled upon the neurons that were encoding expected utility in their firing rates.

Tanji & Evarts (1976) did their experiments with rhesus monkeys because they are our closest relative besides the apes, and this kind of work is usually forbidden on apes for ethical reasons (we need to implant a recording electrode in the brain).

The monkeys were trained to know that a colored light on the screen meant they would soon be offered a reward (a drop of water) either for pushing or pulling, but not for both. This was the ‘ready’ cue. A second later, researchers gave a ‘direction’ cue that told the monkeys which action — push or pull — was going to be rewarded. The third cue was the 'go' signal: if the monkey made the previously indicated movement, it was rewarded.

This is what they saw:

At the ‘ready’ cue, the neurons associated with a pushing motion became weakly active (firing above their baseline rate), and so did the neurons associated with a pulling motion. When the ‘direction’ cue was given, the neurons associated with the to-be-rewarded motion doubled their firing rate, and the neurons associated with the opposite motion fell back to the baseline rate. Then at the ‘go’ cue, the firing rate of the neurons associated with the to-be-rewarded movement rose rapidly again, past the threshold required to produce movement, and the movement was produced shortly thereafter.

One tempting explanation of the data is that after the ‘ready’ cue, the monkey’s brain 'decides' there’s a 50% chance that pulling will get the reward, and a 50% chance that pushing will get the reward. That’s why we see the neuron firing rates associated with those two actions each jump to slightly less than 50% of the movement threshold when the ‘ready’ cue is given. But then, when the ‘direction’ cue is given, those expectations shift to 100%/0% or 0%/100%, depending on which action is about to be rewarded according to the ‘direction’ cue. That’s why activity in the circuit associated with the to-be-rewarded action doubles and the other one drops to baseline. And then the ‘go’ cue is delivered and firing rates blast past the movement threshold, and movement is produced.

Let's jump ahead to Basso & Wurtz (1997), who did a similar experiment except that they used voluntary eye movements (called ‘saccades’) instead of voluntary arm movements. And this time, they presented each monkey with one, two, four, or eight possible targets, instead of just two targets (push and pull) like Tanji & Evarts did.

What they found was that as more potential targets were presented, the magnitude of the preparatory activity associated with each target systematically decreased. And again, once the ‘direction’ and ‘go’ cues were presented, the activity associated with those other potential targets dropped rapidly and activity burst rapidly in neurons associated with the to-be-rewarded movement. It was as though the monkeys’ brains were distributing their probability mass evenly across the potentially rewarded actions, and then once they knew which action should in fact be rewarded, they moved all their probability mass to that action and performed the action and got the reward.
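The "spread the probability mass evenly" reading of both experiments can be written down in a few lines. This sketch assumes, hypothetically, that preparatory firing above baseline is proportional to the probability that a target will be rewarded, with an arbitrary gain of 40 spikes/s. Real neurons are noisier and need not be this linear.

```python
def preparatory_excess_hz(n_targets, gain_hz=40.0):
    """Firing above baseline for each target before the direction cue.

    Assumption: excess firing = gain * P(this target is rewarded), with the
    probability spread evenly, so P = 1 / n_targets.
    """
    p = 1.0 / n_targets
    return [gain_hz * p] * n_targets


def post_cue_excess_hz(n_targets, rewarded, gain_hz=40.0):
    """After the direction cue, all probability mass moves to one target."""
    return [gain_hz if i == rewarded else 0.0 for i in range(n_targets)]


for n in (1, 2, 4, 8):  # the set sizes used by Basso & Wurtz
    print(n, preparatory_excess_hz(n)[0])  # 40.0, 20.0, 10.0, 5.0

# Tanji & Evarts' two-target case: the cued action's excess firing doubles,
# and the other action falls back to baseline.
print(preparatory_excess_hz(2), post_cue_excess_hz(2, rewarded=0))
```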

Real-Time Expected Utility Updates

Other researchers showed monkeys a black screen with flickering white dots on it. In each frame of the video, the computer moved some of the dots in a common direction and the rest in random directions. The independent variable was the fraction moving together, a measure called 'coherence.' In a 100% leftward coherence condition, all dots moved to the left. In a 60% rightward condition, 60% of the dots moved rightward while the rest moved randomly. And so on.

In a typical experiment, the researchers would identify a neuron in a monkey's brain that increased its firing rate in response to rightward coherence of the dots, and decreased its firing rate in response to leftward coherence of the dots. Then they would present the monkey with a sequence (in random order) of every possible leftward and rightward coherence condition.

A leftward coherence (of any magnitude) meant the monkey would be rewarded for leftward eye movement, and a rightward coherence meant the monkey would be rewarded for rightward eye movement. But, the monkey had to wait two seconds before being rewarded.

In this experiment, the probabilities always started at 50% but then updated continuously. A 100% rightward coherence condition allowed the monkey to very quickly know which voluntary eye movement would be rewarded, but in a 5% rightward coherence condition the expected utility of the rightward target grew more slowly.

The results? The greater the coherence of rightward motion of the dots, the faster the neurons associated with rightward eye movement increased their firing rate. (A higher coherence meant the monkey was able to update its probabilities more quickly.)
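One way to picture this continuous updating is sequential Bayesian inference on the motion signal. The model below is an illustrative assumption, not the researchers' analysis. Each video frame yields one noisy sample x drawn from Normal(+c, 1) when the true motion is rightward and from Normal(-c, 1) when it is leftward, where c is the coherence as a fraction. The log-likelihood ratio contributed by one sample is then 2cx, so P(rightward) starts at 50% and moves faster when c is large.

```python
import math
import random


def logistic(z):
    """Numerically stable 1 / (1 + exp(-z)): converts log-odds to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def posterior_rightward(coherence, n_frames, rng):
    """P(rightward) after each frame, when the dots truly drift rightward.

    Assumption: each frame gives x ~ Normal(+coherence, 1) under rightward motion
    and Normal(-coherence, 1) under leftward motion, so the log-likelihood ratio
    of one sample is 2 * coherence * x.
    """
    log_odds = 0.0  # 50/50 prior
    trajectory = []
    for _ in range(n_frames):
        x = rng.gauss(coherence, 1.0)
        log_odds += 2.0 * coherence * x
        trajectory.append(logistic(log_odds))
    return trajectory


rng = random.Random(0)
for c in (1.0, 0.05):  # 100% vs 5% rightward coherence
    finals = [posterior_rightward(c, 20, rng)[-1] for _ in range(200)]
    print(c, round(sum(finals) / len(finals), 3))
```

With high coherence the average belief approaches certainty within a few frames; with 5% coherence it barely moves off 50% in the same time.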

Argmax and Reservation Price

Many studies show that the brain controls movement by way of a 'winner take all' mechanism that is isomorphic to the argmax operation from economics.²³ That is, there are many possibilities competing for your final choice, but just before your choice the single strongest signal remains after all the others are inhibited.

This choice mechanism was investigated in more detail by Michael Shadlen and others.²⁴ Shadlen gave monkeys the same eye movement task as above, except that the monkeys could make their choice at any time instead of waiting for two seconds. He found that:

When the direction of the dots is unambiguous, monkeys make their choices quickly.
As the direction of the dots becomes more ambiguous, they take longer to make their choices.
Throughout the experiment, the firing rates of neurons in the LIP (part of the 'final common path' for generating eye movement) grew toward a specific threshold level.

The threshold level acts as a kind of criterion of choice. Once the criterion is met, action is taken. Or in economic terms, the monkeys seemed to set a reservation price on making certain movements.²⁵
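The two readout modes can be sketched side by side. The argmax function is plain winner-take-all over a list of values. The threshold version is a standard bounded-accumulation (drift-diffusion-style) model, used here as an assumption rather than as Shadlen's exact model. Noisy evidence with mean equal to the coherence is summed until it reaches +threshold (choose right) or -threshold (choose left). Weak, ambiguous evidence takes longer to reach the criterion, which is the pattern Shadlen reported.

```python
import random


def argmax_choice(values):
    """Winner-take-all: index of the single strongest signal."""
    return max(range(len(values)), key=lambda i: values[i])


def threshold_choice(coherence, threshold, rng, max_steps=100_000):
    """Accumulate noisy evidence until it crosses a fixed criterion.

    Returns (choice, steps); choice is None if no bound is reached in max_steps.
    """
    total = 0.0
    for step in range(1, max_steps + 1):
        total += rng.gauss(coherence, 1.0)  # one noisy sample of rightward motion
        if total >= threshold:
            return "right", step
        if total <= -threshold:
            return "left", step
    return None, max_steps


rng = random.Random(0)
for c in (0.5, 0.05):
    trials = [threshold_choice(c, threshold=10.0, rng=rng) for _ in range(500)]
    mean_steps = sum(steps for _, steps in trials) / len(trials)
    accuracy = sum(choice == "right" for choice, _ in trials) / len(trials)
    print(f"coherence={c}: mean steps={mean_steps:.1f}, accuracy={accuracy:.2f}")
```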

Random Utility

When deciding between goods of different expected utilities, humans exhibit a stochastic transfer function:

Consider a human subject choosing between two objects of highly different expected utilities, such as a first lottery with a 50% chance of winning $5 and a second lottery with a 25% chance of winning $5. We observe highly deterministic behavior under these conditions: basically all subjects always choose the 50% chance of winning $5. But what happens when we increment the value of the 25% lottery? As the amount one stands to win from that lottery is incremented, individual subjects eventually switch their preference. Exactly when they make that switch depends on their idiosyncratic degree of risk aversion. What is most interesting about this behavior for these purposes, though, is that actual human subjects, when presented with this kind of choice repeatedly, are never completely deterministic. As the value of the 25% lottery increases, they begin to show probabilistic behavior — selecting the 25% lottery sometimes, but not always.²⁶

Our behavior has an element of randomness in it. Daniel McFadden won a Nobel Prize in economics for capturing such behavior with a random utility model.²⁷ He did so by supposing that when a chooser asks himself what a thing is worth, he gets not a fixed answer but a variable one. That is, there is actual variation in his preferences. Thus, his expected utility for a particular lottery is drawn from a distribution of possible utilities, usually a Gaussian one.²⁸

This behavior makes sense when we think about the human choice mechanism at the neuronal level, because neuron firing rates are stochastic.²⁹ When a neurobiologist says "The neuron was firing at 200 Hz," what she means is that the mean firing rate of the neuron over a long time and stable conditions would have been close to 200 Hz. So the neurons that encode utility (wherever they are) will exhibit stochasticity, and thereby introduce some randomness into our choices. In this way, neurobiological data constrains our economic models of human behavior. An economic model without some randomness in it will have difficulty capturing human choices for as long as humans run on neurons.³⁰
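McFadden-style random utility can be illustrated with the lottery example quoted above. Purely for illustration, this sketch assumes a risk-neutral chooser whose mean utility for each lottery is its expected dollar value, plus independent Gaussian noise with a hypothetical standard deviation of $0.50 on every evaluation. The probability of choosing the 25% lottery then follows a smooth S-shaped (probit) curve instead of a hard switch.

```python
import math
import random


def prob_choose_b(mean_a, mean_b, noise_sd):
    """P(noisy U(B) > noisy U(A)) for independent Gaussian noise on each utility."""
    diff_sd = noise_sd * math.sqrt(2.0)  # sd of the difference of two noisy draws
    z = (mean_b - mean_a) / diff_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def simulate_share_b(mean_a, mean_b, noise_sd, n_trials, rng):
    """Monte Carlo check: fraction of trials on which B's noisy draw wins."""
    wins = sum(
        rng.gauss(mean_b, noise_sd) > rng.gauss(mean_a, noise_sd)
        for _ in range(n_trials)
    )
    return wins / n_trials


safe_mean = 0.50 * 5.0  # 50% chance of $5
rng = random.Random(0)
for prize in (5.0, 8.0, 10.0, 12.0, 20.0):  # prize offered by the 25% lottery
    risky_mean = 0.25 * prize
    print(prize,
          round(prob_choose_b(safe_mean, risky_mean, 0.5), 3),
          simulate_share_b(safe_mean, risky_mean, 0.5, 5000, rng))
```

At a $10 prize the two lotteries have equal means, so this chooser picks each one half the time; around that point, behavior is probabilistic rather than deterministic.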

Discounting

Louie & Glimcher (2010) examined temporal discounting in the brain. The two monkeys in this study were repeatedly asked to choose between a small, immediately available reward and a larger reward available after a small delay. For example, on one day they were asked to choose between 0.13 milliliters of juice right now, or else 0.2 milliliters of juice available after a delay of 2, 4, 8, or 12 seconds. A monkey might be willing to wait 2, 4, or 8 seconds for the larger reward, but not 12 seconds.

After many, many measurements of this kind, Louie and Glimcher were able to describe the discounting function being used by each monkey. (One of them was more impatient than the other.)

Moreover, the neurons in the relevant section of the brain fired at rates that reflected each monkey’s discounting function. If 0.2 milliliters of juice was offered with no delay, the neurons were highly active. If the same reward was offered at a delay of 2 seconds, they were slightly less active. If the same reward was offered after 4 seconds, the neurons were less active still. And so on. As it turned out, the discounting function that captured their choices was identical to the discounting function that captured the firing rates of these neurons.
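To make "describe the discounting function" concrete, here is a sketch using a hyperbolic discount curve, V = A / (1 + kD), for an amount A at delay D. The functional form and both values of k are assumptions chosen for illustration, not the fits Louie & Glimcher reported. A k of 0.05 per second happens to reproduce the example choice pattern above (waiting 2, 4, or 8 seconds, but not 12).

```python
def discounted_value(amount_ml, delay_s, k_per_s):
    """Hyperbolic discounting (assumed form): value = amount / (1 + k * delay)."""
    return amount_ml / (1.0 + k_per_s * delay_s)


def waits_for_larger(small_ml, large_ml, delay_s, k_per_s):
    """True if the delayed larger reward is worth more than the immediate smaller one."""
    return discounted_value(large_ml, delay_s, k_per_s) > discounted_value(small_ml, 0.0, k_per_s)


delays = (2, 4, 8, 12)
impatient = [waits_for_larger(0.13, 0.2, d, k_per_s=0.05) for d in delays]
patient = [waits_for_larger(0.13, 0.2, d, k_per_s=0.02) for d in delays]
print(impatient)  # [True, True, True, False]
print(patient)    # [True, True, True, True]
```

The same curve, evaluated for a fixed 0.2 ml reward at increasing delays, falls steadily, which is the shape the firing rates are described as following.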

This shouldn't be a surprise at this point, but just to confirm: Yes, we can observe discounting in the firing rates of neurons involved in the choice-making process.

Relative and Absolute Utility

Dorris & Glimcher (2004) observed monkeys and their choice mechanism neurons while the monkeys engaged in repeated plays of the inspection game. The study is too involved for me to explain here, but the results suggested that choice mechanism neurons encode relative expected utilities (relative to other actions under consideration) rather than absolute expected utilities.

Tobler et al. (2005) suggested that the brain only encodes relative expected utilities. But there is reason to suspect this can't be right. If we stored only relative expected utilities, then we would routinely violate the axiom of transitivity (if you prefer A to B and B to C, you can't also prefer C to A). To see why this is the case, consider Glimcher's example (he says 'expected value' instead of 'utility'):

...consider a subject trained to choose between objects A and B, where A is $1,000,000 worth of goods and B is $100,000 worth of goods... A system that represented only the relative expected subjective value of A and B would represent SV(A) > SV(B). Next, consider training the same subject to choose between C and D, where C is $1,000 worth of goods and D is $100 worth of goods. Such a system would represent SV(C) > SV(D). What happens when we ask a chooser to select between B and C? For a chooser who represents only relative expected subjective value, the choice should be C: she should pick $1,000 worth of goods over $100,000 worth of goods because it has a higher learned relative expected subjective value. In order for our chooser to... construct transitive preferences across choice sets (and to obey the continuity axiom)... it is required that somewhere in the brain she represent the absolute subjective values of her choices.³¹

And we mostly do seem to obey the axiom of transitivity.
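Glimcher's argument can be checked mechanically. The sketch below assumes one hypothetical form of purely relative coding, in which each option's stored value is its dollar value divided by the total value of the pair it was trained in. A chooser that consults only these stored relative values picks C over B, while one that keeps absolute values picks B.

```python
def relative_values(training_pair):
    """Store each option's value only relative to its training partner (assumed form)."""
    total = sum(training_pair.values())
    return {name: value / total for name, value in training_pair.items()}


absolute = {"A": 1_000_000, "B": 100_000, "C": 1_000, "D": 100}

stored_relative = {}
stored_relative.update(relative_values({"A": absolute["A"], "B": absolute["B"]}))
stored_relative.update(relative_values({"C": absolute["C"], "D": absolute["D"]}))


def choose(options, values):
    """Pick whichever option has the larger stored value."""
    return max(options, key=lambda name: values[name])


print(choose(["B", "C"], stored_relative))  # C, i.e. $1,000 of goods over $100,000
print(choose(["B", "C"], absolute))         # B
```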

So if the choice mechanism neurons do represent relative utilities, then some other neurons elsewhere must encode a more absolute form of utility. Other implications of this are explored in the next section.

Normalization

David Heeger showed³² that the firing rate of a 'feature detector' neuron in the visual cortex reflects its response to a feature in the visual field divided by the summed activity of nearby neurons responding to the same image. Thus, these neurons encode not only whether they 'see' the feature they are built to detect, but also how unique that feature is in the visual field.

The effect of this is that neurons reacting to the edge of a visual object fire more actively than others do. Behold! Edge detection!
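A one-dimensional toy shows why divisive normalization highlights edges. This is a simplified version of Heeger-style normalization chosen for illustration: each unit's response is its input divided by a constant sigma plus the summed input of itself and its immediate neighbors (the published models use squared responses and larger pools). Values past the ends of the array are assumed to repeat the border value.

```python
def normalize(inputs, sigma=0.1, radius=1):
    """Divide each input by sigma plus the summed input in its local pool."""
    n = len(inputs)
    responses = []
    for i, x in enumerate(inputs):
        # Clamp indices so the border value repeats past the ends of the array.
        pool = sum(inputs[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1))
        responses.append(x / (sigma + pool))
    return responses


# A dark region (0) followed by a uniformly bright region (1); the edge is at index 4.
stimulus = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
responses = normalize(stimulus)
print([round(r, 2) for r in responses])
# [0.0, 0.0, 0.0, 0.0, 0.48, 0.32, 0.32, 0.32, 0.32, 0.32]
```

The unit at the edge has a dark neighbor, so its divisor is smaller and its response larger than those of units inside the uniform bright region.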

It's also an efficient way to encode information about the world. Consider a world where orange dots are ubiquitous. For an animal in that world, it would be wasteful to fire action potentials to represent orange dots. Better to represent the absence of orange dots, or the transition from orange dots to something else. An optimally efficient encoding method would be sensitive not to the 'alphabet' of all possible inputs, but to a smaller alphabet of the inputs that actually appear in the world. This insight was mathematically formalized by Schwartz & Simoncelli (2001).

The efficiency of this normalization technique may explain why we've discovered it at work in so many different places in the brain.³³ And given that we've found it almost everywhere we've looked for it, it wouldn't be a surprise to see it show up in our choice-making circuits. Indeed, Schwartz & Simoncelli's normalization equation may be what our brains use to encode expected utilities that are relative to the other choices under consideration.

One implication of their equation is that a chooser's errors become more frequent as the size of the choice set grows. Thus, behavioral errors on small choice sets should be rarer than might be predicted by most random utility models, but error rates will increase rapidly with choice set size (and beyond a certain choice set size, choices will appear random).
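The predicted rise in errors can be simulated. All of the following are hypothetical assumptions: one option is worth 1.0 and every other option 0.8; each mean is normalized by sigma plus the sum of all values in the choice set (a simplified stand-in for the Schwartz & Simoncelli equation); each normalized value gets fixed Gaussian noise; and the chooser takes the argmax. Because the denominator grows with the set while the noise does not shrink, the best option's advantage gets buried.

```python
import random


def normalized_means(values, sigma=0.5):
    """Divisive normalization across the whole choice set (simplified form)."""
    total = sum(values)
    return [v / (sigma + total) for v in values]


def error_rate(n_options, n_trials, rng, best=1.0, other=0.8, noise_sd=0.05):
    """Fraction of trials on which a noisy argmax misses the best option (index 0)."""
    means = normalized_means([best] + [other] * (n_options - 1))
    errors = 0
    for _ in range(n_trials):
        noisy = [m + rng.gauss(0.0, noise_sd) for m in means]
        chosen = max(range(n_options), key=noisy.__getitem__)
        errors += chosen != 0
    return errors / n_trials


rng = random.Random(0)
for n in (2, 6, 24, 30):
    print(n, error_rate(n, 2000, rng), "chance level:", round(1 - 1 / n, 2))
```

For large sets the error rate in this toy approaches the chance level, which is what "choices will appear random" means here.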

Preliminary evidence that choice set size affects error rates has arrived from behavioral economics. For example, consider Iyengar & Lepper's (2000) study of supermarket shoppers. They set up a table showing either 6 or 24 flavors of jams, allowing shoppers to sample as many as they wanted. Customers who saw 24 flavors had a 3% chance of buying a jar, while those who saw only 6 flavors had a 30% chance!

In another experiment, Iyengar & Lepper let subjects choose one of either 6 or 30 different chocolates. Those who chose from among only 6 options were more satisfied with their selection than those who had been presented with 30 different chocolates.

These data fit our expectation that as the choice set grows, the frequency of errors in our behavior rises and the likelihood that an option will rise above the threshold for purchase drops. When Louie & Glimcher (2010) investigated this phenomenon in monkey choice mechanism neurons, they found it at work there, too. But the process of choice-set editing is still poorly understood, and some recent studies have failed to replicate Iyengar & Lepper's results (Scheibehenne et al. 2010).

Perhaps the most surprising implication of these findings is that because of neuronal stochasticity, and because errors increase as the choice set grows, we should expect stochastic violations of the independence axiom, and that when choosers face very large choice sets they will essentially ignore the independence axiom.

This is a prediction about human behavior not made by earlier models from neoclassical economics, but it is suggested by looking at the neurons involved in human choice-making.

Are Actions Choices?

But all these data come from experiments where the choices are actions, and from our knowledge of the brain's "final common path" for producing actions. How do actions map onto choices about lovers and smartphones?

Studies by Greg Horowitz provide some relevant data, because in them monkeys had to choose options identified by color rather than by action.³⁴ For example, in one trial a 'red' option might offer one reward and a 'green' option a different reward. On each trial, the red and green options would appear at random places on the computer screen, and the monkey could choose a reward with a voluntary eye movement. The key here is that rewards were chosen by color and not by a particular action.

Horowitz found that the choice mechanism neurons showed the same pattern of activation under these conditions as was the case under action-based choice tasks.

So, it looks like the valuation circuits can store the value of a colored target, and these valuations can be mapped to the choice mechanism. But we don't know much about how this works, yet.

The Primate Choice Mechanism: A Brief Review

Thus far, we have mostly discussed the primate brain's choice mechanism. To review:

The choice circuit resides in the final common pathway for action.
It takes as its input a signal that encodes stochastic expected utility, a concept aligned with the random utility term in economic models proposed by McFadden (2005) and Gul & Pesendorfer (2006).
This input signal is represented by a normalized firing rate (with Poisson variance, like all neurons).
As the choice set size grows, so does the error rate.
Final choice is implemented by an argmax function or a reservation price mechanism. (A single circuit can achieve both modes.³⁵)

But how are probability and utility calculated such that they can be fed into the expected utility representations of the choice mechanism? I won't discuss how the brain forms probabilistic beliefs in this article,³⁶ so let us turn to the study of how utility is calculated in the brain: the question of valuation.

Marginal Utility and Reference Dependence

Consider the following story:

Imagine an animal exploring a novel environment from a nest on a day when both (1) its blood concentration is dilute (and thus its need for water is low) and (2) its blood sugar level is low (and thus its need for food is high). The animal travels west one kilometer from the nest and emerges from the undergrowth into an open clearing at the shores of a large lake. Not very thirsty, the animal bends down to sample the water and finds it... unpalatable... the next day the same animal leaves its nest in the same metabolic state and travels one kilometer to the east, where it discovers a grove of trees that yield a dry but nutritious fruit, a grove of dried apricot trees. It samples the fruit and finds it sweet and highly palatable.

What has the animal actually learned about the value of going west and the value of going east? It has had a weakly negative experience, in the psychological sense, when going west and a very positive experience when going east. Do these subjective properties of its experience influence what it has learned? Do the stored representations derived from these experiences encode the actual objective values of going west and east, or do they encode the subjective experiences? That is a critical question about what the animal has learned, because it determines what it does when it wakes up thirsty. When it wakes up thirsty it should, in a normative sense, go west towards the... lake, despite the fact that its previous visit west was a negative experience.³⁷

Economists have known about this problem for a long time, and solved it with an idea called marginal utility.

In neoclassical economics, we view the animal as having two kinds of 'wealth': a sugar wealth and a water wealth (the total store of sugar and water in the animal's body at a given time). A piece of fruit or a sip of water is an increment in the animal's total sugar or water wealth. The utility of a piece of fruit or a sip of water, then, depends on its current levels of sugar and water wealth.

On day one, the animal's need for water is low and its need for sugar is high. On that day, the marginal utility of a piece of fruit is greater than the marginal utility of a sip of water. But suppose during the next week the animal has a high blood sugar level. At that time, the marginal utility of a piece of fruit is low. Thus, the marginal utility of a consumable resource depends on wealth. The wealthier the chooser, the lower the marginal utility provided by a fixed amount of gain ('diminishing marginal utility').

In neoclassical economics, the animal faced with the option of going east or west in the morning would first estimate how much the water and the fruit would change its objective wealth level, and then it would estimate how much those objective changes in wealth would change its utility. That is, it would use objective values to compute its marginal (subjective) utility. If it had access only to the subjective experiences in our story, it couldn't compute new marginal utilities when it finds itself unexpectedly thirsty.
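The neoclassical solution can be sketched in a few lines. The logarithmic utility function is an assumption (a common textbook choice; any concave function shows the same pattern), and the wealth units are arbitrary. Because this chooser stores objective sugar and water wealth, it can recompute which trip is better when it wakes up thirsty, even though its one trip west was unpleasant.

```python
import math


def marginal_utility(wealth, gain, utility=math.log):
    """Utility added by `gain` units when the chooser already holds `wealth` units."""
    return utility(wealth + gain) - utility(wealth)


def better_trip(sugar_wealth, water_wealth, fruit=1.0, sip=1.0):
    """Neoclassical chooser: compare marginal utilities computed from objective wealth."""
    gain_east = marginal_utility(sugar_wealth, fruit)  # the dried-apricot grove
    gain_west = marginal_utility(water_wealth, sip)    # the lake
    return "east" if gain_east > gain_west else "west"


# The same unit of gain is worth less the wealthier the chooser already is.
for wealth in (1.0, 2.0, 5.0, 10.0):
    print(wealth, round(marginal_utility(wealth, 1.0), 3))

print(better_trip(sugar_wealth=1.0, water_wealth=10.0))  # east: hungry, not thirsty
print(better_trip(sugar_wealth=10.0, water_wealth=1.0))  # west: the thirsty morning
```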

The problem with this solution is that the brain does not appear to encode the objective values of stimuli, and humans behaviorally don't seem to respect the objective values of options either, as discussed here.

In response to the behavioral evidence, Kahneman & Tversky (1979) developed a reference dependent utility function to describe human behavior: prospect theory. Their suggestion was, basically:

Rather than computing marginal utilities against [objective] wealth as in [standard neoclassical economic models], utilities (not marginal utilities) could be computed directly as deviations from a baseline level of wealth, and then choices could be based on direct comparisons of these utilities rather than on comparisons of marginal utilities. Their idea was to begin with something like the chooser's status quo, how much wealth he thinks he has. Each gamble is then represented as the chance of winning or losing utilities relative to that status-quo-like reference point.³⁸

This fits with the neurobiological fact that we encode signals from external stimuli relative to reference points, and don't have access to the objective values of stimuli.
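A reference-dependent value function of the prospect-theory kind can be sketched as follows. The power-function shape and the parameters (curvature 0.88, loss-aversion coefficient 2.25) are the commonly cited estimates from Tversky & Kahneman's 1992 follow-up work, used here only for illustration. The point is that the same final outcome receives a different value depending on the reference point.

```python
def prospect_value(outcome, reference, curvature=0.88, loss_aversion=2.25):
    """Value of an outcome coded as a gain or loss relative to a reference point."""
    x = outcome - reference
    if x >= 0:
        return x ** curvature                   # concave over gains
    return -loss_aversion * (-x) ** curvature   # steeper (and convex) over losses


# The same $100 outcome, judged from two different reference points.
print(prospect_value(100, reference=90))   # positive: a $10 gain
print(prospect_value(100, reference=110))  # negative and larger in magnitude: a $10 loss
```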

The advantage of the neoclassical economic model is that it keeps a chooser's choices consistent. The advantage of the reference-dependent approach is that it better fits human behavior and human neurobiology.

Most neoclassical economists seem to ignore the problems for their theories that are presented by reference dependence in human behavior and human neurobiology, but two neoclassical economists at Berkeley, Matthew Rabin and Botond Koszegi, have begun to take reference dependence seriously. As they put it:

...while an unexpected monetary windfall in the lab may be assessed as a gain, a salary of $50,000 to an employee who expected $60,000 will not be assessed as a large gain relative to status-quo wealth, but rather as a loss relative to expectations of wealth. And in nondurable consumption — where there is no object with which the person can be endowed — a status-quo-based theory cannot capture the role of reference dependence at all: it would predict, for instance, that a person who misses a concert she expected to attend would feel no differently than somebody who never expected to see the concert.³⁹

Their reference-dependent model makes particular predictions:

[Our theory] shows that a consumer's willingness to pay a given price for shoes depends on the probability with which she expected to buy them and the price she expected to pay. On the one hand, an increase in the likelihood of buying increases a consumer's sense of loss of shoes if she does not buy, creating an "attachment effect" that increases her willingness to pay. Hence, the greater the likelihood she thought prices would be low enough to induce purchase, the greater is her willingness to buy at higher prices. On the other hand, holding the probability of getting the shoes fixed, a decrease in the price a consumer expected to pay makes paying a higher price feel like more of a loss, creating a "comparison effect" that lowers her willingness to pay the high price. Hence, the lower the prices she expected among those prices that induce purchase, the lower is her willingness to buy at higher prices.
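The attachment and comparison effects fall out of a small calculation. This sketch is a deliberately simplified reading of Koszegi & Rabin's model, with every number hypothetical: the shoes are worth 100 in money units, gain-loss utility is linear with losses weighted twice as heavily as gains, and the consumer's reference point is her expectation (buy at an expected price with probability q, otherwise go without). Each outcome is compared, separately in the shoe and money dimensions, with every outcome she expected, weighted by its probability.

```python
def kr_utility(buys, price, q_expected_buy, expected_price,
               shoe_value=100.0, eta=1.0, loss_aversion=2.0):
    """Utility of buying (or not) against a stochastic, expectation-based reference point.

    Assumed simplification: linear consumption utility in two dimensions (shoes, money),
    and linear gain-loss utility with weight eta on gains and eta * loss_aversion on losses.
    """
    def gain_loss(diff):
        return eta * diff if diff >= 0 else eta * loss_aversion * diff

    shoes = shoe_value if buys else 0.0
    money = -price if buys else 0.0
    reference_outcomes = [
        (q_expected_buy, shoe_value, -expected_price),  # she expected to buy
        (1.0 - q_expected_buy, 0.0, 0.0),               # she expected not to buy
    ]
    utility = shoes + money  # intrinsic consumption utility
    for prob, ref_shoes, ref_money in reference_outcomes:
        # Compare each dimension separately against each possible reference outcome.
        utility += prob * (gain_loss(shoes - ref_shoes) + gain_loss(money - ref_money))
    return utility


def willingness_to_pay(q_expected_buy, expected_price, hi=1_000.0, iterations=100):
    """Highest price at which buying is still at least as good as not buying (bisection)."""
    def surplus(price):
        return (kr_utility(True, price, q_expected_buy, expected_price)
                - kr_utility(False, price, q_expected_buy, expected_price))

    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if surplus(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo


print(round(willingness_to_pay(q_expected_buy=1.0, expected_price=50.0), 2))  # 116.67
print(round(willingness_to_pay(q_expected_buy=0.5, expected_price=50.0), 2))  # 91.67
print(round(willingness_to_pay(q_expected_buy=1.0, expected_price=80.0), 2))  # 126.67
```

Lowering q from 1 to 0.5 lowers her willingness to pay (less attachment), and lowering the expected price from 80 to 50 also lowers it (a harsher comparison), matching the two effects described in the quote.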

Thus, the cost of accepting the human fact of reference-dependence is that we have to admit that humans are irrational (in the sense of 'rationality' defined by the axioms of revealed preference):

The fact that a consumer will pay more for shoes she expected to buy than for shoes she did not expect to buy, or that an animal would prefer inferior fruit it expected to eat over superior fruit it did not expect to eat, is exactly the kind of irrational behavior that we might hope the pressures of evolution would preclude. What observations tell us, however, is that these behaviors do occur. The neuroscience of sensory encoding tells us that these behaviors are an inescapable product of the fundamental structure of our brains.⁴⁰

But really, shouldn't it have been obvious all along that humans are irrational? Perhaps it is, to everyone but neoclassical economists and Aristotelians. (Okay, enough teasing...)

One thing to keep in mind is that the brain encodes information about the external world in a reference-dependent way because that method makes a more efficient use of neurons. So evolution traded away some rationality for greater efficiency in the encoding mechanism.

Valuation in the Brain

Back to dopamine. Earlier, we learned that the brain learns the values of actions with a dopaminergic reward system that uses something like temporal difference (TD) reinforcement learning. This reward system updates the stored values for actions by generating a reward prediction error (RPE) from the difference between expected reward and experienced reward, and propagating this learning throughout relevant structures of the brain using the neurotransmitter dopamine. In particular, some synapses are strengthened whenever presynaptic and postsynaptic activity occur in the presence of dopamine, as proposed by Wickens (1993).
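The TD update summarized above fits in a few lines. This is the textbook TD(0) rule described by Sutton & Barto, applied to a hypothetical two-step trial (a cue, then a wait that ends in a reward of 1); the learning rate and discount factor are arbitrary choices. Early in training the prediction error arrives with the reward. With practice it shrinks toward zero as the reward becomes predicted, and value propagates back to the cue.

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update in place and return the reward prediction error (RPE)."""
    next_value = 0.0 if next_state is None else values[next_state]
    # RPE: (reward + discounted prediction from the next state) minus current prediction
    rpe = reward + gamma * next_value - values[state]
    values[state] += alpha * rpe
    return rpe


values = {"cue": 0.0, "wait": 0.0}
rpe_at_reward = []
for trial in range(200):
    td0_update(values, "cue", "wait", reward=0.0)  # cue appears, no reward yet
    rpe_at_reward.append(td0_update(values, "wait", None, reward=1.0))  # reward ends the trial

print(rpe_at_reward[0], round(rpe_at_reward[-1], 4))  # 1.0 at first, near 0 after training
print({state: round(v, 3) for state, v in values.items()})  # cue near 0.9, wait near 1.0
```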

But we haven't yet discussed how utilities for actions are generated in the first place, or how they are stored (independent of the expected utilities represented during the choice process). It feels like I generally want ice cream a little bit and hot sex a lot more. Where is that information stored?

Dozens⁴¹ of fMRI studies show that two brain regions in particular are correlated with subjective value: the ventral striatum and the medial prefrontal cortex. Other studies suggest that at least five more brain regions probably also contribute to the valuation process: the orbitofrontal cortex, the dorsolateral prefrontal cortex, the amygdala, the insula, and the anterior cingulate cortex.

There are many theories about how the human brain generates and stores utilities, but they are far more speculative, and far less developed, than everything else I've presented in this tutorial, so I won't discuss them here. Instead, let us conclude with a summary of what neuroscientists know about the human brain's motivational system, and what some of the greatest open questions are.

Summary and Research Directions

Here's what we've learned:

Utilities are real numbers ranging from 0 to 1,000 that take action potentials per second as their natural units. (By 'utility' here I don't mean what's usually meant by the term, I just mean 'utility' for the purpose of predicting choice by measuring the firing rates of certain populations of neurons in the final common path of the choice circuit in the human brain.)
Mean utilities are mean firing rates of specific populations of neurons in the final common path of human choice circuits.
Mean utilities predict choice stochastically, similar to random utility models from economics.
Utilities are encoded cardinally in firing rates relative to neuronal baseline firing rates. (This is opposed to post-Pareto, ordinal notions of utility.)
The choice circuit takes as its input a firing rate that encodes relative (normalized) stochastic expected utility.
As the choice set size grows, so does the error rate.
Final choice is implemented by an argmax function or a reservation price mechanism.

Paul Glimcher lists⁴² the greatest open questions in the field as:

Where is utility stored and how does it get to the choice mechanism?
How does the brain decide when it's time to choose?
What is the neural mechanism that allows us to substitute between two goods at a certain point?
How are probabilistic beliefs represented in the brain?
Utility functions are state-dependent, so how do state and utility function interact?

Later, we'll explore the implications of our findings for metaethics. As of August 2011, if you've read this then you probably know more about how human values actually work than almost every professional metaethicist on Earth. The general lesson here is that you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it.

Notes

¹ They are: Less Wrong Rationality and Mainstream Philosophy, Philosophy: A Diseased Discipline, On Being Okay with the Truth, The Neuroscience of Pleasure, The Neuroscience of Desire, How You Make Judgments: The Elephant and its Rider, Being Wrong About Your Own Subjective Experience, Intuition and Unconscious Learning, Inferring Our Desires, Wrong About Our Own Desires, Do Humans Want Things?, Not for the Sake of Pleasure Alone, Not for the Sake of Selfishness Alone, Your Evolved Intuitions, When Intuitions Are Useful, Cornell Realism, Railton's Moral Reductionism (Part 1), Railton's Moral Reductionism (Part 2), Jackson's Moral Functionalism, Moral Reductionism and Moore's Open Question Argument, and Are Deontological Moral Judgments Rationalizations?

² Heading Toward: No-Nonsense Metaethics, What is Metaethics?, Conceptual Analysis and Moral Theory, and Pluralistic Moral Reductionism.

³ I tried something similar before, with Cognitive Science in One Lesson.

⁴ Glimcher (2010) offers the best coverage of the topic in a single book. Tobler & Kobayashi (2009) offer the best coverage in a single article.

⁵ The quotes in this section are from Churchland (1981).

⁶ Allen & Ng (2004).

⁷ This perspective goes back at least as far as Arnauld (1662), who wrote:

To judge what one must do to obtain a good or avoid an evil, it is necessary to consider not only the good and the evil in itself, but also the probability that it happens or does not happen: and to view geometrically the proportion that all these things have together.

⁸ In addition to Caplin & Leahy (2001), see Kreps & Porteus' (1978, 1979) incorporation of the "utility of knowing", Loomes & Sugden's (1982) incorporation of "regret", Gul & Pesendorfer's (2001) incorporation of "the cost of self-control", and Koszegi & Rabin's (2007, 2009) incorporation of the "reference point".

⁹ Friedman (1953).

¹⁰ See a review in Fox & Poldrack (2009).

¹¹ For one difficulty with prospect theory, see Laury & Holt (2008).

¹² Sutton & Barto (2008), p. 3. All quotes from this section are from the early pages of this book.

¹³ From Sutton & Barto (2008).

¹⁴ Much of the rest of this post is basically a summary and paraphrase of Glimcher (2010).

¹⁵ Mirenowicz & Schultz (1994).

¹⁶ Schultz et al. (1997).

¹⁷ Caplin & Dean (2007).

¹⁸ From Glimcher (2010).

¹⁹ Hebb (1949).

²⁰ Malenka & Bear (2004).

²¹ Reynolds & Wickens (2002).

²² Glimcher (2010), p. 341.

²³ Edelman & Keller (1996); Van Gisbergen et al. (1987).

²⁴ Gold & Shadlen (2007); Roitman & Shadlen (2002).

²⁵ Simon (1957).

²⁶ Glimcher (2010), p. 215.

²⁷ McFadden (2000). The behavior of gradually transitioning between two choices is described by Selten (1975).

²⁸ For what is probably an improved random utility model, see Gul & Pesendorfer (2006).

²⁹ Dean (1983); Werner & Mountcastle (1963).

³⁰ Unless some other feature of the brain turns out to 'smooth out' the stochasticity of neurons involved in valuation and choice-making.

³¹ Glimcher (2010).

³² Heeger (1992, 1993); Carandini & Heeger (1994); Simoncelli & Heeger (1998).

³³ Carandini & Heeger (1994); Britten & Heuer (1999); Zoccolan et al. (2005); Louie & Glimcher (2010).

³⁴ Horowitz & Newsome (2001a, 2001b, 2004).

³⁵ Liu & Wang (2008).

³⁶ But, see Deneve (2009).

³⁷ Glimcher (2010), p. 281.

³⁸ Glimcher (2010), p. 283.

³⁹ This quote and the next quote are from Koszegi & Rabin (2006).

⁴⁰ Glimcher (2010), p. 292.

⁴¹ I won't list them all here. For an overview, see Glimcher (2010), ch. 14.

⁴² Glimcher (2010), ch. 17. I've paraphrased his open questions. I also excluded his 6th question: What Is the Neural Organ for Representing Money?

References

Allais (1953). Le comportement de l'homme rationnel devant le risque: critique des postulats et axiomes de l'école américaine. Econometrica, 21: 503-546.

Allen & Ng (2004). Economic behavior. In Spielberger (ed.), Encyclopedia of Applied Psychology, Vol. 1 (pp. 661-666). Academic Press.

Arnauld (1662). Port-Royal Logic.

Basso & Wurtz (1997). Modulation of neuronal activity in superior colliculus by changes in target probability. Journal of Neuroscience, 18: 7519-7534.

Britten & Heuer (1999). Spatial summation in the receptive fields of MT neurons. Journal of Neuroscience, 19: 5074-5084.

Caplin & Dean (2007). Axiomatic neuroeconomics.

Caplin, Dean, Glimcher, & Rutledge (2010). Measuring beliefs and rewards: a neuroeconomic approach. Quarterly Journal of Economics, 125: 3.

Caplin & Leahy (2001). Psychological expected utility theory and anticipatory feelings. Quarterly Journal of Economics, 116: 55-79.

Carandini & Heeger (1994). Summation and division by neurons in primate visual cortex. Science, 264: 1333-1336.

Churchland (1981). Eliminative materialism and the propositional attitudes. The Journal of Philosophy, 78: 67-90.

Dean (1983). Adaptation-induced alteration of the relation between response amplitude and contrast in cat striate cortical neurons. Vision Research, 23: 249-256.

Deneve (2009). Bayesian decision making in two-alternative forced choices. In Dreher & Tremblay (eds.), Handbook of Reward and Decision Making (pp. 441-458). Academic Press.

Dorris & Glimcher (2004). Activity in posterior parietal cortex is correlated with the subjective desirability of an action. Neuron, 44: 365-378.

Edelman & Keller (1996). Activity of visuomotor burst neurons in the superior colliculus accompanying express saccades. Journal of Neurophysiology, 76: 908-926.

Fox & Poldrack (2009). Prospect theory and the brain. In Glimcher, Camerer, Fehr, & Poldrack (eds.), Neuroeconomics: Decision Making and the Brain (pp. 145-173). Academic Press.

Friedman (1953). Essays in Positive Economics. University of Chicago Press.

Glimcher (2010). Foundations of Neuroeconomic Analysis. Oxford University Press.

Gold & Shadlen (2007). The neural basis of decision making. Annual Review of Neuroscience, 30: 535-574.

Gul & Pesendorfer (2001). Temptation and self-control. Econometrica, 69: 1403-1435.

Gul & Pesendorfer (2006). Random expected utility. Econometrica, 74: 121-146.

Hebb (1949). The organization of behavior. Wiley & Sons.

Heeger (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9: 181-197.

Heeger (1993). Modeling simple-cell direction selectivity with normalized, half-squared linear operators. Journal of Neurophysiology, 70: 1885-1898.

Horowitz & Newsome (2001a). Target selection for saccadic eye movements: direction selective visual responses in the superior colliculus induced by behavioral training. Journal of Neurophysiology, 86: 2527-2542.

Horowitz & Newsome (2001b). Target selection for saccadic eye movements: prelude activity in the superior colliculus during a direction discrimination task. Journal of Neurophysiology, 86: 2543-2558.

Iyengar & Lepper (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79: 995-1006.

Jevons (1871). The Theory of Political Economy. Macmillan and Co.

Kahneman & Tversky (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47: 263-291.

Koszegi & Rabin (2006). A model of reference-dependent preferences. Quarterly Journal of Economics, 121: 1133-1165.

Koszegi & Rabin (2007). Reference-dependent risk attitudes. American Economic Review, 97: 1047-1073.

Koszegi & Rabin (2009). Reference-dependent consumption plans. American Economic Review, 99: 909-936.

Kreps & Porteus (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46: 185-200.

Kreps & Porteus (1979). Dynamic choice theory and dynamic programming. Econometrica, 47: 91-100.

Laury & Holt (2008). Payoff scale effects and risk preference under real and hypothetical conditions. In Plott & Smith (eds.), Handbook of Experimental Economic Results, Vol. 1 (pp. 1047-1053). Elsevier Press.

Liu & Wang (2008). A common cortical circuit mechanism for perceptual categorical discrimination and veridical judgment. PLOS Computational Biology, 4: 1-14.

Loewenstein (1987). Anticipation and the valuation of delayed consumption. Economic Journal, 97: 666-684.

Loomes & Sugden (1982). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal, 92: 805-824.

Louie & Glimcher (2010). Separating value from choice: delay discounting activity in the lateral intraparietal area. Journal of Neuroscience, 30: 5498-5507.

Malenka & Bear (2004). LTP and LTD: an embarrassment of riches. Neuron, 44: 5-21.

Mirenowicz & Schultz (1994). Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology, 72: 1024-1027.

Reynolds & Wickens (2002). Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks, 15: 507-521.

Roitman & Shadlen (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22: 9475-9489.

Scheibehenne, Greifeneder, & Todd (2010). Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research, 37: 409-425.

Schultz, Dayan, & Montague (1997). A neural substrate of prediction and reward. Science, 275: 1593-1599.

Schwartz & Simoncelli (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4: 819-825.

Selten (1975). Reexamination of perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4: 25-55.

Simoncelli & Heeger (1998). A model of neuronal responses in visual area MT. Vision Research, 38: 743-761.

Sutton & Barto (2008). Reinforcement Learning: An Introduction. MIT Press.

Tanji & Evarts (1976). Anticipatory activity of motor cortex neurons in relation to direction of an intended movement. Journal of Neurophysiology, 39: 1062-1068.

Tobler & Kobayashi (2009). Electrophysiological correlates of reward processing in dopamine neurons. In Dreher & Tremblay (eds.), Handbook of Reward and Decision Making (pp. 29-50). Academic Press.

Van Gisbergen, Opstal, & Tax (1987). Collicular ensemble coding of saccades based on vector summation. Neuroscience, 21: 651.

Werner & Mountcastle (1963). The variability of central neural activity in a sensory system, and its implications for central reflection of sensory events. Journal of Neurophysiology, 26: 958-977.

Wickens (1993). A Theory of the Striatum. Pergamon Press.

Zoccolan, Cox, & DiCarlo (2005). Multiple object response normalization in monkey inferotemporal cortex. Journal of Neuroscience, 25: 8150-8164.

Many comments seem to imply and provide evidence for this, but I'm going to state it explicitly so it's easier to comment on:

This way of writing long articles seems superior in many ways and you should probably do mostly this instead of the short single-point posts.

More posts like this, please.

Also, reading this kind of research makes the claim "we'll reverse engineer the brain sometime during this century" seem considerably more plausible.

lukeprog

And this is a presentation of the data without all the impressive-looking equations! :)

In the interest of giving you better positive feedback: awesome article! I had previously suspected that you might not have a good justification for your claims and force your conclusions based on weak data. This suspicion is dead for now. Please continue with your long and in-depth posts.

What I particularly liked is that this article appears much less handwavey than usual. Typically, you demonstrate one particular experiment or line of evidence and then just go, "X and Y did dozens of related studies[way][too][many][references]", but I'm lazy and even though I actually download all these papers, it will probably take me weeks or months before I read them all and until then, your posts seem much weaker than they really are. Seeing multiple different approaches at once is much better, especially of the form "basic model" -> "problems with model" -> "proposed alternative" -> "lots of evidence that fits predictions (especially evidence for actual moving parts)".

Also, I like your use of summaries here (and the repetition). One problem of past posts (e.g. 1 2 3) was that your summaries and the actual points made in the article ...

lukeprog

Thanks for going out of your way to give me such actionable feedback!

Luke, there's a serious and common misconception in your explanation of the independence axiom (serious enough that I don't consider this nitpicking). If you could, please fix it as soon as you can to prevent the spread of this unfortunate misunderstanding. I wrote a post to try and dispel misconceptions such as this one, because utility theory is used in a lot of toy decision theory problems, versions of which might actually be encountered by utility-seeking AIs:

For example, the independence axiom of expected utility theory says that if you prefer one apple to one orange, you must also prefer one apple plus a tiny bit more apple over one orange plus that same tiny bit of apple. If a subject prefers A to B, then the subject can't also prefer B+C to A+C. But Allais (1953) found that subjects do violate this basic assumption under some conditions.

This is not what the independence axiom says. What it says is that, for example, if you prefer an apple over an orange, then you must prefer the gamble [72% chance you get an apple, otherwise you get a cat] over the gamble [72% chance you get an orange, otherwise you get a cat]. The axiom is about mixing probabilistic outcomes, not ...
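The commenter's version of the axiom can be checked with a short calculation. This is a minimal sketch that assumes standard expected utility; the utility numbers (10, 7, 3) are hypothetical, and only the 72% mixing probability comes from the comment. Both gambles share the same 28% "cat" branch, so that branch adds the same amount to each expected utility and cancels out. The apple-over-orange ranking therefore carries over to the mixtures.

```python
from fractions import Fraction

def mix(p, lottery_a, lottery_b):
    """Compound lottery: play lottery_a with probability p, else lottery_b.

    Lotteries are dicts mapping outcome -> probability.
    """
    mixed = {}
    for outcome, q in lottery_a.items():
        mixed[outcome] = mixed.get(outcome, 0) + p * q
    for outcome, q in lottery_b.items():
        mixed[outcome] = mixed.get(outcome, 0) + (1 - p) * q
    return mixed

def expected_utility(lottery, utility):
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())

# Hypothetical utilities: any values with apple > orange give the same ranking.
utility = {"apple": Fraction(10), "orange": Fraction(7), "cat": Fraction(3)}
apple, orange, cat = {"apple": 1}, {"orange": 1}, {"cat": 1}
p = Fraction(72, 100)  # the 72% chance from the comment

eu_apple_gamble = expected_utility(mix(p, apple, cat), utility)
eu_orange_gamble = expected_utility(mix(p, orange, cat), utility)

# The shared cat branch adds (1 - p) * u(cat) to both sides and cancels,
# leaving a difference of p * (u(apple) - u(orange)).
print(eu_apple_gamble > eu_orange_gamble)  # prints: True
```

Exact fractions keep the cancellation visible without floating-point noise.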

lukeprog

Fixed, thanks! I also updated the PDF.

TobyBartels

When I read about the Allais paradox (in lukeprog's post, after he fixed your objection), my first thought was that this violation would occur when the cat was actually something very like an orange, such as a grapefruit. For example, suppose that the cat actually is an orange. So you prefer an apple to an orange, but you prefer an orange to a gamble which is 70% apple and 30% orange. And the neoclassical utility theorist would explain this by saying that you prefer certainty to uncertainty, and so would add a term for certainty to the utility function. And then, if the choice is really between 70% apple and 30% grapefruit versus 70% orange and 30% grapefruit, the latter is still more certain than the former (although not completely certain), so might well be preferred.

This sounds like I'm trying to come up with a way to save utility theory, but actually that's not how it went. My immediate intuitive reaction to reading lukeprog's paraphrase of your example was ‹I'll bet that this happens when the cat is similar to the orange.›, without any conscious reasoning behind it, and it was only after thinking about this hypothesis that I realised that it suggested a way to save utility theory.

So I'm quite curious: Does the Allais paradox appear only when the cat is similar to an orange, or does it also appear when the cat is (as the terms ‘apple’, ‘orange’, and ‘cat’ imply) really quite different?
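For reference, here is the arithmetic behind the textbook version of Allais's choice problem. The dollar amounts and probabilities are the standard illustrative ones, not figures from this thread.

- Gamble 1A pays $1M for sure.
- Gamble 1B pays $1M with probability 0.89, $5M with 0.10, and $0 with 0.01.
- Gamble 2A pays $1M with 0.11 and $0 with 0.89.
- Gamble 2B pays $5M with 0.10 and $0 with 0.90.

The two pairs differ only in a common 0.89 branch (the "cat"). That branch is $1M in the first pair and $0 in the second.

```latex
\begin{aligned}
EU(1A) - EU(1B) &= u(\$1\text{M}) - \left[0.89\,u(\$1\text{M}) + 0.10\,u(\$5\text{M}) + 0.01\,u(\$0)\right] \\
                &= 0.11\,u(\$1\text{M}) - 0.10\,u(\$5\text{M}) - 0.01\,u(\$0) \\
EU(2A) - EU(2B) &= \left[0.11\,u(\$1\text{M}) + 0.89\,u(\$0)\right] - \left[0.10\,u(\$5\text{M}) + 0.90\,u(\$0)\right] \\
                &= 0.11\,u(\$1\text{M}) - 0.10\,u(\$5\text{M}) - 0.01\,u(\$0)
\end{aligned}
```

The two differences are identical for every utility function u. Expected utility theory therefore requires anyone who prefers 1A to 1B to also prefer 2A to 2B. The frequently reported pattern of choosing 1A and 2B violates the independence axiom.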

jsalvatier

Thank you for this. I had seen this several times and didn't understand how they derived this from the independence axiom. I think even Foundations of Neuroecon states the axiom this way.

(Excellent post, Luke. Thank you.)

Others have found Iyengar & Lepper's study hard to replicate:

Benjamin Scheibehenne, a psychologist at the University of Basel, … decided (with Peter Todd and, later, Rainer Greifeneder) to design a range of experiments to figure out when choice demotivates, and when it does not.
But a curious thing happened almost immediately. They began by trying to replicate some classic experiments – such as the jam study, and a similar one with luxury chocolates. They couldn’t find any sign of the “choice is bad” effect. Neither the original Lepper-Iyengar experiments nor the new study appears to be at fault: the results are just different and we don’t know why.
After designing 10 different experiments in which participants were asked to make a choice, and finding very little evidence that variety caused any problems, Scheibehenne and his colleagues tried to assemble all the studies, published and unpublished, of the effect.
The average of all these studies suggests that offering lots of extra choices seems to make no important difference either way. There seem to be circumstances where choice is counterproductive but, despite looking hard for them, we don’t ...

lukeprog

Great, thanks for pointing me to this! The choice mechanism is definitely not well-understood yet. For example, it's not clear how our brains divide up the choices in the choice set. I've edited my post a bit to reflect this meta-analysis.

I enjoyed this primer; and as I value it I was somewhat dismayed to read some of your conclusions in the "what we've learned" section about neural coding:

Utilities (in humans) are real numbers ranging from 0 to 1,000 that take action potentials per second as their natural units. Utilities are encoded cardinally in firing rates relative to neuronal baseline firing rates. (This is opposed to post-Pareto, ordinal notions of utility.)

I'm curious where you got the highly specific "ranging from 0 to 1,000" bit, but moreover it's misleading to mention rate coding as the be-all end all of neural coding, especially as it is largely out of date. Neural coding has been found to be more complex.

Where the brain needs to encode significant quantitative information quickly, it primarily employs population coding in small local neuron clusters.

With population coding, numbers from 0 to N can be stochastically encoded and sent by a micro-column of N neurons in a short minimal time window of 10ms or less. And in theory population coding can even be far better than that when you consider the considerable connectivity weight variation across neurons in the population. (com...
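Here is a toy version of the population scheme described above, as a minimal sketch. It assumes each of N neurons independently spikes (or not) once within a single short window, with probability equal to the encoded value divided by N, and that a downstream reader simply counts spikes. The independence assumption, N = 100, and the encoded value 37 are all hypothetical choices for illustration. Connectivity-weight variation is ignored.

```python
import random

def encode(value, n_neurons, rng):
    """Each of n_neurons spikes (1) or stays silent (0) in one short window,
    independently, with probability value / n_neurons."""
    p = value / n_neurons
    return [1 if rng.random() < p else 0 for _ in range(n_neurons)]

def decode(spikes):
    """Read the value back as the number of neurons that spiked in the window."""
    return sum(spikes)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100                 # hypothetical micro-column size
true_value = 37         # hypothetical value to transmit, between 0 and n

estimates = [decode(encode(true_value, n, rng)) for _ in range(1000)]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # close to 37 on average; single windows vary by several spikes
```

Averaged over many windows the count is unbiased, but any single window is noisy. The binomial standard deviation here is about 4.8 spikes. Note that the readout is still a count of spikes within a time window.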

rocketship

I don't think "rate coding" can be called "out of date" when we don't even have a ballpark estimate of how computation works in the brain. Research hot topics (namely, temporal coding) are not always indicative of progress. Maybe you don't understand what rate coding is. Population codes are still rates. The fundamental debate is over whether a population response r(t) contains all the relevant information needed to understand computations performed, or whether the statistical characteristics of the spike patterns themselves carry "extra" information. Here, r(t) is obtained by spike filtering, which removes the finer inter-spike-interval information. [...] No one in theoretical neuroscience has ever said anything like this.
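To make "r(t) is obtained by spike filtering" concrete, here is a minimal Python sketch. It assumes a causal exponential kernel with a hypothetical 100 ms time constant. Other kernels (Gaussian, boxcar) are equally common choices, and nothing in the comment fixes one. Each spike contributes a bump of unit area, so r(t) comes out in spikes per second. Exact spike times within one kernel width are blurred together, which is the inter-spike-interval information the comment says filtering removes.

```python
import math

def filtered_rate(spike_times, t_grid, tau=0.1):
    """Estimate a firing rate r(t) by convolving a spike train with a causal
    exponential kernel K(s) = exp(-s / tau) / tau for s >= 0.

    spike_times and t_grid are in seconds; the result is in spikes per second.
    """
    rates = []
    for t in t_grid:
        r = 0.0
        for ts in spike_times:
            s = t - ts
            if s >= 0:  # causal: only spikes at or before t contribute
                r += math.exp(-s / tau) / tau
        rates.append(r)
    return rates

# A hypothetical regular 50 Hz spike train lasting 2 seconds (100 spikes).
regular = [i * 0.02 for i in range(100)]
print(filtered_rate(regular, [2.0]))  # roughly 45 spikes/s just after the train
```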

lukeprog

I did not "mention rate coding as the be-all end all of neural coding." There is more to neural coding than rate coding, but rate coding is what predicts choice using our best mathematical models of choice in the final common path of the brain's choice circuits.

Upvoted. This format is much more engaging and readable.

lukeprog

As opposed to... trying to claim anything that might be surprising with only a page or two to back it up with? Is that what you mean? I may just not be as good at that as Eliezer is, though I'm trying to learn.

that an animal would prefer inferior fruit it expected to eat over superior fruit it did not expect to eat, is exactly the kind of irrational behavior that we might hope the pressures of evolution would preclude. What observations tell us, however, is that these behaviors do occur.

So that's why people don't feel happy to learn about super effective counterintuitive ways of doing things! Especially when they require an explicit assumption that previously expected-to-work behaviors won't.

I think I'm going to go try and nurture more of a sense that more is possible. Hopefully that should make me warmer to unexpectedly good ways of doing things.

nazgulnarsil

I really like the phrase "More is Possible."

atucker

I think it came to LW through this.

Complete approval. Thank you.

[We can] begin (dimly) to read our own source code.

I guess you're trying to say it's as though we're trying to read in dim light. But if you mean that we're not very bright, I also agree :)

More!

This is thorough and high-level and fascinating and upvoted and saved.

...though probably devoid of any practical use. I was sorta hoping for sections on how this can make you be stupid and how to make that not happen.

Thanks!

There are many lessons to be culled from this material but the article is too long already. :)

Lessons will come in future posts.

Iabalka

Great post. I really liked how you managed, even in an intentionally heavy and technical post, to mention hot sex. Please keep it up in future posts.

I really enjoyed this article. I took a few sittings to read it, but I liked the continuous format.

Let me just make a general comment on the tone here:

But really, shouldn't it have been obvious all along that humans are irrational? Perhaps it is, to everyone but neoclassical economists and Aristotelians. (Okay, enough teasing...)

Teasing per se is fine, but this happens to reinforce a popular sentiment which I find misleading. Everyone likes to point out the differences between standard economic assumptions and actual human behavior. Pitted against ot...

As of August 2011, if you've read this then you probably know more about how human values actually work than almost every professional metaethicist on Earth. The general lesson here is that you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it.

Do philosophers actually talk about human motivation in the descriptive behavioral summary sense? (Or are you talking about experimental philosophers? But you do mention metaethicists...) In parti...

lukeprog

I guess if you count experimental philosophers like Joshua Greene & Fiery Cushman as metaethicists, then there are a few metaethicists who know some of this stuff. But I was thinking of more traditional armchair metaethicists. Metaethicists do a wide variety of work. A few have written things worth reading, like Richard Joyce and Stephen Finlay. Metaethicists discuss a broad range of issues, many of them invoking descriptive claims about human motivation. For example, many of them argue (largely via introspection) that motivational internalism is true in humans. But the recent psychology/neuroscience of human motivation makes this position unlikely to be true, as I'll argue in a future post.

torekp

This comment to Jack's recent post discusses motivational internalism and some of that neuroscience.

Thanks Luke. Please keep this up!

Related, on the cutting edge: a study in which subjective likability scores for music did not correlate with sales over the next three years, but activity in the ventral striatum and nucleus accumbens (while listening to the music) did.

For more such goodies, see the abstracts of papers to be presented this month at the second annual interdisciplinary symposium on decision neuroscience.

One other interesting abstract is from an upcoming paper by Joe Kable, 'Heterogeneity in the Neural Substrates of Time Perception and Time Discounting':

Theoretical and empi...

gwern

That's actually really useful for me. I've long struggled to come up with, for my esthetics essay, an explanation for why people overwhelmingly choose supposedly inferior cultural goods, as the unpopularity of superior goods suggests that my central thesis is false. Hyperbolic discounting is obvious, of course, but this sort of thing really helps.

I wonder if there is a bit of self-selection going on in the comments here. Considering LW norms, I expect very few people, if any, to leave comments without reading the entire article. Thus, naturally, the commenters will like or at least tolerate the new style.

I loved the content and indeed I think it's more readable than the shorter articles, but I actually had to eat a snack before mustering the willpower to read this.

Wilka

That seems likely to me. I enjoyed this post a lot, and I've shared it with several other people who I think would also like it (and spend the time to read it). But it did take me a while to get through; I made coffee at least twice while reading it. I think it was almost 2 hours from opening the article to getting to the end. Not all of that time was spent reading - as well as getting coffee, I paused several times to digest what I'd read so far. However, it was still a lot longer than I'd normally spend on a single post.

Apologies for the pedantry that follows.

Today, we know how Hebb's mechanism works at the molecular level.

This quote gives the impression that there is a unitary learning mechanism at work in the brain called "Hebbian learning," and that how it works is well understood. It is my understanding that this is not accurate.

For example, spike-timing-dependent plasticity is a Hebbian learning rule which has been postulated to underlie at least some forms of long-term potentiation and long-term depression. However, there is ongoing debate as to how ac...

lukeprog

Agreed. Fixed. Thanks.

TobyBartels

(But this fix introduced a grammatical error; change ‘works’ to ‘work’.)

Awesome post!

One interesting difference between this and the early sequence posts is how much more descriptive this is than say, Mysterious Answers to Mysterious Questions.

Where Eliezer does a great job crystallizing ideas that you feel like you kind of should have known, or rigorously applying one insight from a particular field more generally, you do a lot to explain the background information and lay a foundation for a later understanding.

It took me two readings of MAMQ in order to kind of tease out a bit of why the background is specifically Bayesian, ...

I find this very useful because I had heard of this material before, but reading some of it in close proximity like this gives me an opportunity to synthesize new ideas I didn't get from reading it separately.

I'm definitely going to upvote and save this for rereading and reference later as well.

Utilities (in humans) are real numbers ranging from 0 to 1,000 that take action potentials per second as their natural units.

I downvoted the post based off this quote. To the extent that utilities are referring to the kind of thing that we describe with that term on Less Wrong, utilities are not anywhere near as simple as that. In fact, things that we ascribe negative utility to can be significantly positive on that kind of encoding. Never mind that the science is out of date.

lukeprog

Goodness, no, I don't mean utility here in the usual LW sense. I've added a new clarifying sentence to item #1 here. As for your last sentence, see my reply to jacob_cannell.

wedrifid

Thank you, I'm far more comfortable now!

I like this format. The most fascinating part to me, and I guess I was somewhat aware of this before, is the concept that you can use dopamine based reinforcement to train yourself to enjoy certain behaviors by linking them to behaviors you already find rewarding.

Pfft

I think the inverse is even more striking: the Sinclair Method for treating alcoholism consists of the alcoholic just continuing to drink as before, but taking an opioid antagonist one hour before drinking. Before reading this article that just seemed like black magic to me, but it makes perfect sense once you know that neurons will become less connected if they fire together in the absence of dopamine.
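The rule the commenter is appealing to is often written as a three-factor (dopamine-gated) Hebbian update. The sketch below assumes one simple textbook form, dw = lr * pre * post * (D - D0), where D is the dopamine level and D0 is a tonic baseline. The learning rate, baseline, and dopamine values are hypothetical. It illustrates the rule only; it is not a model of how an opioid antagonist acts.

```python
def update_weight(w, pre, post, dopamine, baseline=1.0, lr=0.1):
    """One step of a dopamine-gated (three-factor) Hebbian rule.

    pre, post: 1 if the pre-/postsynaptic neuron fired on this trial, else 0.
    dopamine:  dopamine level on this trial; baseline: assumed tonic level.
    Co-activity strengthens the synapse when dopamine is above baseline and
    weakens it when dopamine is below baseline. Weight bounds are omitted.
    """
    return w + lr * pre * post * (dopamine - baseline)

w_rewarded = w_blocked = 0.5  # hypothetical starting weight
for _ in range(10):
    # Both neurons fire together on every trial; only the dopamine level differs.
    w_rewarded = update_weight(w_rewarded, 1, 1, dopamine=1.5)
    w_blocked = update_weight(w_blocked, 1, 1, dopamine=0.5)
print(w_rewarded, w_blocked)  # rewarded weight grows, blocked weight shrinks
```

Without co-activity (pre or post equal to 0), the weight does not change. A fuller model would also keep the weight within bounds.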

gwern

There are even cooler ways. I like the idea of using nicotine.

Wow, Luke, this was simply amazing. Thank you very much! I actually have to say I enjoyed this longer post a lot more than the previous shorter ones you made. This post painted a more thorough and complete picture. It presented multiple details and showed how they connected, as opposed to showing one detail and saying its connection will be obvious in the future.

lukeprog

Yes, I can understand the trouble with shorter posts when I'm trying to make detailed arguments. I try to give a little bit of the convergent evidence and then upload PDFs of the rest of the evidence and point to them, but of course almost nobody is going to read those PDFs. Taking the space to explain a greater proportion of convergent evidence behind my positions is of course more persuasive and helpful to people, but it takes more time and hasn't been the usual format of Less Wrong, which has traditionally heavily favored short articles with few or no citations of evidence. Since every single comment in response to this article so far has been positive, which I don't think has ever happened before with this many comments, I update in favor of doing this kind of thing again.

Alexei

Oh, yeah, I can see how this would put a lot more burden on you, rather than us. Sorry and thanks for being awesome!

Reading this in 2015 now, have we made any significant advances in the list of greatest open questions? Are there any research papers someone could point me to?

Like everybody else, I think that this post is awesome.

One thing that bugs me. You write:

dopamine neuron firing rates were a compressive function of reward magnitude

But the link there is to (the English Wikipedia's discussion of) compression functions in cryptography, which seem irrelevant. In particular, both dopamine rate and reward magnitude are ordered, and it seems that the function from the latter to the former should respect this order to be of any use; while cryptographic compression functions should thoroughly entangle the order (to the exte...

lukeprog

The footnote in that sentence leads you to this short paper linked from my bibliography, which reads: [...] I'm sorry I don't have more time to go into this, hopefully the paper link helps.

Thanks, I don't know why I didn't follow the footnote.

But if I had, I would have added to my comment that the cited paper confirms my expectations. The function that they describe (as in your quoted paragraph) does preserve ordering, and seems to have nothing to do with the compressive functions described at Wikipedia. (The paper also doesn't use that term; the case-independent string ‘compr’ doesn't appear in it at all.)

But actually, the point of the article seems to be that the function from reward magnitude to dopamine rate varies with time, being renormalised from time to time to be most sensitive (literally, having highest derivative) at the most likely inputs, which I did not get from your post at all. But if I were an editor, wanting your post to best reflect the article without getting any longer, I'd suggest just changing ‘compressive function’ to ‘variable function’ and removing the irrelevant link.
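TobyBartels's reading can be illustrated with a minimal sketch of an adaptive, order-preserving response curve. The logistic shape, centering it on the mean expected reward with a width set by the reward spread, and all the numbers are assumptions for illustration. This is not the functional form reported in the cited paper. The curve is monotonic, so a larger reward never gives a smaller response. Its slope is greatest at the expected reward, so re-centering it on a new context changes which rewards are discriminated best.

```python
import math

def response(reward, context_mean, context_sd, r_max=1.0):
    """Hypothetical adaptive response curve: a logistic function centered on the
    expected reward and stretched by the spread of rewards in the context."""
    z = (reward - context_mean) / context_sd
    return r_max / (1.0 + math.exp(-z))

def slope(reward, context_mean, context_sd, h=1e-6):
    """Central-difference estimate of d(response)/d(reward)."""
    return (response(reward + h, context_mean, context_sd)
            - response(reward - h, context_mean, context_sd)) / (2 * h)

# Two hypothetical contexts: small rewards around 1, large rewards around 10.
small = {"context_mean": 1.0, "context_sd": 0.5}
large = {"context_mean": 10.0, "context_sd": 5.0}

# The same reward of 2 sits high on the small-context curve and low on the
# large-context curve, while each curve is steepest at its own expected reward.
print(round(response(2.0, **small), 2), round(response(2.0, **large), 2))  # 0.88 0.17
```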

Not that any of this should detract from your otherwise excellent and everywhere interesting post!

Gack! I'm just completely wrong about this one. Thanks for reading so closely and correcting my mistake!

TobyBartels

You're welcome then!

lessdazed

Upvoted.

lessdazed

Upvoted for reading closely and then reading a paper in the footnotes.

Friggin amazing! More like this! This comes at an extremely opportune time for me, as Andrew McKnight (a fellow Seattle LWer) and I are doing a cognitive science study group and were just starting to read Foundations of Neuroeconomic Analysis. This will provide a valuable guide to our study. We will try to keep track of our study to advise others. I hope that one day this kind of material is provided in a standard course that people can take.

Comments:

The neuron firing bar graph thingies are a bit tricky to interpret at first. Perhaps you want to use mean firing ...

lukeprog

Oops, it's fairer to say the LIP is simply part of the final common path for the generation of voluntary eye movements. I'll just say that. Thanks for catching this.

Jonathan_Graehl

I dislike having things that are already completely finished meted out to me one per day. If I like it, maybe I want to read it all now. I'm usually disappointed with at least one of the 7 parts of someone's one-per day post - Hey! You didn't get to that new stuff I wanted! I know, I have problems. On the other hand, if you (the hypothetical tease) expect to revise the later material based on comments from previous days, it may be worth annoying me to do so.

The presumed domain of FP used to be much larger than it is now. In primitive cultures, the behavior of most of the elements of nature were understood in intentional terms. The wind could know anger, the moon jealousy, the river generosity… These were not metaphors… the animistic approach to nature has dominated our history, and it is only in the last two or three thousand years that we have restricted FP’s literal application to the domain of the higher animals.
[Even still,] the FP of the Greeks is essentially the FP we use today… This is a very long pe...

Two 'books' published since I wrote this article:

Sharot & Dolan, Neuroscience of Preference and Choice
Hytonen, Context Effects in Valuation, Judgment, and Choice: A Neuroscientific Approach

FYI, Bug report: The push-pull experiment is illustrated by a diagram of the future discounting experiment.

EDIT: It is fixed now.

This was a bit too long for me, although including that table of contents helped a lot and I wish more long posts did that. (It's not quite a high level deductive argument, where each section is arguing or explaining a particular axiom or deductive step, but it's helpful anyway.)

Thanks for this Luke. Personally I like the shorter single-point posts in general, but this level of depth/scope is refreshing (and awesome).

Louie & Glimcher (2010)

A link to the paper: https://www.jneurosci.org/content/30/16/5498.short
They find that the temporal discount factor d is hyperbolic, i.e., of the form d = 1/(1 + kT), where k is some constant and T is the time to reward.
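The practical signature of a hyperbolic factor d = 1/(1 + kT), as opposed to an exponential one, is preference reversal. The sketch below rests on several hypothetical values, none of which come from Louie & Glimcher (2010):

- the rewards: 50 units soon versus 100 units 10 days later;
- the hyperbolic constant: k = 0.5 per day;
- the exponential comparison: a rate of 0.1 per day.

```python
import math

def hyperbolic(T, k=0.5):
    """Hyperbolic discount factor d = 1 / (1 + k*T)."""
    return 1.0 / (1.0 + k * T)

def exponential(T, rate=0.1):
    """Exponential discount factor d = exp(-rate*T), for comparison."""
    return math.exp(-rate * T)

def prefers_later(discount, wait, small=50.0, large=100.0, gap=10.0):
    """True if `large` at delay wait+gap beats `small` at delay `wait`."""
    return large * discount(wait + gap) > small * discount(wait)

for wait in (0, 30):
    print(wait, prefers_later(hyperbolic, wait), prefers_later(exponential, wait))
# prints:
# 0 False False
# 30 True False
```

With the exponential factor, the ratio of the two discounted values (2e^-1) does not depend on the wait, so no reversal is possible. With the hyperbolic factor, the later reward wins once both rewards are far enough away.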

A sequence of Tic-Tac-Toe moves might look like this:¹³

Should there be an image here?

Luke, the link to Churchland (1981) is outdated. It says "object not found." Let me know if you need a PDF copy, either for personal use or to upload to CSA.

gwern

We should all be familiar with the Internet Archive here: http://web.archive.org/web/20050403025924/http://philosophy.wisc.edu/Shapiro/Phil554/PAPERS/Churchland.pdf

[anonymous]

I wasn't aware that service existed. Bookmarked and upvoted. Thank you kindly for the helpful information.

I am interested in understanding this material more in depth and using this post as a guide. Would your advice be to basically read this post but follow all the footnotes/links and read those sources as well? Is there anything else you would suggest?

lukeprog

Well, you're currently reading through Foundations, right? The most interesting parts of the above are explained better (in much more detail) in Foundations than I had the space to do here. So if you actually read Foundations - not a trivial investment! - you'll understand all the most important things I'm hinting toward in this post, and in much more detail.

lessdazed

I think this generally applies, though it might apply less or more in this case dependent on the material in question, which I have not studied. Other minds are a sanity check. Consider writing for others after or during your learning.

jsalvatier

Yes, that's a good point. I have such a person already who is fortunately a psychology grad student.

Lukeprog quoting Tobler:

...consider a subject trained to choose between objects A and B, where A is $1,000,000 worth of goods and B is $100,000 worth of goods... A system that represented only the relative expected subjective value of A and B would represent SV(A) > SV(B). Next, consider training the same subject to choose between C and D, where C is $1,000 worth of goods and D is $100 worth of goods. Such a system would represent SV(C) > SV(D). What happens when we ask a chooser to select between B and C? For a chooser who represents only relative

...

lukeprog (2 points, 15y)

Refuting a strawman? Whom is this passage (by Glimcher, not Tobler) supposed to be refuting? I'm not sure anyone would disagree with this. 'Representing only relative subjective value' means that the system would only encode subjective value (utility), not objective value. Representation in this case would occur via neuron firing rates.

Jonathan_Graehl (1 point, 15y)

Strawman position. They made up the position then refuted it. The possibility being argued against is, as I understand it: We remember the value of everything we've decided between as follows: every time we compare it in a choice against alternatives, we remember how strongly it beat the alternatives. And that's all we remember. When we next evaluate that thing, if we have a memory of it, that's all we use - that coded value of how much we (dis)preferred it against some alternatives (which we've forgotten). This is very bad. It's not even complete. Where does the first preference come from? Why even discuss such a thing? To fix the above, we can say that the preference is a weighted combination of choice-amnesiac-desirability (as if we have no memories of ever having (dis)preferred the thing to other things in the past), and all past choices pro/con that thing (without reference to the competitors' values). This is now well defined, and perhaps worth ruling out by experiment.
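A minimal Python sketch of the "relative-only" reading described above, assuming each option's stored value is nothing but its net tally of past pairwise wins (+1) and losses (-1), with the competitors forgotten. The dollar amounts come from the quoted passage; the +1/-1 bookkeeping and the function names are hypothetical, and the absolute chooser uses dollar value as a stand-in for subjective value.

```python
# Dollar amounts from the quoted passage, used here as a stand-in for subjective value.
OBJECTIVE = {"A": 1_000_000, "B": 100_000, "C": 1_000, "D": 100}


def relative_memory(training_pairs):
    """Hypothetical relative-only memory: each option keeps only a net
    win/loss tally from past pairwise choices; competitors are forgotten."""
    memory = {}
    for x, y in training_pairs:
        winner, loser = (x, y) if OBJECTIVE[x] > OBJECTIVE[y] else (y, x)
        memory[winner] = memory.get(winner, 0) + 1
        memory[loser] = memory.get(loser, 0) - 1
    return memory


def choose_absolute(a, b):
    """Pick whichever option has the larger stored absolute value."""
    return a if OBJECTIVE[a] > OBJECTIVE[b] else b


def choose_relative(memory, a, b):
    """Pick whichever option has the better win/loss tally."""
    return a if memory[a] > memory[b] else b


memory = relative_memory([("A", "B"), ("C", "D")])
print(memory)                             # {'A': 1, 'B': -1, 'C': 1, 'D': -1}
print(choose_absolute("B", "C"))          # B
print(choose_relative(memory, "B", "C"))  # C
```

Trained only on A vs. B and C vs. D, the tally-only chooser ranks C above B on the novel pair, while the absolute chooser picks B.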

torekp (0 points, 15y)

I too am having trouble understanding what the "relative subjective value" hypothesis is supposed to be.

I enjoyed this, thank you. (It sat in a tab for a day waiting for me to get around to reading it, however.)

I don't quite follow how "or that an animal would prefer inferior fruit it expected to eat over superior fruit it did not expect to eat" follows from the reference dependence discussion, however. Can you elaborate on that?

Some of the issues and empirical results are painlessly expressed in this 11-minute video from RSA Animate: Dan Pink on Drive.

pjeby (4 points, 15y)

Interesting. The studies described in the first 5 minutes look like an empirical validation of my long-professed notion that "what pushes you forward, holds you back". I'm rather curious, because it's implied these effects occur even when the thing pushing people forward is something quite positive, as opposed to a fear of something negative (e.g., losing the possible reward). But I don't imagine that the studies included any sort of control for how participants represented the situation to themselves from a prospect-theory point of view.

The video is only the tip of the iceberg of the science in lukeprog's post, though; yesterday I wrote a 3,400-word missive for the Mindhackers' Guild exploring a narrow subset of the practical applications of just a couple of bits of the neural math described here. It very nicely ties together a whole bunch of stuff that was basically personal and/or Guild folklore, and offers us some new research directions for applied motivation.

For example: prospect theory appears to imply that if you want to reinforce a new positive behavior, you need to not expect yourself to perform the behavior. Otherwise, performing the behavior will not result in positive reinforcement (after all, it's just what you expected!), and failing to perform the behavior will be punishing, due to the perception of loss. So you end up literally training yourself (in a behaviorist sort of way) to extinction, just by the way you frame your thinking about your goal.

(I've already been teaching a similar idea for years, but I missed the connection to reinforcement and the application to encouraging new behaviors. So I'm testing that application on myself right now.)

[anonymous] (0 points, 15y)

How would this work, assuming I don't have a hidden benefactor to surprise me? Do you just pick a simple first step and tell yourself "aw, I'm never going to succeed, but I'm gonna try anyway" until it sticks? I don't see how I can simultaneously want to build a habit of doing X, not expect to do X, but still actually do X, and not just once, but regularly. Isn't this explicit doublethink? I mean, either I believe my realistic estimate (but then I expect it) or I screw up my ability to model my own behavior (e.g. by having bad calibration or introspection). Also, wouldn't being overly deterministic about an unwanted habit help extinguish it? If I predict I will be wasting time on reddit in 10 minutes, 20 minutes, ... and so on, all day, then no matter what happens, I win. That doesn't seem right.

pjeby (4 points, 15y)

This is a language problem: I'm using "expect" in the prospect-theory sense here, not the probabilistic one. It's about emotional investment in the outcome, not anticipating probability of occurrence. You could say that it's a distinction between "should" expectations and "is" expectations. Prospect theory (or at the very least this application of it) is about "should" expectations, as it's the basis for establishing a decision frame, which includes a notion of investment/cost as well as expected utility.

The hack I'm experimenting with is setting a perceptual frame where not doing the desired action is perceived as zero loss, and doing the action is perceived as a cheap gain. (In contrast to having an expectation that the default should be that I do the action, in which case doing it is perceived as zero gain, and failing to do it is a loss.)

I don't have any long-term experience with the reinforcement aspect yet, but my early results (1 behavior, 3 instances in two days) are that the framing is fun. It feels like "You mean I get points just for doing that little thing? Cool!" (The trickiest part was that I had to first mind-hack away the mental blocks that made it seem low-status to me to think this way.)
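A minimal sketch of the two frames, assuming the reinforcing signal is the prospect-theory value of the outcome measured against the reference point (the "should" expectation). The curvature (0.88) and loss-aversion (2.25) parameters are the median estimates from Tversky and Kahneman's 1992 paper; scoring "did the behavior" as 1 and "skipped it" as 0 is a hypothetical simplification, and treating this value as the reinforcement signal is an assumption, not an established model.

```python
ALPHA = 0.88   # diminishing sensitivity (Tversky & Kahneman 1992 median estimate)
LAMBDA = 2.25  # loss aversion: losses weigh about twice as much as equal gains


def prospect_value(outcome, reference):
    """Value of an outcome relative to a reference point (the "should" expectation)."""
    x = outcome - reference
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** ALPHA


DID, SKIPPED = 1, 0  # hypothetical scoring of the behavior

# Frame 1: "I should do this" (reference point = DID)
print(prospect_value(DID, DID), prospect_value(SKIPPED, DID))          # 0.0 -2.25
# Frame 2: "anything I do is a bonus" (reference point = SKIPPED)
print(prospect_value(DID, SKIPPED), prospect_value(SKIPPED, SKIPPED))  # 1.0 0.0
```

Under the first frame, doing the behavior yields nothing and skipping it registers as a loss; under the second, doing it is a gain and skipping it costs nothing, which is the asymmetry the hack is aiming for.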

The wrong picture appears after

The third cue was the ‘go’ signal: if the monkey made the previously indicated movement, it was rewarded.

This is what they saw:

The same picture appears again in its correct place, later.

lukeprog (0 points, 15y)

Darn it, copy/paste! Okay, it's fixed now, thanks!

If this were multiple posts, I could upvote it more than once. ;)

Manfred (0 points, 15y)

The problem of splitting posts correctly might make an interesting post...

Confusing dash-spacing:

in nondurable consumption-where there is no object with which the person can be endowed-a status-quo-based theory cannot capture the role of reference dependence at all

fixed:

in nondurable consumption -- where there is no object with which the person can be endowed -- a status-quo-based theory cannot capture the role of reference dependence at all

lukeprog (1 point, 15y)

Fixed, thanks!

Excellent article! I really enjoy this style of writing, where the information is laid out in a story-like format.

Excellent article! I really like this style.

I have one question about a passage earlier on, which points out a flaw in FP:

Finally, consider the problem of habit. I sit at my computer and want to type my name, 'Luke.' However, I have just used a special program to switch the function of the keys labeled L and P so that they will input the other character instead (so that I can play a prank on my friend, who will be using my computer shortly). I believe that typing the key labeled L will input P instead, but nevertheless when I type my name my fingers fall

...

Just a little error I saw in the Neoclassical Economics section:

If you are risk averse you might choose the blue box because it has higher expected subjective value even though it has higher expected objective value.

Should be "even though it has lower expected objective value." Also, I've really enjoyed the post so far.
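To see how the corrected sentence can hold, here is a sketch with made-up boxes (the amounts are hypothetical, not the ones from the article) and square-root utility as an assumed stand-in for a risk-averse utility function:

```python
import math

# Each box is a lottery: a list of (probability, dollars) pairs. Amounts are hypothetical.
BLUE = [(1.0, 400)]            # a sure $400
RED = [(0.5, 1000), (0.5, 0)]  # a coin flip between $1,000 and nothing


def expected(lottery, utility=lambda dollars: dollars):
    """Expected value of a lottery under the given utility (identity = objective value)."""
    return sum(p * utility(x) for p, x in lottery)


print(expected(BLUE), expected(RED))                                  # 400.0 500.0
print(expected(BLUE, math.sqrt), round(expected(RED, math.sqrt), 2))  # 20.0 15.81
```

The blue box has the lower expected objective value but, under the concave utility, the higher expected subjective value, which is the risk-averse choice the corrected sentence describes.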

lukeprog (1 point, 15y)

Fixed, thanks!

So, your earlier posts were basically lemmas? Certainly I appreciate all the reference notes.

I love these kinds of articles!

But I really can't wait until you post about how we can use these insights to actually change our minds and behavior!

Many comments seem to imply and provide evidence for this, but I'm going to state it explicitly so it's easier to comment on:

This way of writing long articles seems superior in many ways and you should probably do mostly this instead of the short single-point posts.
