Many comments seem to imply and provide evidence for this, but I'm going to state it explicitly so it's easier to comment on:
This way of writing long articles seems superior in many ways and you should probably do mostly this instead of the short single-point posts.
More posts like this, please.
Also, reading this kind of research makes the claim "we'll reverse engineer the brain sometime during this century" seem considerably more plausible.
In the interest of giving you better positive feedback: awesome article! I had previously suspected that you might not have a good justification for your claims and force your conclusions based on weak data. This suspicion is dead for now. Please continue with your long and in-depth posts.
What I particularly liked is that this article appears much less handwavey than usual. Typically, you demonstrate one particular experiment or line of evidence and then just go, "X and Y did dozens of related studies[way][too][many][references]", but I'm lazy and even though I actually download all these papers, it will probably take me weeks or months before I read them all and until then, your posts seem much weaker than they really are. Seeing multiple different approaches at once is much better, especially of the form "basic model" -> "problems with model" -> "proposed alternative" -> "lots of evidence that fits predictions (especially evidence for actual moving parts)".
Also, I like your use of summaries here (and the repetition). One problem of past posts (e.g. 1 2 3) was that your summaries and the actual points made in the article ...
Luke, there's a serious and common misconception in your explanation of the independence axiom (serious enough that I don't consider this nitpicking). If you could, please fix it as soon as you can to prevent the spread of this unfortunate misunderstanding. I wrote a post to try and dispel misconceptions such as this one, because utility theory is used in a lot of toy decision theory problems, versions of which might actually be encountered by utility-seeking AIs:
For example, the independence axiom of expected utility theory says that if you prefer one apple to one orange, you must also prefer one apple plus a tiny bit more apple over one orange plus that same tiny bit of apple. If a subject prefers A to B, then the subject can't also prefer B+C to A+C. But Allais (1953) found that subjects do violate this basic assumption under some conditions.
This is not what the independence axiom says. What it says is that, for example, if you prefer an apple over an orange, then you must prefer the gamble [72% chance you get an apple, otherwise you get a cat] over the gamble [72% chance you get an orange, otherwise you get a cat]. The axiom is about mixing probabilistic outcomes, not ...
(Excellent post, Luke. Thank you.)
Others have found Iyengar & Lepper's study hard to replicate:
...Benjamin Scheibehenne, a psychologist at the University of Basel, … decided (with Peter Todd and, later, Rainer Greifeneder) to design a range of experiments to figure out when choice demotivates, and when it does not.
But a curious thing happened almost immediately. They began by trying to replicate some classic experiments – such as the jam study, and a similar one with luxury chocolates. They couldn’t find any sign of the “choice is bad” effect. Neither the original Lepper-Iyengar experiments nor the new study appears to be at fault: the results are just different and we don’t know why.
After designing 10 different experiments in which participants were asked to make a choice, and finding very little evidence that variety caused any problems, Scheibehenne and his colleagues tried to assemble all the studies, published and unpublished, of the effect.
The average of all these studies suggests that offering lots of extra choices seems to make no important difference either way. There seem to be circumstances where choice is counterproductive but, despite looking hard for them, we don’t
I enjoyed this primer; and as I value it I was somewhat dismayed to read some of your conclusions in the "what we've learned" section about neural coding:
Utilities (in humans) are real numbers ranging from 0 to 1,000 that take action potentials per second as their natural units. Utilities are encoded cardinally in firing rates relative to neuronal baseline firing rates. (This is opposed to post-Pareto, ordinal notions of utility.)
I'm curious where you got the highly specific "ranging from 0 to 1,000" bit, but moreover it's misleading to mention rate coding as the be-all and end-all of neural coding, especially as it is largely out of date. Neural coding has been found to be more complex.
Where the brain needs to encode significant quantitative information quickly, it primarily employs population coding in small local neuron clusters.
With population coding, numbers from 0 to N can be stochastically encoded and sent by a micro-column of N neurons in a short minimal time window of 10ms or less. And in theory population coding can even be far better than that when you consider the considerable connectivity weight variation across neurons in the population. (com...
that an animal would prefer inferior fruit it expected to eat over superior fruit it did not expect to eat, is exactly the kind of irrational behavior that we might hope the pressures of evolution would preclude. What observations tell us, however, is that these behaviors do occur.
So that's why people don't feel happy to learn about super effective counterintuitive ways of doing things! Especially when they require an explicit assumption that previously expected-to-work behaviors won't.
I think I'm going to go try and nurture more of a sense that more is possible. Hopefully that should make me warmer to unexpectedly good ways of doing things.
Complete approval. Thank you.
[We can] begin (dimly) to read our own source code.
I guess you're trying to say it's as though we're trying to read in dim light. But if you mean that we're not very bright, I also agree :)
More!
This is thorough and high-level and fascinating and upvoted and saved.
...though probably devoid of any practical use. I was sorta hoping for sections on how this can make you be stupid and how to make that not happen.
Thanks!
There are many lessons to be culled from this material but the article is too long already. :)
Lessons will come in future posts.
I really enjoyed this article. I took a few sittings to read it, but I liked the continuous format.
Let me just make a general comment on the tone here:
But really, shouldn't it have been obvious all along that humans are irrational? Perhaps it is, to everyone but neoclassical economists and Aristoteleans. (Okay, enough teasing...)
Teasing per se is fine, but this happens to reinforce a popular sentiment which I find misleading. Everyone likes to point out the differences between standard economic assumptions and actual human behavior. Pitted against ot...
As of August 2011, if you've read this then you probably know more about how human values actually work than almost every professional metaethicist on Earth. The general lesson here is that you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it.
Do philosophers actually talk about human motivation in the descriptive behavioral summary sense? (Or are you talking about experimental philosophers? But you do mention metaethicists...) In parti...
Related, on the cutting edge: a study in which subjective likability scores for music did not correlate with sales over the next three years, but activity in the ventral striatum and nucleus accumbens (while listening to the music) did.
For more such goodies, see the abstracts of papers to be presented this month at the second annual interdisciplinary symposium on decision neuroscience.
One other interesting abstract is from an upcoming paper by Joe Kable, 'Heterogeneity in the Neural Substrates of Time Perception and Time Discounting':
...Theoretical and empi
I wonder if there is a bit of self-selection going on in the comments here. Considering LW norms, I expect very few people, if any, to leave comments without reading the entire article. Thus naturally the commenters will like or at least tolerate the new style.
I loved the content and indeed I think it's more readable than the shorter articles, but I actually had to eat a snack before mustering the willpower to read this.
Apologies for the pedantry that follows.
Today, we know how Hebb's mechanism works at the molecular level.
This quote gives the impression that there is a unitary learning mechanism at work in the brain called "Hebbian learning," and that how it works is well understood. It is my understanding that this is not accurate.
For example, spike-timing-dependent plasticity is a Hebbian learning rule which has been postulated to underlie at least some forms of long-term potentiation and long-term depression. However, there is ongoing debate as to how ac...
Awesome post!
One interesting difference between this and the early sequence posts is how much more descriptive this is than say, Mysterious Answers to Mysterious Questions.
Where Eliezer does a great job crystallizing ideas that you feel like you kind of should have known, or rigorously applying one insight from a particular field more generally, you do a lot to explain the background information and lay a foundation for a later understanding.
It took me two readings of MAMQ in order to kind of tease out a bit of why the background is specifically Bayesian, ...
I find this very useful because I had heard of this material before, but reading some of it in close proximity like this gives me an opportunity to synthesize new ideas I didn't get from reading it separately.
I'm definitely going to upvote and save this for rereading and reference later as well.
Utilities (in humans) are real numbers ranging from 0 to 1,000 that take action potentials per second as their natural units.
I downvoted the post based off this quote. To the extent that utilities are referring to the kind of thing that we describe with that term on lesswrong, utilities are not anywhere near as simple as that. In fact, things that we ascribe negative utility to can be significantly positive on that kind of encoding. Never mind that the science is out of date.
I like this format. The most fascinating part to me, and I guess I was somewhat aware of this before, is the concept that you can use dopamine based reinforcement to train yourself to enjoy certain behaviors by linking them to behaviors you already find rewarding.
Wow, Luke, this was simply amazing. Thank you very much! I actually have to say I enjoyed this longer post a lot more than the previous shorter ones you made. This post painted a more thorough and complete picture. It presented multiple details and showed how they connected, as opposed to showing one detail and saying its connection will be obvious in the future.
Reading this in 2015 now, have we made any significant advances in the list of greatest open questions? Are there any research papers someone could point me to?
Like everybody else, I think that this post is awesome.
One thing that bugs me. You write:
dopamine neuron firing rates were a compressive function of reward magnitude
But the link there is to (the English Wikipedia's discussion of) compression functions in cryptography, which seem irrelevant. In particular, both dopamine rate and reward magnitude are ordered, and it seems that the function from the latter to the former should respect this order to be of any use; while cryptographic compression functions should thoroughly entangle the order (to the exte...
Thanks, I don't know why I didn't follow the footnote.
But if I had, I would have added to my comment that the cited paper confirms my expectations. The function that they describe (as in your quoted paragraph) does preserve ordering, and seems to have nothing to do with the compressive functions described at Wikipedia. (The paper also doesn't use that term; the case-independent string ‘compr’ doesn't appear in it at all.)
But actually, the point of the article seems to be that the function from reward magnitude to dopamine rate varies with time, being renormalised from time to time to be most sensitive (literally, having highest derivative) at the most likely inputs, which I did not get from your post at all. But if I were an editor, wanting your post to best reflect the article without getting any longer, I'd suggest just changing ‘compressive function’ to ‘variable function’ and removing the irrelevant link.
Not that any of this should detract from your otherwise excellent and everywhere interesting post!
Gack! I'm just completely wrong about this one. Thanks for reading so closely and correcting my mistake!
Friggin amazing! More like this! This comes at an extremely opportune time for me, as Andrew McKnight (a fellow Seattle LWer) and I are doing a cognitive science study group and were just starting to read Foundations of Neuroeconomics. This will provide a valuable guide to our study. We will try to keep track of our study to advise others. I hope that one day this kind of material is provided in a standard course that people can take.
Comments:
...The presumed domain of FP used to be much larger than it is now. In primitive cultures, the behavior of most of the elements of nature were understood in intentional terms. The wind could know anger, the moon jealousy, the river generosity… These were not metaphors… the animistic approach to nature has dominated our history, and it is only in the last two or three thousand years that we have restricted FP’s literal application to the domain of the higher animals.
[Even still,] the FP of the Greeks is essentially the FP we use today… This is a very long pe
Two 'books' published since I wrote this article:
FYI, Bug report: The push-pull experiment is illustrated by a diagram of the future discounting experiment.
EDIT: It is fixed now.
This was a bit too long for me, although including that table of contents helped a lot and I wish more long posts did that. (It's not quite a high level deductive argument, where each section is arguing or explaining a particular axiom or deductive step, but it's helpful anyway.)
Thanks for this Luke. Personally I like the shorter single-point posts in general, but this level of depth/scope is refreshing (and awesome).
Louie & Glimcher (2010)
A link to the paper: https://www.jneurosci.org/content/30/16/5498.short
The temporal discount factor, d, which they find is hyperbolic, i.e., of the form d = 1/(1 + k T), where k is some constant and T is the time to reward.
Luke, the link to Churchland (1981) is outdated. It says "object not found." Let me know if you need a PDF copy, either for personal use or to upload to CSA.
I am interested in understanding this material more in depth and using this post as a guide. Would your advice be to basically read this post but follow all the footnotes/links and read those sources as well? Is there anything else you would suggest?
Lukeprog quoting Tobler:
......consider a subject trained to choose between objects A and B, where A is $1,000,000 worth of goods and B is $100,000 worth of goods... A system that represented only the relative expected subjective value of A and B would represent SV(A) > SV(B). Next, consider training the same subject to choose between C and D, where C is $1,000 worth of goods and D is $100 worth of goods. Such a system would represent SV(C) > SV(D). What happens when we ask a chooser to select between B and C? For a chooser who represents only relative
I enjoyed this, thank you. (It sat in a tab for a day waiting for me to get around to reading it, however.)
I don't quite follow where "or that an animal would prefer inferior fruit it expected to eat over superior fruit it did not expect to eat" follows from the reference dependence discussion, however - can you elaborate on that?
Some of the issues and empirical results are painlessly expressed in this 11 minute video from RSA animate: Dan Pink on drive.
The wrong picture appears after
The third cue was the ‘go’ signal: if the monkey made the previously indicated movement, it was rewarded.
This is what they saw:
The same picture appears again in its correct place, later.
Confusing dash-spacing:
in nondurable consumption-where there is no object with which the person can be endowed-a status-quo-based theory cannot capture the role of reference dependence at all
fixed:
in nondurable consumption -- where there is no object with which the person can be endowed -- a status-quo-based theory cannot capture the role of reference dependence at all
Excellent article! I really enjoy this style of writing, where the information is laid out in a story-like format.
Excellent article! I really like this style.
I have one question about a passage earlier on, which points out a flaw in FP:
...Finally, consider the problem of habit. I sit at my computer and want to type my name, 'Luke.' However, I have just used a special program to switch the function of the keys labeled L and P so that they will input the other character instead (so that I can play a prank on my friend, who will be using my computer shortly). I believe that typing the key labeled L will input P instead, but nevertheless when I type my name my fingers fall
Just a little error I saw in the Neoclassical Economics section:
If you are risk averse you might choose the blue box because it has higher expected subjective value even though it has higher expected objective value.
Should be "even though it has lower expected objective value." Also, I've really enjoyed the post so far.
I love these kinds of articles!
But I really cannot wait till you post about how we can use these insights to actually change our minds and behavior!
[PDF of this article updated Aug. 23, 2011]
Whenever I write a new article for Less Wrong, I'm pulled in two opposite directions.
One force pulls me toward writing short, exciting posts with lots of brain candy and just one main point. Eliezer has done that kind of thing very well many times: see Making Beliefs Pay Rent, Hindsight Devalues Science, Probability is in the Mind, Taboo Your Words, Mind Projection Fallacy, Guessing the Teacher's Password, Hold Off on Proposing Solutions, Applause Lights, Dissolving the Question, and many more.
Another force pulls me toward writing long, factually dense posts that fill in as many of the pieces of a particular argument in one fell swoop as possible. This is largely because I want to write about the cutting edge of human knowledge but I keep realizing that the inferential gap is larger than I had anticipated, and I want to fill in that inferential gap quickly so I can get to the cutting edge.
For example, I had to draw on dozens of Eliezer's posts just to say I was heading toward my metaethics sequence. I've also published 21 new posts (many of them quite long and heavily researched) written specifically because I need to refer to them in my metaethics sequence.1 I tried to make these posts interesting and useful on their own, but my primary motivation for writing them was that I need them for my metaethics sequence.
And now I've written only four posts2 in my metaethics sequence and already the inferential gap to my next post in that sequence is huge again. :(
So I'd like to try an experiment. I won't do it often, but I want to try it at least once. Instead of writing 20 more short posts between now and the next post in my metaethics sequence, I'll attempt to fill in a big chunk of the inferential gap to my next metaethics post in one fell swoop by writing a long tutorial post (a la Eliezer's tutorials on Bayes' Theorem and technical explanation).3
So if you're not up for a 20-page tutorial on human motivation, this post isn't for you, but I hope you're glad I bothered to write it for the sake of others. If you are in the mood for a 20-page tutorial on human motivation, please proceed.
- Don DeLillo, White Noise
Preface
How do we value things, and choose between options? Philosophers, economists, and psychologists have long tried to answer these questions. But human behavior continues to defy our most subtle models of it, and the algorithms producing our behavior have remained hidden in a black box.
But now, neuroscientists are directly measuring the neurons whose firing rates encode value and produce our choices. We know a lot more about the neuroscience of human motivation than you might think. Now we can peer directly into the black box of human motivation, and begin (dimly) to read our own source code.
The neuroscience of human motivation has implications for philosophy of mind and action, for scientific self-help, and for metaethics and Friendly AI. (We don't really know what we want, and looking directly at the algorithms that produce human wanting might help in solving this mystery.)
So, I wrote a crash course in the neuroscience of human motivation.
The purpose of this document is not to argue for any of the conclusions presented within it. That would require not a long blog post but a couple of 500-page books, such as Foundations of Neuroeconomic Analysis and Handbook of Reward and Decision Making (my two main sources for this post).4
Instead, I merely want to summarize the current mainstream scientific picture on the neuroscience of human motivation, explain some of the concepts it uses, and tell a few stories about how our current picture of human motivation developed.
As you read this, I hope that many questions and objections will come to mind, because it's not the full story. That's why I went to the trouble of linking to PDFs of almost all my sources (see References): so you can check the full data and the full arguments yourself if you like.
This document is long. You may prefer to read it in sections.
Contents:
Folk Psychology
There are these things called 'humans' on planet Earth. They undergo metabolism and cell growth. They produce waste. They maintain homeostasis. They reproduce. They move. They communicate. Sometimes they have pillow fights.
Some of these human processes are 'automatic', like cell growth and breathing. Other processes are 'intentional' or 'willed', like moving and communicating and having pillow fights. We call these latter processes intentional actions, or simply actions. Sometimes we're not sure where to draw the line between automatic processes and actions, but this should become clearer as we learn more. In the meantime, we ask...
How can we explain human actions?
One popular explanation is 'folk psychology.' Folk psychology posits that we humans have beliefs and desires, and that we are motivated to do what we believe will fulfill our desires.
I desire to eat a cookie. I believe I can fulfill that desire if I walk to the kitchen and put one of the cookies there into my mouth. So I am motivated to walk to the kitchen and put a cookie in my mouth.
Of course there are complications. For example, I have multiple desires. Suppose I desire to eat a cookie and believe there are cookies in the kitchen. But I also desire to remain sitting comfortably in the living room. Can I satisfy both desires? I also believe that if I nicely ask my friend in the kitchen to bring me a cookie, she will. So I ask her to bring me a cookie, she does, and I begin to eat it without having to leave the comfy living room sofa. We still explain my behavior with constructs like 'beliefs' and 'desires', but we consider more than one of each to do so.
Most of us use folk psychology every day to successfully predict human behavior. I believe that my friend desires to do nice things for me on occasion if they're not too much trouble, and I believe that my friend, once I tell her I want a cookie, will believe she can be nice to me without much trouble if she brings me a cookie from the kitchen. So, I predict that my friend will bring me a cookie when I ask her. So I ask her, and behold! My prediction was correct. I am happily eating a cookie on the sofa.
But folk psychology (FP) faces some problems.5 Consider its context in history:
Consider also its prospects for inter-theoretic reduction:
Finally, consider the problem of habit. I sit at my computer and want to type my name, 'Luke.' However, I have just used a special program to switch the function of the keys labeled L and P so that they will input the other character instead (so that I can play a prank on my friend, who will be using my computer shortly). I believe that typing the key labeled L will input P instead, but nevertheless when I type my name my fingers fall into their familiar habit and I end up typing my name as 'Puke.' My act of typing was intentional, and yet I didn't do what I believed would fulfill my desire to type my name.
Folk psychology has both successes and failures in explaining human action. Hopefully we can do better.
Neoclassical Economics
Folk psychology was updated and quantified by neoclassical economics. To summarize:
Let's review this notion of maximizing expected utility. Suppose I can choose one of two boxes sitting before me, red and blue. There is a 10% chance the red box contains a million dollars, and a 90% chance it contains nothing. As for the blue box, I am certain it contains $10,000. The 'expected value' of choosing the red box is (0.1 × $1,000,000) + (0.9 × $0), which is equal to $100,000. The expected value of choosing the blue box is (1 × $10,000), or $10,000. An agent that chose whatever had the highest expected value would choose the red box, which has 10 times the expected value of the blue box ($100,000 vs. $10,000).
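The arithmetic above fits in a few lines of Python. In this sketch a box is represented as a list of (probability, dollars) pairs, a representation chosen only for illustration, and expected value is the probability-weighted sum of the payoffs.

```python
def expected_value(lottery):
    """Probability-weighted sum of payoffs for a list of (probability, dollars) pairs."""
    total_prob = sum(p for p, _ in lottery)
    assert abs(total_prob - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * dollars for p, dollars in lottery)

red_box = [(0.1, 1_000_000), (0.9, 0)]   # 10% chance of a million, else nothing
blue_box = [(1.0, 10_000)]               # a sure $10,000

print(expected_value(red_box))   # 100000.0
print(expected_value(blue_box))  # 10000.0
```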
But humans don't value things only according to their dollar value. A million dollars might have 10 times the objective value of $100,000, but it might have less than 10 times the subjective value of $100,000 because after $100,000 you only care a little how much more wealthy you are.
Or, you might be risk averse. You might prefer a sure thing to something that is uncertain. So a 10% chance of a million dollars might be worth less — in subjective value — than a 100% chance of $10,000. If you are risk averse you might choose the blue box because it has higher expected subjective value even though it has lower expected objective value.
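Inside expected utility theory, both stories above (diminishing value of extra money, and preferring a sure thing) are usually modeled the same way: with a concave utility function, so that each additional dollar adds less utility than the one before. The sketch below assumes a hypothetical utility function u(x) = x^0.4, picked only for illustration; with linear utility the red box wins, but with this concave utility the blue box has the higher expected utility.

```python
def expected_utility(lottery, utility):
    """Probability-weighted sum of utilities for (probability, dollars) pairs."""
    return sum(p * utility(dollars) for p, dollars in lottery)

def linear_utility(dollars):
    # Utility equals dollar value: this agent just maximizes expected value.
    return float(dollars)

def concave_utility(dollars, exponent=0.4):
    # Hypothetical concave utility: each extra dollar is worth less than the last.
    return dollars ** exponent

red_box = [(0.1, 1_000_000), (0.9, 0)]
blue_box = [(1.0, 10_000)]

for name, u in [("linear", linear_utility), ("concave", concave_utility)]:
    eu_red = expected_utility(red_box, u)
    eu_blue = expected_utility(blue_box, u)
    choice = "red" if eu_red > eu_blue else "blue"
    print(f"{name}: red={eu_red:.2f}, blue={eu_blue:.2f} -> choose {choice}")
# linear: red=100000.00, blue=10000.00 -> choose red
# concave: red=25.12, blue=39.81 -> choose blue
```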
We call objective value simply 'value'. We call subjective value 'utility.'
Neoclassical economics quantifies folk psychology by measuring the strength of belief with probability and the strength of desire with utility. It then says that humans act so as to maximize expected utility, a measure that combines the utility of a particular thing with your subjective probability of getting it.7
This neoclassical model of human behavior has faced many challenges, and is regularly revised in the face of new evidence.8 For example, Loewenstein (1987) found that if students were asked to place a value on the opportunity to kiss a celebrity of their choice 1-5 days in the future, they placed the highest value on a kiss in 3 days. This didn't fit any existing neoclassical models of utility, but it was later explained when Caplin & Leahy (2001) incorporated "anticipatory feeling" into the neoclassical model: the students got some utility from anticipating the kiss with the celebrity (while also, as usual, discounting the utility of a reward the further away it was in the future), and this is why they didn't want the kiss right away.
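To see how anticipation can make a delayed reward worth more than an immediate one, here is a hypothetical toy model in Python. It is not Caplin & Leahy's actual formulation. The assumption is that each day of waiting brings some anticipatory pleasure, a fraction `savoring` of the kiss's value as seen from that day, and that everything is discounted exponentially by `discount` per day. Both parameter values are made up and chosen so the peak lands at 3 days; the point is the shape (value rises, then falls), not the numbers.

```python
def value_of_delayed_kiss(delay_days, kiss_utility=1.0, discount=0.8, savoring=0.75):
    """Present value of a kiss `delay_days` away, including anticipatory feeling.

    Hypothetical toy model: on each waiting day s, the person enjoys
    savoring * kiss_utility * discount**(delay_days - s), and that pleasure is
    discounted back to today by discount**s. So every waiting day adds
    savoring * kiss_utility * discount**delay_days.
    """
    discounted_kiss = kiss_utility * discount ** delay_days
    anticipation = savoring * delay_days * discounted_kiss
    return discounted_kiss + anticipation

values = {d: value_of_delayed_kiss(d) for d in range(6)}  # 0 = right away
print(max(values, key=values.get))                         # 3

# Without anticipatory feeling, pure discounting makes "right away" best.
no_savoring = {d: value_of_delayed_kiss(d, savoring=0.0) for d in range(6)}
print(max(no_savoring, key=no_savoring.get))               # 0
```

Setting `savoring` to zero recovers ordinary exponential discounting, which is why the second line favors the immediate kiss.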
Keep in mind that economists don't argue that we actually compute the expected utility of each option before us and then choose the best one, but that we always act "as if" we were doing that.9
But sometimes we don't even act "as if" we are obeying the axioms of neoclassical economics. For example, the independence axiom of expected utility theory says that if you prefer an apple over an orange, then you must prefer Gamble A (72% chance you get an apple, otherwise you get a cat) over Gamble B (72% chance you get an orange, otherwise you get a cat). But Allais (1953) found that subjects do violate this basic assumption under some conditions.
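Why can't an expected utility maximizer violate independence? The shared "cat" branch contributes the same amount to both gambles, so EU(A) − EU(B) = 0.72 × (u(apple) − u(orange)), which always has the same sign as the apple-vs-orange preference. The Python sketch below checks this numerically. The random utilities are hypothetical stand-ins for "any" expected utility maximizer; the helper names are illustrative.

```python
import random

def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)

def mix(p, first, second):
    """Gamble: `first` with probability p, otherwise `second`."""
    return [(p, first), (1 - p, second)]

def independence_holds(p=0.72, trials=1000, seed=42):
    """Check that randomly drawn expected utility maximizers never violate independence."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = {outcome: rng.uniform(-10, 10) for outcome in ("apple", "orange", "cat")}
        prefers_apple = u["apple"] > u["orange"]
        gamble_a = mix(p, "apple", "cat")
        gamble_b = mix(p, "orange", "cat")
        prefers_gamble_a = expected_utility(gamble_a, u) > expected_utility(gamble_b, u)
        if prefers_apple != prefers_gamble_a:
            return False
    return True

print(independence_holds())  # True
```

Any choice pattern that violates independence, like the one Allais observed, therefore cannot come from maximizing expected utility under any utility function.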
Such violations of the basic axioms of neoclassical economics led to the development of behavioral economics and theories like Kahneman and Tversky's (1979) prospect theory,10 which transcends some assumptions of the neoclassical model. But these new theories don't fit the data perfectly, either.11
The models of human motivation we've surveyed so far are conceptually related to decision theory (beliefs and desires, or probabilities and utilities), so I'll call them 'decision-theoretic models' of human motivation. We'll discuss decision-theoretic models again when we finally get to the topic of neuroscience, but for now I want to discuss a different approach to motivation.
Behaviorism and Reinforcement Learning
While neoclassical economists formulated expected utility theory, behaviorist psychologists developed a different set of explanations for human action. Though behaviorists were wrong when they said that science can't talk about mental activity or mental states, you can charitably think of behaviorists as playing a game of Rationalist's Taboo with constructs of folk psychology like "want" or "fear" in order to get at phenomena more appropriate for quantification in technical explanation. Also, the behaviorist approach led to 'reinforcement learning', an important concept in the neuroscience of human motivation.
Before I explain reinforcement learning, let's recall operant conditioning:
Behaviorism died in the wake of cognitive psychology, but its approach to motivation turned out to be very useful in the field of artificial intelligence, where it is called reinforcement learning:
In addition to the agent and its environment, there are four major components of a reinforcement learning system:
Want an example? Here is how a reinforcement learning agent would learn to play Tic-Tac-Toe:
A sequence of Tic-Tac-Toe moves might look like this:13
Solid lines are the moves our reinforcement learning agent made, and dotted lines are moves it considered but did not make. The second move was an exploratory move: it was taken even though another sibling move, that leading to e*, was ranked higher.
While playing, the agent changes the values assigned to the states it finds itself in. To improve its estimates of the probability of winning from various states, it 'backs up' the value of the state after each greedy move to the state before the move (as suggested by the arrows). This means that the value of the earlier state is adjusted to be closer to the value of the later state.
And that's how a simple version of temporal difference (TD) reinforcement learning works.
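The backup step can be sketched in Python with the usual form of this update, V(s) ← V(s) + α[V(s′) − V(s)], where α is a step-size parameter. The three-state "game" (s0 → s1 → win), the 0.5 starting guess for non-terminal states, and α = 0.5 are illustrative assumptions, and the sketch omits exploratory moves (which, per the text, trigger no backup).

```python
def td_backup(values, state, next_state, step_size=0.5):
    """Move the value of `state` a fraction `step_size` toward the value of `next_state`."""
    values[state] += step_size * (values[next_state] - values[state])

# Hypothetical game: s0 -> s1 -> "win". Values estimate the probability of winning,
# so a won position is worth 1 and the other states start at a 0.5 guess.
values = {"s0": 0.5, "s1": 0.5, "win": 1.0}

for episode in range(50):
    # Both moves are greedy here, so each one backs up the value of the state it reached.
    td_backup(values, "s0", "s1")
    td_backup(values, "s1", "win")

print(round(values["s0"], 3), round(values["s1"], 3))  # 1.0 1.0
```

After the first episode only s1 has moved (to 0.75); the value of winning then propagates back to s0 over later episodes, which is the "backing up" the arrows describe.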
Reinforcement Learning and Decision Theory
You may have noticed a key advantage of reinforcement learning: an agent using it can be 'dumber' than a decision-theoretic agent. It can just start with guesses ("What the hell; let's try 50%!") for the value of various states, and then it learns their true values by running through many, many trials.
But what if you don't have many trials to run through, and you need to make an important decision right now?
Then you have to be smart. You need to have a good model of the world and use decision theory to choose the action with the highest expected utility.
This is what rationality (being good at building correct models of the world) is especially good for:
Reinforcement learning can be a good strategy if you have time to learn from many trials. If you've only got one shot at a problem, you'd better build up a really accurate model of the world first and then try to maximize expected utility.
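To make the contrast concrete, here is a hypothetical Python sketch of two agents facing the same choice. The model-free agent starts from the 50% guess mentioned earlier and averages sampled outcomes; the model-based agent is simply handed accurate odds, standing in for "a really accurate model of the world." The action names, odds, and trial counts are made up for illustration.

```python
import random

def estimate_by_trials(win_probability, n_trials, initial_guess=0.5, seed=0):
    """Model-free: learn an action's value only from sampled outcomes.

    The initial guess counts as one pseudo-observation, so after n trials the
    estimate is the average of the guess and the n outcomes (1 = win, 0 = loss).
    """
    rng = random.Random(seed)
    estimate = initial_guess
    for n in range(1, n_trials + 1):
        outcome = 1.0 if rng.random() < win_probability else 0.0
        estimate += (outcome - estimate) / (n + 1)
    return estimate

def choose(estimates):
    """Pick the action with the highest estimated value (ties go to the first listed)."""
    return max(estimates, key=estimates.get)

# Hypothetical true odds of winning after each action.
true_odds = {"left": 0.8, "right": 0.6}

# Before any trials, the model-free agent has only its guesses and cannot tell the actions apart.
guesses = {a: estimate_by_trials(p, 0) for a, p in true_odds.items()}

# After many trials, its estimates approach the true odds.
learned = {a: estimate_by_trials(p, 10_000, seed=i)
           for i, (a, p) in enumerate(true_odds.items())}

print(guesses)            # {'left': 0.5, 'right': 0.5}
print(choose(learned))    # left
print(choose(true_odds))  # left (the model-based agent gets there with zero trials)
```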
Now, back to our story.
It turns out that reinforcement learning seems to underlie many of our mental processes. (More on this later.)
The lesson Yvain drew from this discovery was:
But things are a bit more complicated than that, as we'll now see.
The Turn to the Brain
William Jevons (1871)
It turns out that Jevons was wrong. Modern neuroscience allows us to peer into the black box of the human value system and measure directly "the feelings of the human heart."14
We'll begin with the experiments of Wolfram Schultz. Schultz recorded the activity of single dopamine neurons in monkeys who sat in front of a water spout. At irregular intervals, a speaker played a tone and a drop of water fell from the spout.15 The monkeys' dopamine neurons normally fired at their baseline rate, but responded with a burst of activity when water was delivered. Over time, though, the neurons responded less and less to the water and more and more to the tone.
But if Schultz delivered water without first giving the tone, then the dopamine neurons responded with a burst of activity again. And if he played the tone and didn't provide water, the neurons reduced their firing rates below the baseline. The neurons weren't responding to the water itself but to a difference between expected reward and actual reward — a reward prediction error (RPE).
Two other researchers, Read Montague and Peter Dayan, noticed that these patterns of neuronal activity were exactly predicted by TD reinforcement learning theory from computer science.16 In particular, the RPE observed in neurons appeared to play the same role in monkey learning as the difference between value estimates at two different times did in TD reinforcement learning theory.
Since then, researchers have done many more single-neuron recording studies to test particular versions of TD reinforcement learning, revising the theory so that it predicts more and more behavior and has also predicted novel experimental discoveries.
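The following toy simulation shows how a TD learner reproduces the three patterns Schultz recorded: a burst at unexpected water, a burst that migrates to the tone, and a dip below baseline when expected water is omitted. It assumes a simple "tapped delay line" (one value estimate per time step after the tone), no discounting, and a learning rate of 0.2; these are our illustrative choices, not the parameters of the models in the cited papers. Positive numbers stand for bursts above baseline, negative numbers for dips below it.

```python
def run_trial(V, reward, alpha=0.2, learn=True):
    """One tone -> water trial, with one value estimate per time step after tone onset.

    V[i] is the predicted future reward i steps after the tone (no discounting).
    Returns the reward prediction errors (RPEs): one at tone onset, then one per step.
    """
    n = len(V)
    # The tone comes at an unpredictable time, so the moment before it predicts nothing;
    # the RPE at tone onset is simply the jump from 0 to V[0].
    rpes = [V[0] - 0.0]
    for i in range(n):
        r = reward if i == n - 1 else 0.0        # water arrives on the last step
        v_next = V[i + 1] if i + 1 < n else 0.0  # the trial ends after the water
        delta = r + v_next - V[i]                # TD error, i.e. the RPE
        if learn:
            V[i] += alpha * delta
        rpes.append(delta)
    return rpes


V = [0.0, 0.0, 0.0]                  # three time steps between tone and water
first = run_trial(V, reward=1.0)     # naive monkey: burst only when the water arrives
for _ in range(200):
    trained = run_trial(V, reward=1.0)
omitted = run_trial(V, reward=0.0, learn=False)  # tone, but no water

print([round(x, 2) for x in first])    # [0.0, 0.0, 0.0, 1.0]
print([round(x, 2) for x in trained])  # [1.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in omitted])  # [1.0, 0.0, 0.0, -1.0]
```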
Caplin & Dean17 provided another way to test the hypothesis that dopamine neurons encoded RPE in a TD-class model. They showed that all existing RPE-models could be reduced to three axiomatic statements. If a system violated one of these axioms, it could not be an RPE system. Later, Caplin et al. (2010) tested the axioms on actual brain activity to see if they held up. They did. This is another reason why so many scientists working in this field believe the current 'dopamine hypothesis' — that dopamine neurons encode RPE in a TD-class reinforcement learning system in the brain.
TD-class reinforcement learning works in computers by updating numbers that represent the values of states. How does reinforcement learning work when using nerve cells?
Hebbian Learning
By Hebbian learning, of course. "Cells that fire together, wire together."
Imagine a neural pathway (in one of Pavlov's dogs) that connects the neural circuits that sense the ringing of a bell to the neural circuits for salivation. This is a weak connection at first, which is why the bell doesn't initially elicit salivation.
Also imagine a third neuron that connects the salivation circuit to a circuit that detects food. This is a strong connection, and that's why food does elicit salivation right away:18
Donald Hebb proposed:
In short, whenever two connected cells are active at the same time, the synapses connecting them are strengthened.
Consider Pavlov's experiment. At first, the Bell cell will fire whenever bells ring, but probably not when the salivation cells happen to be active. So, the connection between the Bell cell and the Salivation cell remains weak. But then, Pavlov intervenes and causes the Bell cell and the Salivation cell to fire at the same time by ringing the bell and presenting food at the same time (the Food detector cell already has a strong connection to the Salivation cell). Whenever the Bell cell and the Salivation cell happen to fire at the same time, the synapse between them is strengthened. Once the connection is strong enough, the Bell cell can cause the Salivation cell to fire on its own, just like the Food detector cell can.
It was a fine theory, but the mechanism wasn't observed until Bliss & Lomo (1973) found it at work in the rabbit hippocampus. Today, we know how some forms of Hebb's mechanism work at the molecular level.20
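Hebb's rule is easy to state in code. This sketch assumes binary activity (1 = firing, 0 = silent), a learning rate `eta`, and a hypothetical `THRESHOLD` weight at which the Bell cell alone can drive the Salivation cell; none of these numbers come from real synapses.

```python
def hebbian_update(w, pre, post, eta=0.25):
    """Hebb's rule: change the weight in proportion to joint pre- and postsynaptic activity."""
    return w + eta * pre * post


THRESHOLD = 1.0  # hypothetical weight at which the Bell cell alone drives salivation

w = 0.0  # the Bell -> Salivation synapse starts weak
# Before conditioning: the bell rings (pre = 1) while the Salivation cell is silent (post = 0).
for _ in range(10):
    w = hebbian_update(w, pre=1, post=0)
print(w)  # 0.0: no co-activity, so no strengthening

# Pavlov pairs bell with food: the Food detector drives Salivation while the bell rings.
pairings = 0
while w < THRESHOLD:
    w = hebbian_update(w, pre=1, post=1)
    pairings += 1
print(pairings)  # 4
```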
Later, Wickens (1993) proposed a similar mechanism called the three-factor rule, according to which some synapses are strengthened whenever presynaptic and postsynaptic activity occurred in the presence of dopamine. These same synapses might be weakened when activity occurred in the absence of dopamine. Later studies confirmed this hypothesis.21
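A three-factor rule adds dopamine as a third term. In this sketch we assume dopamine is measured relative to a baseline level, so that co-activity in the presence of dopamine strengthens the synapse and co-activity in its absence weakens it; the `baseline` and `eta` values are made up for illustration.

```python
def three_factor_update(w, pre, post, dopamine, baseline=0.5, eta=0.5):
    """Sketch of a dopamine-gated ('three-factor') learning rule.

    The synapse changes only when pre- and postsynaptic cells are co-active;
    the sign of the change depends on whether dopamine is above or below an
    assumed baseline level.
    """
    return w + eta * pre * post * (dopamine - baseline)


w = 1.0
print(three_factor_update(w, pre=1, post=1, dopamine=1.0))  # 1.25: strengthened
print(three_factor_update(w, pre=1, post=1, dopamine=0.0))  # 0.75: weakened
print(three_factor_update(w, pre=1, post=0, dopamine=1.0))  # 1.0: no co-activity, no change
```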
Suppose a monkey receives an unexpected reward and encodes a large positive RPE. Glimcher explains:
We will return to the dopamine system later, but for now let us back up and pursue the neoclassical economic path into the brain.
Expected Utility in Neurons
Ever since Friedman (1953), economists have insisted that humans only behave as if they are utility maximizers, not that they actually compute expected utility and try to maximize it.
It was a surprise, then, when neuroscientists stumbled upon the neurons that were encoding expected utility in their firing rates.
Tanji & Evarts (1976) did their experiments with rhesus monkeys because monkeys are among our closest relatives besides the apes, and this kind of work is usually forbidden on apes for ethical reasons (it requires implanting a recording electrode in the brain).
The monkeys were trained to know that a colored light on the screen meant they would soon be offered a reward (a drop of water) either for pushing or for pulling, but not for both. This was the ‘ready’ cue. A second later, the researchers gave a ‘direction’ cue that told the monkeys which action (push or pull) was going to be rewarded. The third cue was the ‘go’ signal: if the monkey made the previously indicated movement, it was rewarded.
This is what they saw:
At the ‘ready’ cue, the neurons associated with a pushing motion became weakly active (but fired above the baseline rate), and so did the neurons associated with a pulling motion. When the ‘direction’ cue was given, the neurons associated with the to-be-rewarded motion doubled their firing rate, and the neurons associated with the opposite motion fell back to the baseline rate. Then at the ‘go’ cue, the firing rate of the neurons associated with the to-be-rewarded movement rose rapidly again, up past the threshold required to produce movement, and the movement was produced shortly thereafter.
One tempting explanation of the data is that after the ‘ready’ cue, the monkey’s brain 'decides' there’s a 50% chance that pulling will get the reward, and a 50% chance that pushing will get the reward. That’s why we see the neuron firing rates associated with those two actions each jump to slightly less than 50% of the movement threshold when the ‘ready’ cue is given. But then, when the ‘direction’ cue is given, those expectations shift to 100%/0% or 0%/100%, depending on which action is about to be rewarded according to the ‘direction’ cue. That’s why activity in the circuit associated with the to-be-rewarded action doubles and the other one drops to baseline. And then the ‘go’ cue is delivered and firing rates blast past the movement threshold, and movement is produced.
Let's jump ahead to Basso & Wurtz (1997), who did a similar experiment except that they used voluntary eye movements (called ‘saccades’) instead of voluntary arm movements. And this time, they presented each monkey with one, two, four, or eight possible targets, instead of just two targets (push and pull) like Tanji & Evarts did.
What they found was that as more potential targets were presented, the magnitude of the preparatory activity associated with each target systematically decreased. And again, once the ‘direction’ and ‘go’ cues were presented, the activity associated with those other potential targets dropped rapidly and activity burst rapidly in neurons associated with the to-be-rewarded movement. It was as though the monkeys’ brains were distributing their probability mass evenly across the potentially rewarded actions, and then once they knew which action should in fact be rewarded, they moved all their probability mass to that action and performed the action and got the reward.
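A toy version of the "probability mass" reading of these data, assuming (as the interpretation above does) that each target's preparatory activity is proportional to the probability that it will be rewarded, with that probability spread evenly across N targets. The threshold of 100 is in arbitrary, hypothetical units, and the sketch ignores baseline firing.

```python
def preparatory_activity(n_targets, threshold=100.0):
    """Activity for each of n equally likely targets, as a fraction of the movement threshold."""
    reward_probability = 1.0 / n_targets
    return reward_probability * threshold


for n in (1, 2, 4, 8):
    print(n, preparatory_activity(n))  # prints 1 100.0, 2 50.0, 4 25.0, 8 12.5
```

Doubling the number of targets halves each target's share, which matches the systematic decrease Basso & Wurtz reported, at least qualitatively.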
Real-Time Expected Utility Updates
Other researchers showed monkeys a black screen with flickering white dots on it. In the baseline (0% coherence) condition, the computer moved each dot in a random direction in each frame of the video. The independent variable was a measure called 'coherence.' In a 100% leftward coherence condition, all dots moved to the left. In a 60% rightward condition, 60% of the dots moved rightward while the rest moved randomly. And so on.
In a typical experiment, the researchers would identify a neuron in a monkey's brain that increased its firing rate in response to rightward coherence of the dots, and decreased its firing rate in response to leftward coherence of the dots. Then they would present the monkey with a sequence (in random order) of every possible leftward and rightward coherence condition.
A leftward coherence (of any magnitude) meant the monkey would be rewarded for leftward eye movement, and a rightward coherence meant the monkey would be rewarded for rightward eye movement. But, the monkey had to wait two seconds before being rewarded.
In this experiment, the probabilities always started at 50% but then updated continuously. A 100% rightward coherence condition allowed the monkey to very quickly know which voluntary eye movement would be rewarded, but in a 5% rightward coherence condition the expected utility of the rightward target grew more slowly.
The results? The greater the coherence of rightward motion of the dots, the faster the neurons associated with rightward eye movement increased their firing rate. (A higher coherence meant the monkey was able to update its probabilities more quickly.)
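A minimal sketch of that result, assuming the rightward-preferring neuron's activity can be modeled as a running sum of noisy momentary evidence whose mean is proportional to coherence. This is a simplified accumulator, loosely in the spirit of the models reviewed by Gold & Shadlen (2007); the gain `k`, the noise level, and the number of steps are hypothetical.

```python
import random


def accumulate(coherence, steps=200, k=0.05, noise=1.0, rng=None):
    """Toy rightward-preferring neuron: each time step adds noisy evidence whose mean
    is proportional to the signed coherence (positive = rightward, in percent).
    Returns the running total after each step."""
    if rng is None:
        rng = random.Random(0)
    total, path = 0.0, []
    for _ in range(steps):
        total += k * coherence + rng.gauss(0.0, noise)
        path.append(total)
    return path


def mean_final_level(coherence, trials=500, seed=1):
    """Average accumulated evidence at the end of the viewing period."""
    rng = random.Random(seed)
    return sum(accumulate(coherence, rng=rng)[-1] for _ in range(trials)) / trials


for c in (0, 5, 25, 100):
    print(c, round(mean_final_level(c), 1))
```

Higher coherence gives a steeper average climb, which is the pattern described above; leftward (negative) coherence would make the same neuron's total fall instead.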
Argmax and Reservation Price
Many studies show that the brain controls movement by way of a 'winner take all' mechanism that is isomorphic to the argmax operation from economics.23 That is, there are many possibilities competing for your final choice, but just before your choice the single strongest signal remains after all the others are inhibited.
This choice mechanism was investigated in more detail by Michael Shadlen and others.24 Shadlen gave monkeys the same eye movement task as above, except that the monkeys could make their choice at any time instead of waiting for two seconds. He found that:
The threshold level acts as a kind of criterion of choice. Once the criterion is met, action is taken. Or in economic terms, the monkeys seemed to set a reservation price on making certain movements.25
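The sketch below combines an argmax with a fixed threshold: each option gets a noisy accumulator, and the first leader to reach the threshold (our stand-in for the reservation price) is chosen. This "race" model is our simplification, not Shadlen's exact model; the drift rates, threshold, and noise level are hypothetical.

```python
import random


def race_to_threshold(drifts, threshold=30.0, noise=1.0, max_steps=10_000, seed=0):
    """Toy 'winner take all' choice with a reservation level.

    Each option's accumulator drifts upward at its own rate plus Gaussian noise.
    At every step we take the argmax; once the leader reaches `threshold`,
    that option is chosen. Returns (chosen option index, steps taken)."""
    rng = random.Random(seed)
    levels = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        for i, drift in enumerate(drifts):
            levels[i] += drift + rng.gauss(0.0, noise)
        leader = max(range(len(levels)), key=levels.__getitem__)  # argmax
        if levels[leader] >= threshold:
            return leader, step
    return None, max_steps  # no option reached the threshold


def summarize(drifts, runs=200):
    """Accuracy (option 0 is the better one) and mean reaction time over many runs."""
    results = [race_to_threshold(drifts, seed=s) for s in range(runs)]
    accuracy = sum(choice == 0 for choice, _ in results) / runs
    mean_rt = sum(steps for _, steps in results) / runs
    return accuracy, mean_rt


print(summarize([1.0, 0.0]))  # strong evidence: accurate and fast
print(summarize([0.2, 0.0]))  # weak evidence: slower and more error-prone
```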
Random Utility
When deciding between goods of different expected utilities, humans exhibit a stochastic transfer function:
Our behavior has an element of randomness in it. Daniel McFadden won a Nobel Prize in economics for capturing such behavior using a random utility model.27 He did so by supposing that when a chooser asks himself what a thing is worth, he doesn't get a fixed answer but a variable one. That is, there is actual variation in his preferences. Thus, his expected utility for a particular lottery is drawn from a distribution of possible utilities, usually a Gaussian one.28
This behavior makes sense when we think about the human choice mechanism at the neuronal level, because neuron firing rates are stochastic.29 When a neurobiologist says "The neuron was firing at 200 Hz," what she means is that the mean firing rate of the neuron, measured over a long period under stable conditions, would have been close to 200 Hz. So the neurons that encode utility (wherever they are) will exhibit stochasticity, and thereby introduce some randomness into our choices. In this way, neurobiological data constrain our economic models of human behavior. An economic model without some randomness in it will have difficulty capturing human choices for as long as humans run on neurons.30
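A small sketch of a random utility chooser, assuming each option's utility is its mean plus independent Gaussian noise, as in the description above (other random utility models, such as logit, assume a different noise distribution). The result is a smooth, S-shaped transfer function rather than a step; the utilities and `sigma` are hypothetical.

```python
import math
import random


def choice_prob_analytic(u_a, u_b, sigma=1.0):
    """P(choose A) when each utility is its mean plus independent Gaussian noise (sd sigma).

    The difference of the two draws is Gaussian with sd sigma * sqrt(2), so
    P(A) = Phi((u_a - u_b) / (sigma * sqrt(2))) = 0.5 * (1 + erf((u_a - u_b) / (2 * sigma))).
    """
    return 0.5 * (1.0 + math.erf((u_a - u_b) / (2.0 * sigma)))


def choice_prob_simulated(u_a, u_b, sigma=1.0, trials=100_000, seed=0):
    """Estimate the same probability by sampling noisy utilities and taking the larger."""
    rng = random.Random(seed)
    wins = sum(
        (u_a + rng.gauss(0.0, sigma)) > (u_b + rng.gauss(0.0, sigma))
        for _ in range(trials)
    )
    return wins / trials


for diff in (-2, -1, 0, 1, 2):
    print(diff, round(choice_prob_analytic(diff, 0.0), 3))
```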
Discounting
Louie & Glimcher (2010) examined temporal discounting in the brain. The two monkeys in this study were repeatedly asked to choose between a small, immediately available reward and a larger reward available after a short delay. For example, on one day they were asked to choose between 0.13 milliliters of juice right now or 0.2 milliliters of juice available after a delay of 2, 4, 8, or 12 seconds. A monkey might be willing to wait 2, 4, or 8 seconds for the larger reward, but not 12 seconds.
After many, many measurements of this kind, Louie and Glimcher were able to describe the discounting function being used by each monkey. (One of them was more impatient than the other.)
Moreover, the neurons in the relevant section of the brain fired at rates that reflected each monkey’s discounting function. If 0.2 milliliters of juice was offered with no delay, the neurons were highly active. If the same reward was offered at a delay of 2 seconds, they were slightly less active. If the same reward was offered after 4 seconds, the neurons were less active still. And so on. As it turned out, the discounting function that captured their choices was identical to the discounting function that captured the firing rates of these neurons.
This shouldn't be a surprise at this point, but just to confirm: Yes, we can observe discounting in the firing rates of neurons involved in the choice-making process.
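The text doesn't say which functional form Louie & Glimcher fit, so this sketch assumes a hyperbolic discount function (a common choice in the discounting literature) with a made-up parameter `K`, chosen so that a hypothetical monkey behaves as in the example above.

```python
def hyperbolic_value(amount, delay, k):
    """Present value of `amount` after `delay` seconds, assuming V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


IMMEDIATE_ML = 0.13  # juice available now
DELAYED_ML = 0.20    # juice available after a delay
K = 0.05             # hypothetical impatience parameter (per second)

choices = {}
for delay in (2, 4, 8, 12):
    value = hyperbolic_value(DELAYED_ML, delay, K)
    choices[delay] = "wait" if value > IMMEDIATE_ML else "take now"
    print(f"{delay:>2} s: discounted value {value:.3f} ml -> {choices[delay]}")
```

With `K = 0.05` the hypothetical monkey waits 2, 4, or 8 seconds but not 12; a more impatient monkey corresponds to a larger `K`.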
Relative and Absolute Utility
Dorris & Glimcher (2004) observed monkeys and their choice mechanism neurons while the monkeys engaged in repeated plays of the inspection game. The study is too involved for me to explain here, but the results suggested that choice mechanism neurons encode relative expected utilities (relative to other actions under consideration) rather than absolute expected utilities.
Tobler et al. (2005) suggested that the brain only encodes relative expected utilities. But there is reason to suspect this can't be right. If we stored only relative expected utilities, then we would routinely violate the axiom of transitivity (if you prefer A to B and B to C, you can't also prefer C to A). To see why this is the case, consider Glimcher's example (he says 'expected value' instead of 'utility'):
And we mostly do seem to obey the axiom of transitivity.
So if the choice mechanism neurons do represent relative utilities, then some other neurons elsewhere must encode a more absolute form of utility. Other implications of this are explored in the next section.
Normalization
David Heeger showed32 that the firing rate of a 'feature detector' neuron in the visual cortex reflects its response to a feature in the visual field divided by the summed activity of nearby neurons responding to the same image. Thus, these neurons encode not only whether they 'see' the feature they are built to detect, but also how unique that feature is in the visual field.
The effect of this is that neurons reacting to the edge of a visual object fire more actively than others do. Behold! Edge detection!
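A toy illustration of divisive normalization producing edge enhancement. It assumes a one-dimensional row of brightness-detecting units, each divided by a constant `sigma` plus the summed drive of its immediate neighbors; Heeger's actual model is more elaborate (for example, it works on squared responses), so treat this as a simplified sketch with hypothetical parameters.

```python
def local_normalize(drives, radius=1, sigma=1.0):
    """Toy divisive normalization over a 1-D row of 'feature detector' units.

    Each unit's output is its own drive divided by sigma plus the summed drive
    of the units within `radius` positions (itself included)."""
    out = []
    for i, d in enumerate(drives):
        lo, hi = max(0, i - radius), min(len(drives), i + radius + 1)
        out.append(d / (sigma + sum(drives[lo:hi])))
    return out


row = [0, 0, 1, 1, 1, 1, 1, 0, 0]  # a uniform bright bar on a dark background
print([round(r, 2) for r in local_normalize(row)])
# [0.0, 0.0, 0.33, 0.25, 0.25, 0.25, 0.33, 0.0, 0.0]
```

Units at the bar's edges have fewer active neighbors, so they are divided by less and respond more strongly than units in the bar's interior.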
Normalization is also an efficient way to encode information about the world. Consider a world where orange dots are ubiquitous. For an animal in that world, it would be wasteful to fire action potentials to represent orange dots. Better to represent the absence of orange dots, or the transition from orange dots to something else. An optimally efficient encoding method would be sensitive not to the 'alphabet' of all possible inputs, but to a smaller alphabet of the inputs that actually appear in the world. This insight was mathematically formalized by Schwartz & Simoncelli (2001).
The efficiency of this normalization technique may explain why we've discovered it at work in so many different places in the brain.33 And given that we've found it almost everywhere we've looked for it, it wouldn't be a surprise to see it show up in our choice-making circuits. Indeed, Schwartz & Simoncelli's normalization equation may be what our brains use to encode expected utilities that are relative to the other choices under consideration.
One implication of Schwartz & Simoncelli's equation is that a chooser's errors become more frequent as the size of the choice set grows. Thus, behavioral errors on small choice sets should be rarer than most random utility models would predict, but error rates will increase rapidly with choice set size (and beyond a certain choice set size, choices will appear random).
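A toy simulation of that implication. It assumes one best option (value 10) among lesser options (value 8), divisive normalization by `sigma` plus the summed value of the choice set, and a fixed amount of Gaussian noise added to each normalized value before an argmax. This is our own simplified setup with made-up parameters, not the fitted model from Louie & Glimcher (2010).

```python
import random


def error_rate(set_size, best=10.0, other=8.0, sigma=1.0, noise=0.05, trials=5000, seed=0):
    """Fraction of trials in which a noisy, normalized chooser fails to pick the best option.

    Values are divided by (sigma + sum of all values in the choice set), then each
    normalized value gets independent Gaussian noise of a fixed size."""
    rng = random.Random(seed)
    values = [best] + [other] * (set_size - 1)
    denom = sigma + sum(values)
    errors = 0
    for _ in range(trials):
        noisy = [v / denom + rng.gauss(0.0, noise) for v in values]
        if max(range(set_size), key=noisy.__getitem__) != 0:  # argmax is not the best option
            errors += 1
    return errors / trials


for n in (2, 4, 8, 16, 32):
    print(n, error_rate(n))
```

Because every added option inflates the shared denominator, the normalized gap between the best option and the rest shrinks while the noise stays the same size, so errors climb quickly and large choice sets approach random choice.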
Preliminary evidence that choice set size affects error rates has come from behavioral economics. For example, consider Iyengar & Lepper's (2000) study of supermarket shoppers. They set up a table showing either 6 or 24 flavors of jams, allowing shoppers to sample as many as they wanted. Customers who saw 24 flavors had a 3% chance of buying a jar, while those who saw only 6 flavors had a 30% chance!
In another experiment, Iyengar & Lepper let subjects choose one of either 6 or 30 different chocolates. Those who chose from among only 6 options were more satisfied with their selection than those who had been presented with 30 different chocolates.
These data fit our expectation that as the choice set grows, the frequency of errors in our behavior rises and the likelihood that an option will rise above the threshold for purchase drops. When Louie & Glimcher (2010) investigated this phenomenon in monkey choice mechanism neurons, they found it at work there, too. But the process of choice-set editing is still poorly understood, and some recent studies have failed to replicate Iyengar & Lepper's results (Scheibehenne et al. 2010).
Perhaps the most surprising implication of these findings is that because of neuronal stochasticity, and because errors increase as the choice set grows, we should expect stochastic violations of the independence axiom, and that when choosers face very large choice sets they will essentially ignore the independence axiom.
This is a prediction about human behavior not made by earlier models from neoclassical economics, but it is suggested by looking at the neurons involved in human choice-making.
Are Actions Choices?
But all these data come from experiments where the choices are actions, and from our knowledge of the brain's "final common path" for producing actions. How do actions map onto choices about lovers and smartphones?
Studies by Greg Horowitz have provided some relevant data, because in them monkeys had to choose options identified by color rather than by action.34 For example, in one trial a 'red' option might offer one reward and a 'green' option a different reward. On each trial, the red and green options would appear at random places on the computer screen, and the monkey could choose a reward with a voluntary eye movement. The key here is that rewards were chosen by color and not by a (particular) action.
Horowitz found that the choice mechanism neurons showed the same pattern of activation under these conditions as was the case under action-based choice tasks.
So, it looks like the valuation circuits can store the value of a colored target, and these valuations can be mapped to the choice mechanism. But we don't know much about how this works, yet.
The Primate Choice Mechanism: A Brief Review
Thus far, we have mostly discussed the primate brain's choice mechanism. To review:
But how are probability and utility calculated such that they can be fed into the expected utility representations of the choice mechanism? I won't discuss how the brain forms probabilistic beliefs in this article,36 so let us turn to the study of how utility is calculated in the brain: the question of valuation.
Marginal Utility and Reference Dependence
Consider the following story:
Economists have known about this problem for a long time, and solved it with an idea called marginal utility.
In neoclassical economics, we view the animal as having two kinds of 'wealth': a sugar wealth and a water wealth (the total store of sugar and water in the animal's body at a given time). A piece of fruit or a sip of water is an increment in the animal's total sugar or water wealth. The utility of a piece of fruit or a sip of water, then, depends on its current levels of sugar and water wealth.
On day one, the animal's need for water is low and its need for sugar is high. On that day, the marginal utility of a piece of fruit is greater than the marginal utility of a sip of water. But suppose during the next week the animal has a high blood sugar level. At that time, the marginal utility of a piece of fruit is low. Thus, the marginal utility of a consumable resource depends on wealth. The wealthier the chooser, the lower the marginal utility provided by a fixed amount of gain ('diminishing marginal utility').
In neoclassical economics, the animal faced with the option of going east or west in the morning would first estimate how much the water and the fruit would change its objective wealth level, and then it would estimate how much those objective changes in wealth would change its utility. That is, it would use objective values to compute its marginal (subjective) utility. If it had access only to the subjective experiences in our story, it couldn't compute new marginal utilities when it found itself unexpectedly thirsty.
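A short sketch of diminishing marginal utility computed from objective wealth, as the neoclassical story requires. The logarithmic utility function is our assumption (a common textbook choice for a concave utility), and the sugar stores and fruit size are hypothetical.

```python
import math


def utility(wealth):
    """Assumed concave utility of a resource store."""
    return math.log(wealth)


def marginal_utility(wealth, gain):
    """Change in utility from adding `gain` to the current `wealth`."""
    return utility(wealth + gain) - utility(wealth)


# Hypothetical sugar stores (arbitrary units) and one piece of fruit worth 5 units.
for sugar in (5, 20, 80):
    print(sugar, round(marginal_utility(sugar, 5), 3))
```

The same piece of fruit is worth less and less as the animal's sugar wealth rises, which is exactly the computation that requires access to objective wealth levels.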
The problem with this solution is that the brain does not appear to encode the objective values of stimuli, and humans behaviorally don't seem to respect the objective values of options either, as discussed here.
In response to the behavioral evidence, Kahneman & Tversky (1979) developed a reference-dependent utility function to describe human behavior: prospect theory. Their suggestion was, basically:
This fits with the neurobiological fact that we encode signals from external stimuli relative to reference points, and don't have access to the objective values of stimuli.
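A minimal sketch of a reference-dependent value function: outcomes are coded as gains or losses relative to a reference point, with losses weighted more heavily. The parameter values (curvature 0.88, loss aversion 2.25) are the estimates later reported by Tversky & Kahneman (1992), used here only for illustration; the 1979 paper describes the shape of the function without fixing these numbers.

```python
def prospect_value(outcome, reference=0.0, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Reference-dependent value: concave for gains, convex and steeper for losses."""
    x = outcome - reference
    if x >= 0:
        return x ** alpha
    return -loss_aversion * (-x) ** beta


# The same objective outcome (100) is valued differently from two reference points.
gain = prospect_value(100, reference=50)   # coded as a gain of 50
loss = prospect_value(100, reference=150)  # coded as a loss of 50
print(round(gain, 2), round(loss, 2))
```

Note that the function never sees the objective level 100 on its own, only its difference from the reference point, which is the feature that matches reference-dependent neural encoding.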
The advantage of the neoclassical economic model is that it keeps a chooser's choices consistent. The advantage of the reference-dependent approach is that it better fits human behavior and human neurobiology.
Most neoclassical economists seem to ignore the problems for their theories that are presented by reference dependence in human behavior and human neurobiology, but two neoclassical economists at Berkeley, Matthew Rabin and Botond Koszegi, have begun to take reference dependence seriously. As they put it:
Their reference-dependent model makes particular predictions:
Thus, the cost of accepting the human fact of reference-dependence is that we have to admit that humans are irrational (in the sense of 'rationality' defined by the axioms of revealed preference):
But really, shouldn't it have been obvious all along that humans are irrational? Perhaps it was, to everyone but neoclassical economists and Aristotelians. (Okay, enough teasing...)
One thing to keep in mind is that the brain encodes information about the external world in a reference-dependent way because that method makes a more efficient use of neurons. So evolution traded away some rationality for greater efficiency in the encoding mechanism.
Valuation in the Brain
Back to dopamine. Earlier, we learned that the brain learns the values of actions with a dopaminergic reward system that uses something like temporal difference (TD) reinforcement learning. This reward system updates the stored values of actions by generating a reward prediction error (RPE) from the difference between expected reward and experienced reward, and propagating this learning throughout relevant structures of the brain using the neurotransmitter dopamine. In particular, some synapses are strengthened whenever presynaptic and postsynaptic activity occur in the presence of dopamine, as proposed by Wickens (1993).
But we haven't yet discussed how utilities for actions are generated in the first place, or how they are stored (independent of the expected utilities represented during the choice process). It feels like I generally want ice cream a little bit and hot sex a lot more. Where is that information stored?
Dozens41 of fMRI studies show that two brain regions in particular are correlated with subjective value: the ventral striatum and the medial prefrontal cortex. Other studies suggest that at least five more brain regions probably also contribute to the valuation process: the orbitofrontal cortex, the dorsolateral prefrontal cortex, the amygdala, the insula, and the anterior cingulate cortex.
There are many theories about how the human brain generates and stores utilities, but these theories are far more speculative, and far less developed, than everything else I've presented in this tutorial, so I won't discuss them here. Instead, let us conclude with a summary of what neuroscientists know about the human brain's motivational system, and what some of the greatest open questions are.
Summary and Research Directions
Here's what we've learned:
Paul Glimcher lists42 the greatest open questions in the field as:
Later, we'll explore the implications of our findings for metaethics. As of August 2011, if you've read this then you probably know more about how human values actually work than almost every professional metaethicist on Earth. The general lesson here is that you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it.
Notes
1 They are: Less Wrong Rationality and Mainstream Philosophy, Philosophy: A Diseased Discipline, On Being Okay with the Truth, The Neuroscience of Pleasure, The Neuroscience of Desire, How You Make Judgments: The Elephant and its Rider, Being Wrong About Your Own Subjective Experience, Intuition and Unconscious Learning, Inferring Our Desires, Wrong About Our Own Desires, Do Humans Want Things?, Not for the Sake of Pleasure Alone, Not for the Sake of Selfishness Alone, Your Evolved Intuitions, When Intuitions Are Useful, Cornell Realism, Railton's Moral Reductionism (Part 1), Railton's Moral Reductionism (Part 2), Jackson's Moral Functionalism, Moral Reductionism and Moore's Open Question Argument, and Are Deontological Moral Judgments Rationalizations?
2 Heading Toward: No-Nonsense Metaethics, What is Metaethics?, Conceptual Analysis and Moral Theory, and Pluralistic Moral Reductionism.
3 I tried something similar before, with Cognitive Science in One Lesson.
4 Glimcher (2010) offers the best coverage of the topic in a single book. Tobler & Kobayashi (2009) offer the best coverage in a single article.
5 The quotes in this section are from Churchland (1981).
6 Allen & Ng (2004).
7 This perspective goes back at least as far as Arnauld (1662), who wrote:
8 In addition to Caplin & Leahy (2001), see Kreps & Porteus' (1978, 1979) incorporation of the "utility of knowing", Loomes & Sugden's (1982) incorporation of "regret", Gul & Pesendorfer's (2001) incorporation of "the cost of self-control", and Koszegi & Rabin's (2007, 2009) incorporation of the "reference point".
9 Friedman (1953).
10 See a review in Fox & Poldrack (2009).
11 For one difficulty with prospect theory, see Laury & Holt (2008).
12 Sutton & Barto (2008), p. 3. All quotes from this section are from the early pages of this book.
13 From Sutton & Barto (2008).
14 Much of the rest of this post is basically a summary and paraphrase of Glimcher (2010).
15 Mirenowicz & Schultz (1994).
16 Schultz et al. (1997).
17 Caplin & Dean (2007).
18 From Glimcher (2010).
19 Hebb (1949).
20 Malenka & Bear (2004).
21 Reynolds & Wickens (2002).
22 Glimcher (2010), p. 341.
23 Edelman & Keller (1996); Van Gisbergen et al. (1987).
24 Gold & Shadlen (2007); Roitman & Shadlen (2002).
25 Simon (1957).
26 Glimcher (2010), p. 215.
27 McFadden (2000). The behavior of gradually transitioning between two choices is described by Selten (1975).
28 For a random utility model that is probably an improvement, see Gul & Pesendorfer (2006).
29 Dean (1983); Werner & Mountcastle (1963).
30 Unless some other feature of the brain turns out to 'smooth out' the stochasticity of neurons involved in valuation and choice-making.
31 Glimcher (2010).
32 Heeger (1992, 1993); Carandini & Heeger (1994); Simoncelli & Heeger (1998).
33 Carandini & Heeger (1994); Britten & Heuer (1999); Zoccolan et al. (2005); Louie & Glimcher (2010).
34 Horowitz & Newsome (2001a, 2001b, 2004).
35 Liu & Wang (2008).
36 But, see Deneve (2009).
37 Glimcher (2010), p. 281.
38 Glimcher (2010), p. 283.
39 This quote and the next quote are from Koszegi & Rabin (2006).
40 Glimcher (2010), p. 292.
41 I won't list them all here. For an overview, see Glimcher (2010), ch. 14.
42 Glimcher (2010), ch. 17. I've paraphrased his open questions. I also excluded his 6th question: What Is the Neural Organ for Representing Money?
References
Allais (1953). Le comportement de l'homme rationnel devant le risque: critique des postulats et axiomes de l'ecole americaine. Econometrica, 21: 503-546.
Allen & Ng (2004). Economic behavior. In Spielberger (ed.), Encyclopedia of Applied Psychology, Vol. 1 (pp. 661-666). Academic Press.
Arnauld (1662). Port-Royal Logic.
Basso & Wurtz (1997). Modulation of neuronal activity in superior colliculus by changes in target probability. Journal of Neuroscience, 18: 7519-7534.
Britten & Heuer (1999). Spatial summation in the receptive fields of MT neurons. Journal of Neuroscience, 19: 5074-5084.
Caplin & Dean (2007). Axiomatic neuroeconomics.
Caplin, Dean, Glimcher, & Rutledge (2010). Measuring beliefs and rewards: a neuroeconomic approach. Quarterly Journal of Economics, 125: 3.
Caplin & Leahy (2001). Psychological expected utility theory and anticipatory feelings. Quarterly Journal of Economics, 116: 55-79.
Carandini & Heeger (1994). Summation and division by neurons in primate visual cortex. Science, 264: 1333-1336.
Churchland (1981). Eliminative materialism and the propositional attitudes. The Journal of Philosophy, 78: 67-90.
Dean (1983). Adaptation-induced alteration of the relation between response amplitude and contrast in cat striate cortical neurons. Vision Research, 23: 249-256.
Deneve (2009). Bayesian decision making in two-alternative forced choices. In Dreher & Tremblay (eds.), Handbook of Reward and Decision Making (pp. 441-458). Academic Press.
Dorris & Glimcher (2004). Activity in posterior parietal cortex is correlated with the subjective desirability of an action. Neuron, 44: 365-378.
Edelman & Keller (1996). Activity of visuomotor burst neurons in the superior colliculus accompanying express saccades. Journal of Neurophysiology, 76: 908-926.
Fox & Poldrack (2009). Prospect theory and the brain. In Glimcher, Camerer, Fehr, & Poldrack (eds.), Neuroeconomics: Decision Making and the Brain (pp. 145-173). Academic Press.
Friedman (1953). Essays in Positive Economics. University of Chicago Press.
Glimcher (2010). Foundations of Neuroeconomic Analysis. Oxford University Press.
Gold & Shadlen (2007). The neural basis of decision making. Annual Review of Neuroscience, 30: 535-574.
Gul & Pesendorfer (2001). Temptation and self-control. Econometrica, 69: 1403-1435.
Gul & Pesendorfer (2006). Random expected utility. Econometrica, 74: 121-146.
Hebb (1949). The organization of behavior. Wiley & Sons.
Heeger (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9: 181-197.
Heeger (1993). Modeling simple-cell direction selectivity with normalized, half-squared linear operators. Journal of Neurophysiology, 70: 1885-1898.
Horowitz & Newsome (2001a). Target selection for saccadic eye movements: direction selective visual responses in the superior colliculus induced by behavioral training. Journal of Neurophysiology, 86: 2527-2542.
Horowitz & Newsome (2001b). Target selection for saccadic eye movements: prelude activity in the superior colliculus during a direction discrimination task. Journal of Neurophysiology, 86: 2543-2558.
Iyengar & Lepper (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79: 995-1006.
Jevons (1871). The Theory of Political Economy. Macmillan and Co.
Kahneman & Tversky (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47: 263-291.
Koszegi & Rabin (2006). A model of reference-dependent preferences. Quarterly Journal of Economics, 121: 1133-1165.
Koszegi & Rabin (2007). Reference-dependent risk attitudes. American Economic Review, 97: 1047-1073.
Koszegi & Rabin (2009). Reference-dependent consumption plans. American Economic Review, 99: 909-936.
Kreps & Porteus (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46: 185-200.
Kreps & Porteus (1979). Dynamic choice theory and dynamic programming. Econometrica, 47: 91-100.
Laury & Holt (2008). Payoff scale effects and risk preference under real and hypothetical conditions. In Plott & Smith (eds.), Handbook of Experimental Economics Results, Vol. 1 (pp. 1047-1053). Elsevier.
Liu & Wang (2008). A common cortical circuit mechanism for perceptual categorical discrimination and veridical judgment. PLOS Computational Biology, 4: 1-14.
Loewenstein (1987). Anticipation and the valuation of delayed consumption. Economic Journal, 97: 666-684.
Loomes & Sugden (1982). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal, 92: 805-824.
Louie & Glimcher (2010). Separating value from choice: delay discounting activity in the lateral intraparietal area. Journal of Neuroscience, 30: 5498-5507.
Malenka & Bear (2004). LTP and LTD: an embarrassment of riches. Neuron, 44: 5-21.
Mirenowicz & Schultz (1994). Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology, 72: 1024-1027.
Reynolds & Wickens (2002). Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks, 15: 507-521.
Roitman & Shadlen (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22: 9475-9489.
Scheibehenne, Greifeneder, & Todd (2010). Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research, 37: 409-425.
Schultz, Dayan, & Montague (1997). A neural substrate of prediction and reward. Science, 275: 1593-1599.
Schwartz & Simoncelli (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4: 819-825.
Selten (1975). Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4: 25-55.
Simoncelli & Heeger (1998). A model of neuronal responses in visual area MT. Vision Research, 38: 743-761.
Sutton & Barto (2008). Reinforcement Learning: An Introduction. MIT Press.
Tanji & Evarts (1976). Anticipatory activity of motor cortex neurons in relation to direction of an intended movement. Journal of Neurophysiology, 39: 1062-1068.
Tobler & Kobayashi (2009). Electrophysiological correlates of reward processing in dopamine neurons. In Dreher & Tremblay (eds.), Handbook of Reward and Decision Making (pp. 29-50). Academic Press.
Van Gisbergen, Van Opstal, & Tax (1987). Collicular ensemble coding of saccades based on vector summation. Neuroscience, 21: 651.
Werner & Mountcastle (1963). The variability of central neural activity in a sensory system, and its implications for central reflection of sensory events. Journal of Neurophysiology, 26: 958-977.
Wickens (1993). A Theory of the Striatum. Pergamon Press.
Zoccolan, Cox, & DiCarlo (2005). Multiple object response normalization in monkey inferotemporal cortex. Journal of Neuroscience, 25: 8150-8164.