I've begun to notice a pattern with experiments in behavioral economics. An experiment produces a result that's counter-intuitive and surprising, and demonstrates that people don't behave as rationally as expected. Then, as time passes, other researchers contrive different versions of the experiment that show the experiment may not have been about what we thought it was about in the first place. For example, in the dictator game, Jeffrey Winking and Nicholas Mizer changed the experiment so that the participants didn't know each other and the subjects didn't know they were in an experiment. With this simple adjustment that made the conditions of the game more realistic, the "dictators" switched from giving away a large portion of their unearned gains to giving away nothing. Now it's happened to the marshmallow test.

In the original Stanford marshmallow experiment, children were given one marshmallow. They could eat the marshmallow right away; or, if they waited fifteen minutes for the experimenter to return without eating the marshmallow, they'd get a second marshmallow. Even more interestingly, in follow-up studies two decades later, the children who waited longer for the second marshmallow, i.e. showed delayed gratification, had higher SAT scores, better school performance, and even a healthier Body Mass Index. This is normally interpreted as indicating the importance of self-control and delayed gratification for life success.

Not so fast.

In a new variant of the experiment entitled (I kid you not) "Rational snacking", Celeste Kidd, Holly Palmeri, and Richard N. Aslin from the University of Rochester gave the children a similar test with an interesting twist.

They assigned 28 children to two groups asked to perform art projects. Children in the first group each received half a container of used crayons, and were told that if they could wait, the researcher would bring them more and better art supplies. However, after two and a half minutes, the adult returned and told the child they had made a mistake, and there were no more art supplies so they'd have to use the original crayons.

In part 2, the adult gave the child a single sticker and told the child that if they waited, the adult would bring them more stickers to use. Again the adult reneged.

Children in the second group went through the same routine except this time the adult fulfilled their promises, bringing the children more and better art supplies and several large stickers.

After these two events, the experimenters repeated the classic marshmallow test with both groups. The results demonstrated that children were a lot more rational than we might have thought. Of the 14 children in group 1, who had been shown that the experimenters were unreliable adults, 13 ate the first marshmallow. Of the 14 children in the reliable adult group, 8 waited out the fifteen minutes. On average, children in unreliable group 1 waited only 3 minutes, and those in reliable group 2 waited 12 minutes.

So maybe what the longitudinal studies show is that children who come from an environment where they have learned to be more trusting have better life outcomes. I make absolutely no claims as to which direction the arrow of causality may run, or whether it's pure correlation with other factors. For instance, maybe breastfeeding increases both trust and academic performance. But any way you interpret these results, the case for the importance and even the existence of innate self-control is looking a lot weaker.


That's not a pattern confined to behavioral economics, it's expected to apply to all of science.

The technical term for this idea is "underdetermination of scientific theory by evidence" - you don't make a discovery by running a single experiment, that's mythologized science. Real science builds up by doing an experiment, abducting to a possible explanation, then painstakingly ruling out all the alternatives (or, very often, the original potential explanation) with further experiments, often taking a very long time. (Even this is a drastically oversimplified view of the "scientific process" but it'll have to do for this discussion.)

If there is an issue here, it's not so much in how scientists are doing their jobs (though in this specific case I'll admit it seems to be taking a bit long to refine the original result), it's more one of sensationalistic journalistic reporting of scientific results and uncritical acceptance by the lay public, often leading to herd behavior.

I really don't see how this casts doubt on the original experiment. Suppose we express a child's decision as maximizing expected reward minus the cost of waiting, where the latter takes "self control" as a parameter. If we lower expected reward, (nearly) all the kids eat the marshmallow. If we raise expected reward (by reinforcing waiting twice), about half the kids wait. But still, 6/14 kids in the second group didn't wait, so clearly there's variance from another source.
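The model in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functional form of the waiting cost (`base_cost / self_control`), the function names, and every number below are hypothetical and do not come from the study.

```python
def waits(p_reward, self_control, reward=1.0, base_cost=1.0):
    """Return True if waiting has positive net expected value.

    Hypothetical model: the cost of waiting shrinks as self control grows.
    """
    expected_gain = p_reward * reward      # chance the adult delivers, times the prize
    waiting_cost = base_cost / self_control
    return expected_gain - waiting_cost > 0


# Made-up spread of self-control levels across six children.
SELF_CONTROL_LEVELS = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]


def count_waiting(p_reward):
    """Count how many of the hypothetical children wait at a given trust level."""
    return sum(waits(p_reward, sc) for sc in SELF_CONTROL_LEVELS)


low_trust = count_waiting(0.1)   # adults shown to be unreliable
high_trust = count_waiting(0.9)  # adults shown to be reliable
print(low_trust, high_trust)
```

With these made-up numbers, nobody waits when trust is low, and only the children above a self-control threshold wait when trust is high, which is the pattern the comment describes: trust shifts everyone, while self control can still explain the remaining variance.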

One way to tease out this connection might be to compare the kids who waited to the kids who tried to hold out and ate the marshmallow late (say after 10 minutes). Presumably the latter group trusted the adults, and their failure to wait was due to lack of self control. Now compare those two groups 10 years later.

I read the experiment with adults who renege on their promises some time ago, and my reaction was along the lines of "seriously, the kids would have to be idiots to take them at their words after all this."

There's no point in engaging one's ability to delay gratification for a reward that almost certainly isn't coming.

For some value of "almost certainly". If you value one marshmallow fifteen minutes from now 0.95 times as much as one right now, and you're 90% sure you won't get a second marshmallow, you are still better off on average by waiting.
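The arithmetic behind this claim can be checked with a short sketch. It assumes, as the comment implicitly does, that the child is certain to keep the first marshmallow while waiting; the function names are hypothetical.

```python
def value_of_waiting(discount, p_second):
    """Expected value of waiting, in units of one marshmallow eaten now.

    Assumes the first marshmallow is kept for sure and the second arrives
    with probability p_second; both are eaten later, so both are discounted.
    """
    return discount * (1 + p_second)


def break_even_probability(discount):
    """Smallest chance of a second marshmallow at which waiting matches eating now."""
    return 1 / discount - 1


eat_now = 1.0
wait = value_of_waiting(discount=0.95, p_second=0.10)
print(wait > eat_now)                # waiting wins under these assumptions
print(break_even_probability(0.95))  # a little over 5%
```

Under these assumptions, waiting is worth about 1.045 marshmallows against 1 for eating immediately, and it stays worthwhile as long as the chance of a second marshmallow exceeds roughly 5.3%. If the child also doubts that the first marshmallow will still be there later, the calculation changes.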

I really don't see how this casts doubt on the original experiment. Suppose we express a child's decision as maximizing expected reward minus the cost of waiting, where the latter takes "self control" as a parameter. If we lower expected reward, (nearly) all the kids eat the marshmallow. If we raise expected reward (by reinforcing waiting twice), about half the kids wait. But still, 6/14 kids in the second group didn't wait, so clearly there's variance from another source.

The other source of variance could still be the children's "trustingness." The more trusting children could have a higher expected reward even after the kids are shown that the adults are reliable/unreliable. So the results are consistent with both of the following hypotheses:

  • More trusting children will wait longer and self control is not relevant
  • More trusting children will wait longer and children with more self control will wait longer

But this experiment ruled out the following:

  • It doesn't matter if a child is more trusting; only self control affects how long they wait

But this experiment ruled out the following:

  • It doesn't matter if a child is more trusting; only self control affects how long they wait

I agree, but I don't think anyone believed that nothing else matters to marshmallow eating.

We have to distinguish between the propositions:

(P1) A significant fraction of the variance in marshmallow eating among children observed in past experiments is explained by trustingness.

(P2) Inducing large changes in trustingness in children produces changes in marshmallow eating behavior.

This study supports (P2), but it is only informative about (P1) to the extent someone previously assigned substantial probability mass to the proposition:

(P3) There is large variation in children's trustingness, but trustingness doesn't affect children's marshmallow eating decision.

I suspect most people didn't assign much probability to (P3), and so this study shouldn't change their opinion very much.

Agreed, but I think the reason this experiment is interesting is that it previously didn't occur to people (or at least to me) that trustingness is a possible alternative explanation of the classic marshmallow experiment, rather than self control. It was a blind spot.

It didn't occur to me either, but ironically it was the first thing my wife suggested when I told her about the marshmallow experiment yesterday (it came up in the context of that professor's comments about fat people, self control, and PhD programs recently). This post's timing was thus quite serendipitous.

It probably occurred to her because she is a doctor who works with primarily poor patients, many of whom are black and hispanic, and so is used to the associated mistrust when crossing cultural and socio-economic lines.

I'm not so trusting, and it occurred to me.

EDIT: The key is to focus on what the experimental subjects observe, not the rules the experimenters intend to follow. The kid is promised another marshmallow, he doesn't know that he is going to get one, and he doesn't know that the one he has now won't be taken away. With priors associated with abuse, the kid should eat the marshmallow as soon as possible.

Hmmm, pretty sure this is not a new interpretation, as I could have sworn discussing this alternative on this list some time not so long ago. Of course, I don't have a lab, budget, or grad students to test the theory empirically.

Trusting kids probably start from a more nurturing environment and so tend toward better results, while being trusting in a safe environment opens possibilities that being paranoid forecloses.

I imagine the kids who didn't wait for the 2nd marshmallow in the original experiment had already been tricked numerous times by authority figures (parents, preachers, teachers, etc.) & so had been pre-conditioned to expect deception. It doesn't surprise me at all that children with unreliable authority figures end up performing worse as adults. The ACE (Adverse Childhood Experiences) study shows that mistreatment as a kid correlates with social, emotional, & cognitive impairment, increases risk for adoption of health-risk behaviors, and increases risk for disease, disability, & social problems.

The surprise level (and so the information content) of this new study is very low. "Fool me once..." and such. And I cannot imagine whether there is anything interesting one can potentially learn from it even after tracking the participants two decades later. Maybe that the trusting suckers in the first group continue to get conned throughout their lives?

Maybe that the trusting suckers in the first group continue to get conned throughout their lives?

It's the trusting suckers who have better life outcomes in the classic experiment, so that would be surprising. But I look at the new study as support for a vase/faces style reframing of the original as "not actually about willpower," rather than as something which is supposed to be surprising on its own.


For instance, maybe breastfeeding increases both trust and academic performance.

This seems like a non sequitur.

I am not claiming that breastfeeding does increase both trust and academic performance. I am simply pointing out that it is not hard to imagine a characteristic that improves both trust and life outcome, without trust improving life outcome or life outcome improving trust. I could equally well have substituted family size, parental education, the presence of pets in the home, or any other characteristic for breast feeding. My apologies if that wasn't clear.

Until now the claim has been that the marshmallow experiment and its followup show higher self-control leads to preferable life outcomes. This study casts a lot of doubt on the causality of those conclusions. What's significant here is that it shows that "self-control" is itself a dependent variable of other causes. Once that's shown, what looked like causation is no longer so obviously causal.


Thanks for clarifying, that removes my worry.

They are referencing studies which show that breast feeding is correlated with heightened academic performance later in life.


Hmm, I think 'show' is too strong a word there. Near as I can tell, those results vanish when socio-economic status is controlled for. I was just wondering what this had to do with trust.

(As I said, correlated), although note that at this point randomized studies have been done with some women being encouraged to breast feed and others not given any encouragement. There's still a correlation. See this summary. This strongly suggests that what is going on here is not purely socioeconomic. But it is easy to connect this to a trust hypothesis: a child which is being directly exposed to their mother's body might form strong social connections, hence be more trusting.


But it is easy to connect this to a trust hypothesis: a child which is being directly exposed to their mother's body might form strong social connections, hence be more trusting.

Ah, thanks. That seems like an ambitious empirical claim to me, but at least I see the connection.

I don't think the purpose of the breast feeding example was to say that it had a high probability of being the right explanation; just that there is a lot of uncovered ground in hypothesis space.

There's a bloggingheads episode on the marshmallow experiment, and its variations, here.

I sure hope that the 'unreliable adults' kids group got a bunch of stuff at the end as apology for yanking their chains, and an explanation. Otherwise this is just contributing to the problem the study is trying to point out.

The Dictator game control is perfectly valid as a control for the Ultimatum game, since they're both equally lab experiments.

But it's a good antidote for anyone who'd concluded "hey, people [in this population] are pretty generous" from Dictator splits (unlike Ultimatum, the offer can't be refused, so a rational agent should take 100% for himself - and people do, when they don't think they're in a lab experiment).

The marshmallow test is an observational study, so we can't draw causal conclusions from it. This is very important to point out, but I don't know why nobody is doing it. In an observational study, participants are not randomly assigned to a treatment, which leaves room for confounding variables. We can bring up wealth, prior experience, parenting, personality, etc. to try to explain the results, but no one can be sure which factor is the most influential. All we can say from an observational study is "there is an association between A and B".

But in an experiment, drawing conclusions and generalizing to a bigger population become possible through random assignment, which balances out confounding variables and leaves only the two variables of interest. If researchers ran the marshmallow test as an experiment, they would randomly divide participants into two groups, so that the groups are on average alike in wealth, parenting, personality, etc., removing the influence of confounding variables. Then the researchers would assign one group to eat the marshmallow early and the other to wait. (Yeah, this would not work in reality, which is the main reason observational studies exist.) At the end, the researchers could tell whether making participants wait for the marshmallow caused them to be more successful than not waiting.
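The balancing effect of random assignment can be illustrated with a small simulation. This is a sketch on made-up data, not a reanalysis of any study: `wealth` stands in for an arbitrary confounder, and the rule by which children "self-select" into waiting is an assumption chosen only to make the contrast visible.

```python
import random
from statistics import mean


def mean_gap(values, in_group):
    """Difference in the mean of `values` between group members and non-members."""
    inside = [v for v, g in zip(values, in_group) if g]
    outside = [v for v, g in zip(values, in_group) if not g]
    return mean(inside) - mean(outside)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical confounder, standardized to mean 0 and spread 1.
wealth = [rng.gauss(0, 1) for _ in range(n)]

# Observational setting (assumed rule): wealthier children are more likely to wait.
self_selected = [w + rng.gauss(0, 1) > 0 for w in wealth]

# Experimental setting: a coin flip decides who waits, ignoring wealth.
randomized = [rng.random() < 0.5 for _ in range(n)]

observational_gap = mean_gap(wealth, self_selected)
randomized_gap = mean_gap(wealth, randomized)
print(observational_gap, randomized_gap)
```

In the self-selected case the waiting group is noticeably wealthier than the non-waiting group, so any later difference in outcomes could be due to wealth. Under the coin flip the gap in wealth is close to zero, which is what lets an experiment attribute outcome differences to the treatment itself.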