We conduct the first natural field experiment to explore the relationship between the "meaningfulness" of a task and worker effort. We employed about 2,500 workers from Amazon's Mechanical Turk (MTurk), an online labor market, to label medical images. Although given an identical task, we experimentally manipulated how the task was framed. Subjects in the meaningful treatment were told that they were labeling tumor cells in order to assist medical researchers, subjects in the zero-context condition (the control group) were not told the purpose of the task, and, in stark contrast, subjects in the shredded treatment were not given context and were additionally told that their work would be discarded. We found that when a task was framed more meaningfully, workers were more likely to participate. We also found that the meaningful treatment increased the quantity of output (with an insignificant change in quality) while the shredded treatment decreased the quality of output (with no change in quantity). We believe these results will generalize to other short-term labor markets. Our study also discusses MTurk as an exciting platform for running natural field experiments in economics.
Back in February 2012, lukeprog announced that SIAI was hiring more part-time remote researchers, and you could apply just by demonstrating your chops on a simple test: review the psychology literature on habit formation with an eye towards practical application. What factors strengthen new habits? How long do they take to harden? And so on. I was assigned to read through and rate the submissions, so that Luke could then look at them individually and decide whom to hire. We didn't get as many submissions as we were hoping for, so in April Luke posted again, this time with a quicker, easier application form. (I don't know how that has been working out.)
But in February, I remembered the linked post above from GiveWell, where they mentioned that many would-be volunteers did not even finish the test task. I did finish it, and didn't find it that bad; it was actually a kind of interesting exercise in critical thinking & being careful. People suggested that perhaps the attrition was due not to low volunteer quality, but to the feeling that the volunteers were unappreciated and doing useless makework. (The same reason so many kids hate school…) But how to test this?
Simple! Tell people that their work was not useless, and that even if they were not hired, their work would be used! And we could do Science by randomizing which people got the encouraging statement. The added paragraph looked like this:
Well, all the reviews have been read & graded as of yesterday, with submissions trickling in over months; I think everyone who was going to submit has done so, and it's now time for the final step. So many people failed to send in any submission (only ~18 of ~40 did) that it's relatively easy to analyze - there's just not that much data!
So, the first question is: did people who got the extra paragraph do a better job of writing their review, as expressed in my rating of it from 2-10?
Surprisingly, they did seem to - despite my expectation that any result would be noise, as the sample is so small. If we code getting no paragraph as 0 and getting a paragraph as 1, add the two scores to get a 2-10 rating, and strip out all personal info, you get this CSV. Load it up in R:

The result is not hugely robust: if you set the last score to 10 rather than 6, for example, the p-value rises to 0.16. The effect size looks interesting, though:
0.67 isn’t bad.
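The R code for this step isn't shown above, but the analysis it describes is easy to reconstruct in outline. As a hedged sketch - in Python rather than the post's R, with invented scores standing in for the real CSV - a two-sample comparison (here a seeded permutation test) plus a Cohen's d effect size could look like:

```python
# Hypothetical reconstruction of the rating analysis; the scores below are
# invented for illustration and are NOT the post's actual CSV data.
import random
import statistics as stats

control = [2, 3, 5, 4, 6, 3, 5]  # no extra paragraph (coded 0)
treated = [5, 6, 7, 4, 8, 6, 7]  # got the paragraph (coded 1)

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stats.variance(a)
                  + (nb - 1) * stats.variance(b)) / (na + nb - 2)
    return (stats.mean(b) - stats.mean(a)) / pooled_var ** 0.5

def perm_test(a, b, n=10_000, seed=0):
    """One-sided permutation test: fraction of random relabelings whose
    mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = stats.mean(b) - stats.mean(a)
    combined = a + b
    hits = 0
    for _ in range(n):
        rng.shuffle(combined)
        diff = stats.mean(combined[len(a):]) - stats.mean(combined[:len(a)])
        if diff >= observed:
            hits += 1
    return hits / n

print(f"Cohen's d: {cohens_d(control, treated):.2f}")
print(f"permutation p-value: {perm_test(control, treated):.3f}")
```

With the real data, the post reports a d of 0.67; the permutation test here is just one reasonable stand-in for whatever test the original R session ran.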
The next question to me is, did the paragraph influence whether people would send in a submission at all? Re-editing the CSV, we load it up and analyze again:
Nope. This result is somewhat robust, since we can use everyone who applied: I have to flip something like 6 values before the p-value drops to 0.07.
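Again the R code isn't shown; one standard way to test a difference in submission rates on a sample this small is a Fisher's exact test on the 2x2 table of treatment x submission. The sketch below (Python, one-sided, computed directly from the hypergeometric distribution) uses a made-up treatment split - only the overall ~18-of-~40 submission figure comes from the post:

```python
# Hypothetical sketch of the submission-rate check. The 2x2 counts are
# invented; only the ~18/~40 overall submission rate matches the post.
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(at least `a` submitters in the first group) under the null,
    for the table [[a, b], [c, d]] with fixed margins (hypergeometric)."""
    n = a + b + c + d
    row1 = a + b      # size of first group
    col1 = a + c      # total submitters
    total = comb(n, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(row1, k) * comb(n - row1, col1 - k) / total
    return p

# made-up split: treated 10/20 submitted, control 8/20 (~18 of ~40 overall)
p = fisher_one_sided(10, 10, 8, 12)
print(f"one-sided Fisher p-value: {p:.3f}")
```

An unremarkable p-value here would match the post's "Nope": the paragraph seems to affect quality more than whether people submit at all.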
So, lessons learned? It's probably a good idea to include such a paragraph, since it's so cheap and apparently doesn't come at the expense of submissions in general.