sbenthall comments on Fixing Moral Hazards In Business Science - Less Wrong

33 Post author: DavidLS 18 October 2014 09:10PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (96)

You are viewing a single comment's thread. Show more comments above.

Comment author: DavidLS 18 October 2014 11:42:06PM *  13 points [-]

Thanks for pointing this out.

Let's use Beeminder as an example. When I emailed Daniel he said this: "we've talked with the CFAR founders in the past about setting up RCTs for measuring the effectiveness of beeminder itself and would love to have that see the light of day".

Which is a little open ended, so I'm going to arbitrarily decide that we'll study Beeminder for weight loss effectiveness.

Story* as follows:

Daniel goes to (our thing).com and registers a new study. He agrees to the terms, and tells us that this is a study which can impact health -- meaning that mandatory safety questions will be required. Once the trial is registered it is viewable publicly as "initiated".

He then takes whatever steps we decide on to locate participants. Those participants are randomly assigned to two groups: (1) act normal, and (2) use Beeminder to track exercise and food intake. Every day the participants are sent a text message with a URL where they can log that day's data. They do so.

After two weeks, the study completes and both Daniel and the world are greeted with the results. Daniel can now update Beeminder.com to say that Beeminder users lost XY pounds more than the control group... and when a rationalist sees such claims they can actually believe them.

  • Note that this story isn't set in stone -- just a sketch to aid discussion
Comment author: sbenthall 20 October 2014 04:39:17AM *  3 points [-]

He then takes whatever steps we decide on to locate participants.

Even if the group assignments are random, the prior step of participant sampling could lead to distorted effects. For example, the participants could be just the friends of the person who created the study who are willing to shill for it.

The studies would be more robust if your organization took on the responsibility of sampling itself. There is non-trivial scientific literature on the benefits and problems of using, for example, Mechanical Turk and Facebook ads for this kind of work. There is extra value added for the user/client here, which is that the participant sampling becomes a form of advertising.

Comment author: DavidLS 20 October 2014 10:56:48AM 2 points [-]

Yeah, this is a brutal point. I wish I knew a good answer here.

Is there a gold standard approach? Last I checked even the state of the art wasn't particularly good.

Facebook / Google / StumbleUpon ads sound promising in that they can be trivially automated, and if only ad respondents could sign up for the study, then the friend issue is moot. Facebook is the most interesting of those, because of the demographic control it gives.

How bad is the bias? I performed a couple google scholar searches but didn't find anything satisfying.

To make things more complicated, some companies will want to test highly targeted populations. For example, Apptimize is only suitable for mobile app developers -- and I don't see a facebook campaign working out very well for locating such people.

A tentative solution might be having the company wishing to perform the test supply a list of websites they feel caters to good participants. This is even worse than facebook ads from a biasing perspective though. At minimum it sounds like disclosing how participants were located prominently will be important.

Comment author: sbenthall 20 October 2014 11:56:56PM 3 points [-]

There are people in my department who do work in this area. I can reach out and ask them.

I think Mechanical Turk gets used a lot for survey experiments because it has a built-in compensation mechanism and there are ways to ask questions in ways that filter people into precisely what you want.

I wouldn't dismiss Facebook ads so quickly. I bet there is a way to target mobile app developers on that.

My hunch is that like survey questions, sampling methods are going to need to be tuned case-by-case and patterns extracted inductively from that. Good social scientific experiment design is very hard. Standardizing it is a noble but difficult task.