I am confused. Shouldn't the questions depend on the content of the study being performed, which would itself depend (very specifically) on the users/clients? Or am I missing something?
I am hopeful that at minimum we can create guidelines for selecting questions.
I also think that some standardized health & safety questions by product category would be good (for nutritional supplements I would personally be interested in seeing data for nausea, diarrhea, weight change, stress/mood, and changes in sleep quality).
For productivity solutions I'd be curious about effects on social relationships, and other changes in relaxation activities.
Within a given product category, I'm also hopeful we can reuse a lot of questions. Soylent's test and MealSquares' test shouldn't require significantly different questions.
He then takes whatever steps we decide on to locate participants.
Even if the group assignments are random, the prior step of participant sampling could lead to distorted effects. For example, the participants could be just the friends of the person who created the study who are willing to shill for it.
The studies would be more robust if your organization took on the responsibility of sampling itself. There is non-trivial scientific literature on the benefits and problems of using, for example, Mechanical Turk and Facebook ads for this kind of work. There is extra value added for the user/client here, which is that the participant sampling becomes a form of advertising.
Yeah, this is a brutal point. I wish I knew a good answer here.
Is there a gold standard approach? Last I checked even the state of the art wasn't particularly good.
Facebook / Google / StumbleUpon ads sound promising in that they can be trivially automated, and if only ad respondents could sign up for the study, then the friend issue is moot. Facebook is the most interesting of those, because of the demographic control it gives.
How bad is the bias? I performed a couple of Google Scholar searches but didn't find anything satisfying.
To make things more complicated, some companies will want to test highly targeted populations. For example, Apptimize is only suitable for mobile app developers -- and I don't see a Facebook campaign working out very well for locating such people.
A tentative solution might be having the company wishing to perform the test supply a list of websites they feel cater to good participants. This is even worse than Facebook ads from a biasing perspective, though. At minimum, prominently disclosing how participants were located will be important.
This is an interesting project!
An obvious relevant model is Gwern's self-experimentation (http://www.gwern.net/Nootropics).
The key difference being, of course, that you are interested in group differences.
An important step will be offering power calculations so that the sample size can be estimated prior to performing the test. (Also, so that post-hoc, you can understand how big an effect your study should have been able to detect.)
There are already some web apps that perform this, however. How will your app improve over those, or will yours offer an integrated solution and therefore be more valuable?
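To make the power-calculation idea concrete, here is a minimal sketch of how the app might estimate the sample size per group for a two-group comparison. This uses the standard normal approximation for a two-sample test (the function name and default values are my own assumptions, not anything from this thread); a production version would want the t-distribution correction and more study designs.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sample comparison.

    effect_size is Cohen's d: (difference in group means) / (pooled std dev).
    Uses the normal approximation: n = 2 * (z_{1-a/2} + z_{power})^2 / d^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# A "big improvement" (d = 0.8) needs far fewer participants than a subtle one:
print(sample_size_per_group(0.8))  # 25 per group
print(sample_size_per_group(0.2))  # 393 per group
```

This illustrates the point about startups and big effects: defaults tuned for large effect sizes keep the required pools small enough that participation-as-advertising stays cheap.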
Thanks, that's a great point.
I'm worried that a statistical calculator will throw off founders who would otherwise test their products with us (specifically YC founders, an abnormally influential group), so as much as possible I'd like to keep sample sizes in the "Advanced Menu" section. (This is not to say sample size is an unimportant issue -- I'm saying the default values are the more important issue, because many people won't be customizing them.)
I also think there are three unique features of product studies that can help simplify defining good default values here:
- Startups are going to be interested in talking about big improvements (small sample sizes needed).
- Startups will likely view study participation as advertising, allowing for a generous margin of error on sample size.
- Consumers are skeptical of small-sample studies, even when they shouldn't be.
What do you suggest we do? It sounds like getting baseline mean and variance data for the questions we include with the app is basically a requirement.
What is the awesome version of handling this issue? :p
One issue that seems more likely to be problematic when the web application is being created and launched than later on is whether the questions are well designed. There's a whole area of expertise that goes into creating scales that are reliable, valid, and discriminative. One possibility is to construct them from first principles and then make them publicly available; another is to find the best of what already exists that is open sourced.
For General Biotics and MealSquares it seems like some measure of "not having a happy tummy" is a relevant thing to measure. If Soylent gets in on the process they might have a similar interest?
A little bit of googling turned up the Gastrointestinal Symptom Rating Scale. It has 15 items (which might be too many?) and it is interview based (so hard to fit into an automated system). The really nice thing was that I could find a PDF and it all looked pretty basic.
A 2006 paper by van Zanten tipped me off to the existence of:
The Glasgow Dyspepsia Severity Scale
The Leeds Dyspepsia Questionnaire (public domain, with a Mandarin version!)
The Severity of Dyspepsia Assessment
I'm feeling like in this situation, I can safely say "I love standards, there are so many to choose from"! One of the things that turned up in my searches that seems like a really useful "meta find" is the Proqolid Clinical Outcomes Assessment database, but it requires membership to use the internal search function, and I need to pause to grab some dinner.
Thank you for posting this!
I'm feeling like in this situation, I can safely say "I love standards, there are so many to choose from"
Getting a list of LessWrong approved questions would be awesome. Both because I think the LW list will be higher quality than a lot of what's out there, and because I feel question choice is one of the free variables we shouldn't leave in the hands of the corporation performing the test.
Looking closer, I actually think their existing Fulfillment APIs would just work for this (ie the webapp controls an Amazon fulfillment account, the person seeking a test ships two pallets of physical product there, the webapp says where to send them).
You are right. If our webapp is already hosted by a trusted party, we should be able to use the existing Amazon API.
If the company whose product is tested simply ships additional copies to the Amazon warehouse, those copies could be archived by the trusted organisation. If anybody doubts that the products are real, the trusted organisation has copies that it can analyse. If the whole thing scales, the trusted organisation can also randomly inspect products to see if they contain what they should contain.
Agree. I think having a trusted third party handle the shipments is cleaner at the moment. I'm still curious where blogospheroid's thread ends up. It seems like the paranoia of cryptoland is helping us see some more holes in modern experiment design.
Yes, cryptoparanoia is always fun ;) The web app could regularly publish hashes of the data of specific studies to a public blockchain. That way any tampering that happens afterwards can be detected, and you only need to trust that the web app is tamper-proof at the moment the data gets transmitted.
If anybody doubts that the products are real the trusted organisation has copies that they can analyse.
This is a great point. Maybe community members could bet karma on the outcome of a tox screening? This could create a prioritized list.
One problem with my earlier suggestion is that some companies will want narrowly selected participant pools. These will necessarily differ from the population at large, and might produce data that looks as though a poisoned placebo is being used. I see two possible solutions to this problem:
- Log baseline data before the treatment is used. If people do worse on the placebo than at baseline, that would be very suspicious.
- Include an additional group of testers who do something unrelated to the placebo/product ("Eat an apple every day for the next week"). If the placebo group did worse than the apple group, that would be very suspicious.
I feel like #2 from above is unsatisfying though, if we think it works then why are we using normal placebos?
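The baseline check in solution #1 could be automated as a crude screen. The sketch below (function name, threshold, and example data are all my own assumptions) flags a placebo group whose scores fall significantly below their own baseline, which is suspicious for a supposedly inert placebo:

```python
from statistics import mean, stdev
from math import sqrt

def placebo_looks_harmful(baseline, during, t_crit=2.0):
    """Crude screen: did the placebo group get *worse* than its own baseline?

    baseline/during are per-participant scores (higher = better), paired by
    index. Returns True if the mean change is negative by more than roughly
    t_crit standard errors -- a red flag for a poisoned or non-inert placebo.
    """
    changes = [d - b for b, d in zip(baseline, during)]
    n = len(changes)
    se = stdev(changes) / sqrt(n)
    t = mean(changes) / se
    return t < -t_crit

# A placebo group whose (hypothetical) sleep-quality scores drop sharply:
print(placebo_looks_harmful([7, 8, 6, 7, 8, 7], [5, 6, 5, 5, 6, 5]))  # True
```

A real version would use a proper paired test and correct for multiple studies, but even this simple check makes "the control group mysteriously got sick" detectable from the data rather than anecdote.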
The web app could regularly publish hashes of the data of specific studies to a public blockchain. That way any tampering that happens afterwards can be detected, and you only need to trust that the web app is tamper-proof at the moment the data gets transmitted.
This would actually be really easy to implement. (Not the block chain portion, the per-study rolling checksums).
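Here is roughly what I mean by per-study rolling checksums (the class name, study ID, and record fields are illustrative assumptions): each incoming record is folded into a running SHA-256 digest, so publishing any checkpoint digest commits to all data received up to that point.

```python
import hashlib
import json

class StudyLedger:
    """Rolling checksum for one study. Each record is folded into a running
    SHA-256 digest; periodically publishing that digest (to a blockchain, or
    anywhere public and timestamped) commits to all data received so far.
    Editing any earlier record changes every subsequent digest."""

    def __init__(self, study_id):
        self.digest = hashlib.sha256(study_id.encode()).hexdigest()

    def add_record(self, record):
        # Canonical JSON so the same record always hashes the same way.
        payload = json.dumps(record, sort_keys=True)
        self.digest = hashlib.sha256(
            (self.digest + payload).encode()
        ).hexdigest()
        return self.digest  # this is the value that gets published

ledger = StudyLedger("probiotic-study-01")
ledger.add_record({"participant": 17, "nausea": 0, "sleep_quality": 8})
checkpoint = ledger.add_record({"participant": 23, "nausea": 1, "sleep_quality": 6})
# Anyone holding `checkpoint` can re-derive it from the raw records later
# and detect tampering.
```

The only trust assumption left is the one named above: that the data was honest at the moment it was first hashed.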
I sincerely hope that study plan would not pass muster. Doesn't there need to be a more reasonable placebo?
In general, who will review proposed studies for things like suitable placebo decisions?
Can you provide an example of what you'd like to see pass muster?
I sincerely hope that study plan would not pass muster. Doesn't there need to be a more reasonable placebo?
In general, who will review proposed studies for things like suitable placebo decisions?
I'm glad you're here. My background is in backend web software, and stats once the data has been collected. I read "Measuring the Weight of Smoke" in college, but that's not really a sufficient background to design the general protocol. That's a lot of my motivation behind posting this to LW - there seem to be protocol experts here, with great critiques of the existing ones.
My hope is we can create a "getting started testing" document that gets honest companies on the right track. Searching around the web I'm finding things like this rather than serious guides to proper placebo creation.
In general, who will review proposed studies for things like suitable placebo decisions?
I'm hoping either registered statistical consultants or grad students. Hopefully this can be streamlined by a good introductory guide.
I have thought a bit more about the blinding issue.
One question that comes to mind: How do we trust that the placebo is a real placebo and substantially different from the drug? The company producing the product wants to show a difference. Therefore they have no incentive to give both parties the same product.
On the other hand, the company could mix a mild poison into the placebo. Even an ineffective drug beats a poison.
Therefore the placebo has to be produced or purchased by a trusted organisation, and that organisation has to package the placebo in the same packaging it uses for the drug.
This is awesomely paranoid. Thank you for pointing this out.
I'm a little worried a solution here will call for whoever controls the webapp to also be an expert at creating placebos for every product type. (If we trust contract manufacturers to be honest, then the issue of adding poisons to a placebo can be handled by having them ship directly to the third party for mailing... but that's already the default case.)
Perhaps poisons can be discovered by looking at other products which performed the same protocol? "This experiment has to be re-done because the control group mysteriously got sick" doesn't seem like a good solution though...
I'll wrestle with this. Maybe something with MaxL's answer to #8 might be possible?
You ship both a placebo package and a non-placebo package to the participant, and have them flip a coin to decide which one to use. They either throw away or disregard the other package for the duration of the study.
That's possible but it means that you double your product costs. The advantage would be that you can do crossover trials.
In the case of your probiotic, a crossover trial might be worthwhile even without this reason.
In this case you could encrypt a list of product IDs for placebos and non-placebos before you ship your product. The participant has no opportunity to know which of the two products is the placebo.
When he starts the study, the participant enters the product ID of the product he decides to use into the webapp. He also enters the ID of the product he doesn't want to use.
The participant has no way to decide between the two packages or know which one is the placebo and which one is the real thing so he doesn't need to go through the process of flipping a real coin.
Once the study is finished, you release the decryption key that can be used to distinguish placebos from real products. You can give the decryption key to a trusted third-party organisation so that your company can't prevent the results of the study from being published.
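The "publish now, reveal later" scheme above can be implemented without real encryption machinery by using a salted-hash commitment, which the Python standard library supports directly. This sketch is my own substitution for the encrypted-list idea (function names, IDs, and roles are illustrative): the company publishes a commitment for every unit before shipping, then reveals the salts after the study so anyone can verify which IDs were placebos.

```python
import hashlib
import secrets

def commit(product_id, role):
    """Commit to (product_id, role) without revealing the role.

    Publish the digest before shipping; keep the salt secret until the
    study ends. The random salt prevents guessing roles by brute force.
    """
    salt = secrets.token_hex(16)
    digest = hashlib.sha256(f"{salt}:{product_id}:{role}".encode()).hexdigest()
    return digest, salt

def verify(digest, salt, product_id, role):
    """After the reveal, anyone can check which role each ID really had."""
    return hashlib.sha256(f"{salt}:{product_id}:{role}".encode()).hexdigest() == digest

digest, salt = commit("PKG-0042", "placebo")
print(verify(digest, salt, "PKG-0042", "placebo"))  # True
print(verify(digest, salt, "PKG-0042", "real"))     # False
```

Handing the salts (instead of a decryption key) to the trusted third party gives the same guarantee: the company can't quietly reassign labels after seeing the results.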
You ship N packages each to Total/N participants. Each participant who receives N packages then randomly assigns himself one package and randomly distributes the remaining (N-1) packages to other participants.
That means the participants have to do the work of going to the post office and remailing packages. Some packages will take additional time to remail, and that might produce complications.
About the shipping of products and placebos to people, I see a physical way of doing it, but it is definitely not scalable.
Maybe you can win a company such as Amazon as a partner for distribution. Amazon warehouses can store both products and placebos and ship randomly.
For Amazon it should be relatively little work and it could be good PR.
Labels don't have to be printed on the bottle. Amazon has the capabilities of adding a paper with study instructions to a shipment and that paper can contain a unique ID for the product. Amazon could again publish an encrypted version of the ID at the beginning of the study and release the decryption key at the end of the study.
The participant has no way to decide between the two packages or know which one is the placebo and which one is the real thing so he doesn't need to go through the process of flipping a real coin.
I am worried that standardized shipping will come with standardized package layout, and I'm guessing "preference of left vs right identical thing" correlates with something the system will eventually test. Having thought about it more, this is the real issue with allowing customers to choose which product they'll use: that decision has to be purely random if you want the math to be simple / understandable. I agree people are unlikely to actually flip a coin :/
Thankfully the fix is easy: you have the testing webapp decide for the participant. They receive the product, enter the numbers online, and are told which to use.
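A minimal sketch of that server-side assignment (names are my own assumptions): the participant enters the IDs printed on both packages, and the webapp draws from the OS's cryptographic random source, so no "left vs right" preference can leak into the assignment.

```python
import secrets

def assign_package(package_ids):
    """The webapp, not the participant, picks which package to use.

    secrets.choice draws from the OS CSPRNG, so the assignment is
    unpredictable to both the participant and the company.
    """
    chosen = secrets.choice(package_ids)
    unused = [p for p in package_ids if p != chosen]
    return chosen, unused  # record both; `unused` stays sealed or is discarded

use, discard = assign_package(["PKG-0042", "PKG-0417"])
```

Recording the unused IDs also supports the audit ideas elsewhere in the thread: the trusted party can later reconcile which sealed units should still exist.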
For non-crossover trials I agree this needlessly increases the cost. It's almost surely better to use a trusted third party.
That means the participants have to do the work of going to the post office and remailing packages. Some packages will take additional time to remail, and that might produce complications.
Agree. I think having a trusted third party handle the shipments is cleaner at the moment. I'm still curious what blogospheroid's thread comes to. It seems like the paranoia of cryptoland is helping us see some more holes in modern experiment design (ie your thread on poisonous placebos).
Maybe you can win a company such as Amazon as a partner for distribution. Amazon warehouses can store both products and placebos and ship randomly.
Thanks for saying this. Looking closer, I actually think their existing Fulfillment APIs would just work for this (ie the webapp controls an Amazon fulfillment account, the person seeking a test ships two pallets of physical product there, the webapp says where to send them).
Could you please link to examples of the kind of marketing studies that you are talking about? I'd especially like to see examples of those that you consider good vs. those you consider bad.
I did a poor job with the introduction. I'm assuming the studies exist, because if they don't, that's full-on false advertising.
Not to pick on anyone in particular, here are some I recently encountered:
The probiotics section at Whole Foods (and my interactions with customers who believed those claims, or were skeptical of my claims given the state of the supplement market) was what finally caused me to post this thread.
As a perplexing counterbalance to Whole Foods are companies which don't advertise any effects whatsoever, even though you'd expect they would.
List of companies where a lack of studies/objective claims caught my imagination: