gwern comments on Literature-review on cognitive effects of modafinil (my bachelor thesis) - Less Wrong
Well, meta-analyses certainly are an area of interest to me, and I was disappointed in 2012 by "Cognition Enhancement by Modafinil: A Meta-Analysis" (Kelley et al 2012) which used only 3 studies, and so was not very informative. A new meta-analysis would be great. But... I read quickly through it, and I saw no meta-analysis. Just a literature review. What's with the post title?
Nitpick: I really hate this use of 'significantly' and I ban it from my own writing. Is this referring to effect sizes or p-values?
Eh. Absence of improvement != damage. Randall 2004 didn't find a statistically-significant decrease (and it's not clear whether it should, given that it reports 25 datasets for 3 groups, so hunting for decreases incurs worries about multiplicity). And I have to point out, as far as Müller et al 2012 goes, the decrease didn't reach p<0.05 (just 0.053), and if you're willing to accept just trending, then you should also be accepting the increase in the GEFT/Group Embedded Figures Task (p=0.08).
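To put a rough number on the multiplicity worry, here is a minimal sketch. It assumes one test per reported dataset (25 tests) and that the tests are independent, which outcomes measured on the same subjects are not, so treat the result as illustrative only.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one p < alpha among independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


num_tests = 25  # assumption: one significance test per reported dataset
fwer = familywise_error_rate(num_tests)
bonferroni_alpha = 0.05 / num_tests  # per-test threshold under a Bonferroni correction

print(f"family-wise error rate: {fwer:.2f}")
print(f"Bonferroni threshold:   {bonferroni_alpha:.3f}")
```

Under those assumptions, at least one nominally significant result somewhere in the 25 is more likely than not even if nothing is going on.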
How important are these observations...? Well, as you found out, it can be hard to compare or meta-analyze psychology studies since studies may cover the same topic but use different sets of tests, frustrating the most obvious approach 'just univariate meta-analyze everything!'
Hah.
You're right. I don't remember why I wrote "meta-analysis". (Probably because it sounds fancy and smart.) I updated the title.
p-values.
True.
No. In Randall et al. (2004) participants in the 200 mg modafinil condition made significantly more errors (p<0.05) in the Intra/Extradimensional Set Shift task than participants in the placebo and the 100 mg modafinil condition. (The 200 mg group made on average around 27 errors. The 100 mg group around 14. The control group around 17 errors.)
Actually, you linked to a different study. The results can be found in the complete study I linked to. I can upload it if you want to see it yourself.
Every single graphic in this whole thing is reprinted without permission, to tell the truth. (Is this a problem?)
I'm not an academic, but my understanding was that "significantly" was a synonym for "p<0.05" every time in academic writing. "Significantly" referring to effect size is solely the province of non-academic writing (well, that or things like history).
If only it were that simple. But one of my scripts flags use of significance language, and I have seen many times 'significant' and variants used in scientific writing as meaning important or large.
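For illustration only, a flagging script of this kind might look roughly like the sketch below. The pattern and the function name `flag_significance` are hypothetical and are not the script mentioned above.

```python
import re

# Hypothetical pattern: matches "significant", "significantly", "significance"
# and their "in-" forms, case-insensitively.
SIGNIFICANCE = re.compile(r"\b(?:in)?significan(?:tly|t|ce)\b", re.IGNORECASE)


def flag_significance(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched word) for each use of significance language."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SIGNIFICANCE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "Modafinil significantly improved recall.\n"
    "The effect was of practical significance."
)
for lineno, word in flag_significance(sample):
    print(lineno, word)
```

A flag only says the word was used; deciding whether it means a p-value or an effect size still takes a human reading the sentence.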
Sigh. People suck sometimes.
I'm curious if you have ideas on how to deal with that.
Maybe grouping the tests into different kinds of tests and fitting a hierarchical model inside those groups? Are there similar kinds of tests?
The standard solution seems to be 'multivariate meta-analysis'. I've done a little reading on the topic, but I've had trouble getting started with it - you need to know the correlations between the multiple outcome variables, this is typically unavailable (the data-sharing problem), and I think it only works anyway if there is at least a little bit of correlation between the multiple outcomes, while I would like to be able to collectively analyze outcomes from disjoint studies which is... less clear how to do.
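A small sketch of why the correlations matter: if a study reports two outcomes and you want to combine them, the sampling variance of the combination depends on their correlation r, which is the number papers rarely report. The variances below are hypothetical.

```python
import math


def variance_of_mean_of_two(v1: float, v2: float, r: float) -> float:
    """Sampling variance of the average of two effect sizes with correlation r."""
    return (v1 + v2 + 2.0 * r * math.sqrt(v1 * v2)) / 4.0


v = 0.1  # hypothetical sampling variance of each outcome's effect size
for r in (0.0, 0.5, 0.9):
    print(f"r={r}: variance of the average = {variance_of_mean_of_two(v, v, r):.3f}")
```

Assuming r = 0 when the outcomes are actually highly correlated makes the combined estimate look nearly twice as precise as it is, so a guess at r feeds directly into the study's weight.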
This meta-analysis on meditation has an interesting approach: they basically just analyze the effect sizes in the same "class" (averaging effect sizes within a study if there are multiple different outcomes measured in the same class).
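As I read it, the procedure amounts to something like the following sketch. The study names, class labels and effect sizes are made up, and the final step uses an unweighted mean for simplicity, where a real meta-analysis would weight by inverse variance.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (study, outcome class, effect size d)
records = [
    ("study_A", "attention", 0.30),
    ("study_A", "attention", 0.50),
    ("study_A", "memory", 0.10),
    ("study_B", "attention", 0.20),
    ("study_B", "memory", 0.40),
    ("study_B", "memory", 0.60),
]


def pool_by_class(rows):
    """Average effect sizes within each study per class, then across studies."""
    # Step 1: collect effect sizes per (class, study).
    per_study = defaultdict(list)
    for study, outcome_class, d in rows:
        per_study[(outcome_class, study)].append(d)
    # Step 2: average within each study, so a study contributes one value per class.
    per_class = defaultdict(list)
    for (outcome_class, _study), ds in per_study.items():
        per_class[outcome_class].append(mean(ds))
    # Step 3: unweighted mean across studies (a simplification).
    return {c: mean(vals) for c, vals in per_class.items()}


pooled = pool_by_class(records)
for outcome_class, d in sorted(pooled.items()):
    print(outcome_class, round(d, 3))
```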
That sounds like a completely disgusting approach... I'm going to have to read that and see if it's a legitimate strategy.
They seem to get pretty strong effect sizes and low heterogeneity, so I'm curious to hear your thoughts on it.
So, their methodology is, as far as I can tell, described by these parts:
So, they just split the effect sizes, and do an average of the 2 sets. Nothing more.
I dunno. They don't give any references to papers or textbooks on meta-analysis to justify this procedure. It doesn't sound very kosher to me.
From a statistical point of view, I wouldn't expect this to work very well. I would expect a lot of heterogeneity and a very weak signal. However, they report very strong results with low heterogeneity (which I find pretty surprising). I don't see any obvious way in which this would be "cheating".
Are you worried about something else specific?
Oh, that's easy: publication bias. If the original studies report only the measures which reached a cutoff, and the null is always true, then since their measures will generally all be on the same subjects/with the same n, their effect sizes will have to be fairly similar* and I'd expect the I^2 to be low even as the results are meaningless.
* since p is just a function of sample size & effect size, and the p threshold is fixed by convention at 0.05, and sample size n is pretty much the same across all measures - since why would you recruit a subject and then not get as much data as possible and omit lots of subjects? - only measurements with effect sizes big enough to cross the p threshold at the fixed n will be reported.
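A quick simulation sketch of this argument. The assumptions are mine: the true effect is zero everywhere, every measure compares two groups of 20, the standard error of d is approximated as sqrt(2/n), and only results that are significant in the favorable direction get reported.

```python
import math
import random


def simulate_reported_effects(n_per_group=20, num_measures=4000, seed=1):
    """Draw null-effect estimates and keep only the 'significant' favorable ones."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)  # approximate SE of Cohen's d under the null
    d_crit = 1.96 * se                 # smallest d reaching two-sided p < 0.05
    draws = [rng.gauss(0.0, se) for _ in range(num_measures)]
    reported = [d for d in draws if d > d_crit]
    return reported, se, d_crit


def pooled_and_i_squared(effects, se):
    """Equal-weight pooled estimate and Higgins' I^2 computed from Cochran's Q."""
    weight = 1.0 / se ** 2
    pooled = sum(effects) / len(effects)
    q = sum(weight * (d - pooled) ** 2 for d in effects)
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, i2


reported, se, d_crit = simulate_reported_effects()
pooled, i2 = pooled_and_i_squared(reported, se)
print(f"reported {len(reported)} of 4000 measures")
print(f"critical d = {d_crit:.2f}, pooled d = {pooled:.2f}, I^2 = {i2:.2f}")
```

The surviving effect sizes are all squeezed into the tail just past the cutoff, so they vary less than their nominal standard error implies, and the heterogeneity statistic comes out low even though every true effect is zero.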
While if each particular measure was done separately as a bunch of univariate or multivariate meta-analyses, they'd have to get access to the original data or they'd be able to see the publication bias on a measure by measure basis.
Or it might be that each measure has a weighted effect size of zero, it's just that each study is biased towards a different measure, and so its 'overall' estimate is positive even though if we had combined each measure with all its siblings, every single one would net to zero.
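A toy version of this second scenario, with made-up numbers: two studies, two measures, each measure averaging to zero across studies, and each study reporting only its most favorable measure.

```python
from statistics import mean

# Hypothetical effect sizes: each measure nets to zero across the two studies.
studies = {
    "study_1": {"measure_A": 0.4, "measure_B": -0.4},
    "study_2": {"measure_A": -0.4, "measure_B": 0.4},
}

# Measure-by-measure pooling across studies (unweighted, for simplicity).
per_measure = {
    m: mean([s[m] for s in studies.values()])
    for m in ("measure_A", "measure_B")
}

# Lumped pooling when each study reports only its most favorable measure.
lumped = mean([max(s.values()) for s in studies.values()])

print(per_measure)
print(lumped)
```

Measure by measure, everything cancels; lumped together after selective reporting, the "overall" effect is positive.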
Maybe I'm wrong about these speculations. But I hope you see why I feel uncomfortable with this 'lump everything remotely similar together' approach and would like to see what meta-analytic experts say about the approach.
That's a great point, I hadn't been thinking about that. It amplifies the publication bias by a lot.
Right, that makes sense. People rarely report the covariance matrix of the data.
Much less provide IPD/individual-patient-data which is what one really wants. The lack of data is frustrating.
Indeed.