ChristianKl comments on This is why we can't have social science - Less Wrong

36 Post author: Costanza 13 July 2014 09:04PM


Comment author: someonewrongonthenet 14 July 2014 02:54:06AM *  3 points [-]

I sort of side with Mitchell on this.

A mentor of mine once told me that replication is useful, but not the most useful thing you could be doing because it's often better to do a followup experiment that rests on the premises established by the initial experiment. If the first experiment was wrong, the second experiment will end up wrong too. Science should not go even slower than it already does - just update and move on, don't obsess.

It's kind of like how some of the landmark studies on priming failed to replicate, yet there are so many followup studies that priming explains really well that it seems a bit silly to throw out the notion of priming just because of that.

Keep in mind, while you are unlikely to reach statistical significance where there is no real effect, it's not at all unlikely for a real effect to miss significance the next time you run the experiment. Significance tests are tuned to tolerate false negatives far more often than false positives: the false-positive rate is pinned at 5%, while statistical power in typical studies is often far below 95%.
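A minimal simulation makes this asymmetry concrete. The numbers here (a standardized effect of 0.3, thirty subjects per study) are assumed for illustration, not taken from any particular replication; with a known unit variance this reduces to a simple z-test:

```python
# Sketch: how often does a REAL effect fail to reach p < 0.05
# in a small study? Effect size and n are assumed for illustration.
import random
from statistics import NormalDist

random.seed(0)
true_effect = 0.3       # real standardized effect (assumed)
n = 30                  # per-study sample size (assumed, typical small study)
trials = 10_000
z_crit = NormalDist().inv_cdf(0.975)   # two-sided alpha = 0.05

false_negatives = 0
for _ in range(trials):
    sample = [random.gauss(true_effect, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    z = mean / (1.0 / n ** 0.5)        # known sigma = 1, so a z-test
    if abs(z) < z_crit:                # real effect, yet not significant
        false_negatives += 1

print(f"False-negative rate: {false_negatives / trials:.0%}")
```

Under these assumptions the study misses the real effect well over half the time, while the false-positive rate is held at 5% by construction; a single failed replication of a true effect is entirely unremarkable at this power.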

Emotionally though... when you get a positive result in breast cancer screening even when you're not at risk, you don't just shrug and say "probably a false positive", even though it probably is. Instead, you irrationally do more screenings and possibly get a needless operation. Similarly, when an experiment fails to replicate, people don't shrug and say "probably a false negative", even though that is, in fact, quite likely. Instead, they start questioning the reputation of the experimenter. Understandably, this whole process is nerve-wracking for the original experimenter. Which I think is what Mitchell was - admittedly clumsily - groping towards with the talk of "impugning scientific integrity".
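The screening half of this analogy is a standard base-rate calculation. The prevalence, sensitivity, and false-positive rate below are assumed round numbers for illustration, not real screening statistics:

```python
# Sketch of the base-rate point: at low prevalence, most positive
# screens are false positives. All three rates are assumed numbers.
prevalence = 0.01       # P(condition) in the screened group
sensitivity = 0.80      # P(positive | condition)
false_pos_rate = 0.10   # P(positive | no condition)

# Bayes' rule: P(condition | positive)
p_positive = sensitivity * prevalence + false_pos_rate * (1 - prevalence)
p_condition_given_pos = sensitivity * prevalence / p_positive

print(f"P(condition | positive screen) = {p_condition_given_pos:.1%}")
# -> about 7.5%: roughly 12 of every 13 positives are false alarms
```

The same arithmetic, run in reverse, is why a single non-replication of a plausible effect should often be read as a false negative rather than as evidence of misconduct.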

Comment author: gwern 14 July 2014 08:18:33PM *  20 points [-]

A mentor of mine once told me that replication is useful, but not the most useful thing you could be doing because it's often better to do a followup experiment that rests on the premises established by the initial experiment. If the first experiment was wrong, the second experiment will end up wrong too. Science should not go even slower than it already does - just update and move on, don't obsess.

Tell me, does anyone actually do what you think they should do? That is, based on a long chain of ideas A->B->C->D, none of which have been replicated, upon experimenting and learning ~Z, do they ever reject the bogus theory D? (Or wait, was it C that should be rejected, or maybe the ~Z should be rejected as maybe the experiment just wasn't powered enough to be meaningful as almost all studies are underpowered or, can you really say that Z logically entailed A...D? Maybe some other factor interfered with Z and so we can 'save the appearances' of A..Z! Yes, that's definitely it!) "Theory-testing in psychology and physics: a methodological paradox", Meehl 1967, puts it nicely (and this is as true as the day he wrote it half a century ago):

This last methodological sin is especially tempting in the "soft" fields of (personality and social) psychology, where the profession highly rewards a kind of "cuteness" or "cleverness" in experimental design, such as a hitherto untried method for inducing a desired emotional state, or a particularly "subtle" gimmick for detecting its influence upon behavioral output. The methodological price paid for this highly-valued "cuteness" is, of course, (d) an unusual ease of escape from modus tollens refutation. For, the logical structure of the "cute" component typically involves use of complex and rather dubious auxiliary assumptions, which are required to mediate the original prediction and are therefore readily available as (genuinely) plausible "outs" when the prediction fails. It is not unusual that (e) this ad hoc challenging of auxiliary hypotheses is repeated in the course of a series of related experiments, in which the auxiliary hypothesis involved in Experiment 1 (and challenged ad hoc in order to avoid the latter's modus tollens impact on the theory) becomes the focus of interest in Experiment 2, which in turn utilizes further plausible but easily challenged auxiliary hypotheses, and so forth. In this fashion a zealous and clever investigator can slowly wend his way through a tenuous nomological network, performing a long series of related experiments which appear to the uncritical reader as a fine example of "an integrated research program", without ever once refuting or corroborating so much as a single strand of the network.

To give a concrete example of why your advice is absurd and impractical and dangerous...

One of the things I am most proud of is my work on dual n-back not increasing IQ; the core researchers, in particular the founder Jaeggi, are well aware that their results have not replicated very well and that the results are almost entirely explained by bad control groups, and this is in part thanks to increased sample size from various followup studies which tried to repeat the finding while doing something else, like an fMRI study or an emotional processing variant. So, what are they doing now, the Buschkuehl lab and the new Jaeggi lab? Have they abandoned DNB/IQ, reasoning that since "the first experiment was wrong, the second experiment will end up wrong too"? Have they taken your advice to "just update and move on, don't obsess"? Maybe taken serious stock of their methods and other results involving benefits to working memory training in general?

No. They are now busily investigating whether individual personality differences can explain transfer to IQ, whether other tasks can transfer, whether manipulating motivation can moderate transfer to IQ, and so on and so forth, and reaching p<0.05 and publishing papers just like they were before; but I suppose that's all OK, because after all, "there are so many followup studies which are explained by [dual n-back transferring] really well that it seems a bit silly to throw out the notion of [dual n-back increasing IQ] just because of that".