Are you saying that variations in the new-object times imply confounding factors that corrupt the results?
Technically, yes. But phrasing it that way sounds like the test algorithm should include a step: "Check that the new-object times are consistent." That's not how I detected the error. I said, "Remember that what we originally wanted to know is whether the old-object times are different - and they aren't."
The data show the rats spending the same amount of time examining the old objects in all cases. The investigators concluded that the rats didn't recognize the old objects in those cases where the rats spent less time than usual examining the new objects. That interpretation requires believing it's more likely that the new-object times trace out a strange but reliable function f(M), describing how curious rats are about a new object M minutes after being exposed to a different object, than that something is wrong with the experiment.
Note also the leftmost two points in figure 1B, which show that the control rats and the gene-therapy rats spent the same amount of time investigating the old objects. So, to continue with the interpretation that the new-object time is a good control, you have to believe that the gene therapy has both improved the rats' ORM and made them inherently more curious about objects shown to them 60 minutes after they were shown some other object.
In other words, if the setup were good, the old-object time ought to increase, rather than the new-object time decrease.
Take a look at "Role of Layer 6 of V2 Visual Cortex in Object-Recognition Memory", Science, 3 July 2009, Vol. 325, No. 5936, pp. 87-89. The article has some good points, but I'm going to pick on some of its tests.
The experimenters believed they could enhance object-recognition memory (ORM) by using a lentivirus to insert a gene into area V2 of visual cortex. They tested the ORM of rats by putting an object in a field with a rat, and then putting either the same object ("old"), or a different object ("new"), in the field 30, 45, or 60 minutes later. The standard assumption is that rats spend more time investigating unfamiliar than familiar objects.
They chose this test: for each condition, measure the difference in mean time spent investigating the old object vs. the new object. If the latter is greater than the former, and the difference is statistically significant, conclude that the rats recognized the old object.
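That decision rule can be sketched in a few lines of code. Everything below is hypothetical - the numbers are invented to illustrate the rule, not taken from the paper - and Welch's t statistic with a rough cutoff of 2.0 stands in for whatever significance test the authors actually ran:

```python
# A sketch of the paper's decision rule (hypothetical data, not the paper's).
from math import sqrt
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(stdev(a)**2 / na + stdev(b)**2 / nb)

# Invented exploration times (seconds) for one delay condition.
new_times = [38, 41, 35, 44, 40, 37]  # time spent on the new object
old_times = [22, 25, 20, 27, 24, 23]  # time spent on the old object

t = welch_t(new_times, old_times)
# Rule: "recognized" iff new > old and the difference is significant.
recognized = mean(new_times) > mean(old_times) and t > 2.0
print(recognized)  # True
```

Note that the rule emits a single yes/no per condition; nothing about the absolute heights of the bars survives it.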
Figure 1, graph A (below the article-summary cutoff), shows how much time normal rats spent investigating an object. (Table omitted: how much time the rats spent exploring old and new objects.)
The black bars (new) are significantly longer than the white bars (old) after 30 and 45 minutes, but not after 60 minutes. Therefore, the normal rats recognized the old objects after 30 and 45 minutes, but not after 60 minutes.
Figure 3 Graph D (also below the article-summary cutoff) shows how much time different types of rats spent exploring old and new objects. The "RGS" group is rats given the gene therapy, but in parietal cortex rather than in V2.
(Table omitted: how much time the rats spent exploring old and new objects, by rat type.)
Parietal RGS rats displayed no difference in time spent exploring old and new objects after 60 minutes; therefore, this gene therapy to parietal cortex does not improve ORM.
To recap: the experimenters concluded that normal rats recognized the old object after 30 and 45 minutes but not after 60, and that the gene therapy improved ORM when delivered to V2 but not when delivered to parietal cortex.

So why don't I buy it?
Figure 1 (look at A); figure 3 (look at D). (Original image is here.)
The investigators were trying to determine when rats recognized an old object. So what's most relevant is how much time they spent investigating the old object. The time spent investigating new objects is probably supposed to control for variations in their testing procedure.
But in both graphs, the authors claim that the rats failed to recognize an old object in the 60-minute condition, even though the rats spent the same amount of time investigating it as in the other conditions. The difference was only in their response to new objects. The test methodology assumes that the response to new objects is always the same.
Look at the error bars on those graphs. The black bars are all supposed to be the same height (except in 1B and 1C). Yet they differ across conditions by what looks like about 10 standard deviations in several cases.
When you regularly get 10 standard deviations of difference in your control variable across cases, you shouldn't say, "Gee, lucky thing I used that control variable! Otherwise I never would have noticed the large, significant difference between the test and control cases." No; you say, "Gee, something is wrong with my experimental procedure."
A couple of other things are worth noticing, in addition to the comments above.
One subtle type of error is committed disproportionately by scientists, because it's a natural by-product of the scientific process of abstracting a theory into a testable hypothesis. A scientist is supposed to formulate a test before performing it, to avoid biasing the formulation toward the desired results. Over-encapsulation is when the scientist performs the test, and examines the results according to the previously established criteria, without noticing that the test results invalidate the assumptions used to formulate the test. I call it "over-encapsulation" because the scientist has tried to encapsulate the reasoning process in a box - put data into the box, get decisions out of it - and the journey into and out of the box strips off relevant but unanticipated information.
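As a toy illustration (the data and threshold below are made up, and the rule is a crude stand-in for a real significance test): the "box" returns only the pre-registered yes/no decision, so the fact that the control variable itself shifted between conditions never escapes it.

```python
from statistics import mean

THRESHOLD = 5.0  # hypothetical significance cutoff, in seconds

def encapsulated_test(old_times, new_times):
    """The 'box': data in, pre-registered decision out. Nothing else escapes."""
    return mean(new_times) - mean(old_times) > THRESHOLD

# Two conditions with identical old-object times but very different
# new-object times. The box gives opposite verdicts without ever surfacing
# that its control assumption (constant response to new objects) has failed.
cond_45min = {"old": [23, 24, 23], "new": [39, 40, 41]}
cond_60min = {"old": [23, 24, 23], "new": [25, 24, 26]}

print(encapsulated_test(cond_45min["old"], cond_45min["new"]))  # True
print(encapsulated_test(cond_60min["old"], cond_60min["new"]))  # False
```

A less encapsulated analysis would also report, for example, the spread of the new-object means across conditions, and flag it when it exceeds what the error bars allow.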
Over-encapsulation is especially tricky when you're reasoning about decision theory. It's possible to construct a formally valid evaluation of the probabilities of different cases, and then take those probabilities and choose an action based on them using some decision theory, without noticing that some of the cases are inconsistent with the assumptions of that decision theory. I hope to write another, more controversial post on this someday.