I'm a bit surprised at the amount of press this study is getting, since the results aren't really anything new. There have been various studies showing that women do worse in math if they're reminded of being women beforehand, that Chinese-American women perform on par with men in math if they're reminded of their Chinese heritage beforehand (Chinese culture expects both men and women to excel at academics) but worse if they're reminded of their American heritage, et cetera.
From the picture accompanying the article (so numbers may be slightly off): on the final exam, men's average score minus women's average score was 11.3+-4.6% in the control group and 2.4+-3.8% in the experimental group. The difference in gap was thus 8.9+-6%, so about 1.5 standard deviations from no difference.
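For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes the two error bars are independent one-sigma normal errors, so they combine in quadrature; the helper name `difference_with_error` is purely illustrative and not anything from the paper.

```python
import math

def difference_with_error(a, sigma_a, b, sigma_b):
    """Return (a - b, combined sigma), assuming independent normal errors."""
    return a - b, math.hypot(sigma_a, sigma_b)

# Male-minus-female gaps in percentage points, read off the article's picture.
control_gap, control_err = 11.3, 4.6
experimental_gap, experimental_err = 2.4, 3.8

gap_change, gap_change_err = difference_with_error(
    control_gap, control_err, experimental_gap, experimental_err
)
z = gap_change / gap_change_err
print(f"change in gap: {gap_change:.1f} +- {gap_change_err:.1f}, z = {z:.2f}")
```

Under those assumptions this reproduces the 8.9+-6% figure and the roughly 1.5 standard deviations.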
Women's score in the experimental group minus the control group was 5.9+-5.2%. Respectable, but only a bit above 1 standard deviation.
Men's score in the experimental group minus the control group was -3.3+-3.8%. Focusing on their values, rather than values other people have, made men worse at this test by a comparable amount to how much it made women better (in terms of standard deviations, not absolutes). The standard deviations narrowed for both groups: for the women, this was reported as the worst women doing better, and for the men, it seems reasonable to assume this means the best men did worse.
So, what the heck is going on here? A statistical fluke seems most likely: the experimental group happened to contain worse men and better women. These results don't seem terribly statistically significant (to get my numbers, I added together four normals with stdevs taken from the error bars on the picture; it would be better to check the statistical analysis in the paper itself), and so that possibility is rather strong.
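To put rough numbers on "not terribly significant", here is a small Python sketch that turns each difference into a two-sided p-value. It assumes the error bars are one standard error and the errors are normal, which is my reading of the picture rather than something the paper confirms; `two_sided_p` is a hypothetical helper name.

```python
import math

def two_sided_p(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

# (difference, error bar) pairs quoted above, in percentage points.
effects = {
    "change in gender gap": (8.9, 6.0),
    "women, experimental - control": (5.9, 5.2),
    "men, experimental - control": (-3.3, 3.8),
}

p_values = {}
for name, (diff, err) in effects.items():
    z = diff / err
    p_values[name] = two_sided_p(z)
    print(f"{name}: z = {z:+.2f}, two-sided p = {p_values[name]:.2f}")
```

Under these assumptions none of the three comes out below the conventional 0.05 threshold.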
An alternative is that most of these "gap-closing" mechanisms actually impede the superior group and actually help the inferior group. The control group's male score minus the experimental group's female score is 5.6+-4%, almost 1.5 stdevs from no difference (control male minus control female was almost 2.5 stdevs from no difference).
Two ways to have half of each: it might have been a statistical fluke that the experimental men did worse, while the value affirmation really does improve female performance. Or, the value affirmation might have made everyone do worse, but the women by some fluke did better (this is least likely, given that the women would have to be 2 stdevs unlikely upwards in the experimental group).
An alternative is that most of these "gap-closing" mechanisms actually impede the superior group and actually help the inferior group.
That was my first thought. If a physics teacher made me waste 15 minutes on such a stupid, non-physics-related exercise, I'd likely do very badly in the class (more likely, walk out and drop the class immediately).
Everybody wasted 15 minutes. The question was just what they focused on (and neither option was physics-related).
That would explain a possible difference between an experimental group that spent a 15-minute exercise on stuff other than physics and a control group that did just physics: the best students might leave the experimental group, bringing down its mean and standard deviation. But as only the focus differed between the two groups, I don't see how the impulse to leave classes that waste your time would manifest itself as a difference between the experimental and control groups. If such an effect is measurable in outcomes, it would not be noticed in this experiment.
Ah, missed that detail, thanks.
Here I had just assumed one of the groups would have been taught some physics during that 15 minutes. I guess we'll just have to keep wondering how much better teaching physics is than not teaching physics at making people learn physics.
The students were split up into the control and values affirmation groups. If the values affirmation group happened by chance to contain more of the brighter women, then the control group would contain fewer of them, so the two samples cannot be treated as independent. The paper doesn't seem to mention any attempt to take this into account, so the actual p-values might be higher than those calculated in the paper, which weren't especially low to begin with.
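The dependence itself is easy to see in a toy simulation. The Python sketch below uses an entirely made-up pool of 40 scores (not data from the study), splits it at random into two groups many times, and checks how the two group means move together. It only illustrates that the samples are linked; it says nothing about how much, if at all, the paper's p-values would change.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical fixed pool of 40 students' "true" scores (invented numbers).
pool = [rng.gauss(70, 15) for _ in range(40)]

affirmation_means, control_means = [], []
for _ in range(2000):
    shuffled = pool[:]
    rng.shuffle(shuffled)
    affirmation, control = shuffled[:20], shuffled[20:]
    affirmation_means.append(sum(affirmation) / len(affirmation))
    control_means.append(sum(control) / len(control))

corr = pearson(affirmation_means, control_means)
print(f"correlation between group means: {corr:.3f}")
```

Because the pool is fixed, whatever one group gains the other loses, so the two means come out strongly negatively correlated rather than independent.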
15-minute writing exercise closes the gender gap in university-level physics:
The article cites a paper, but it's behind a paywall:
http://www.sciencemag.org/content/330/6008/1234