From the picture accompanying the article (so numbers may be slightly off): on the final exam, men's average score minus women's average score was 11.3+-4.6% in the control group and 2.4+-3.8% in the experimental group. The difference in gap was thus 8.9+-6%, so about 1.5 standard deviations from no difference.
Women's score in the experimental group minus the control group was 5.9+-5.2%. Respectable, but only a bit above 1 standard deviation.
Men's score in the experimental group minus the control group was -3.3+-3.8%. Focusing on their own values, rather than on values other people hold, made men worse at this test by an amount comparable to how much it made women better (in terms of standard deviations, not absolute points). The standard deviations narrowed for both groups: for the women, this was reported as the worst women doing better, and for the men, it seems reasonable to assume it means the best men did worse.
So, what the heck is going on here? The most likely explanation seems to be a statistical fluke: the experimental group happened to contain worse men and better women. These results don't seem terribly statistically significant (to get my numbers, I added together four normals with stdevs equal to the error bars on the picture; it would be better to check the statistical analysis in the paper itself), so that possibility is rather strong.
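For anyone who wants to redo the arithmetic, here is a minimal sketch of the error combination described above. It assumes the error bars are independent and roughly normal, so they add in quadrature. The function name `diff_of_independent` is my own, and the inputs are the approximate values read off the figure, not numbers from the paper.

```python
import math

def diff_of_independent(a, sigma_a, b, sigma_b):
    """Return (a - b, combined sigma), assuming independent normal errors."""
    # Variances add for a difference of independent normals,
    # so the standard deviations combine in quadrature.
    return a - b, math.hypot(sigma_a, sigma_b)

# Gender gap (men minus women) on the final exam, in percentage points.
# These are eyeballed from the figure, so treat them as approximate.
control_gap, control_err = 11.3, 4.6
experimental_gap, experimental_err = 2.4, 3.8

gap_change, gap_change_err = diff_of_independent(
    control_gap, control_err, experimental_gap, experimental_err
)
z = gap_change / gap_change_err  # distance from "no difference" in stdevs

print(f"{gap_change:.1f} +- {gap_change_err:.1f}, z = {z:.2f}")
# prints: 8.9 +- 6.0, z = 1.49
```

This reproduces the 8.9+-6% figure and the "about 1.5 standard deviations" claim, but it is only as good as the independence assumption and the eyeballed inputs.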
An alternative is that most of these "gap-closing" mechanisms actually impede the superior group and actually help the inferior group. The control group's male score minus the experimental group's female score is 5.6+-4%, almost 1.5 stdevs from no difference (control male minus control female was almost 2.5 stdevs from no difference).
There are also two ways to have half of each. The experimental men doing worse might have been a statistical fluke, while the exercise really does improve female performance. Or the value affirmation might have made everyone do worse, but the women did better by some fluke (this is the least likely, given that the women in the experimental group would have to be an unlikely 2 stdevs above expectation).
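To put the "stdevs from no difference" figures in this comment on a probability scale, here is a rough sketch that converts each one to a two-sided tail probability. It assumes the estimates are normally distributed with the quoted error bars as standard deviations, which is my simplification and not the paper's analysis. The helper `two_sided_p` and the labels are made up for illustration.

```python
import math

def two_sided_p(z):
    """Probability that a standard normal lands at least |z| from zero."""
    return math.erfc(abs(z) / math.sqrt(2))

# (estimate, error bar) pairs as quoted in this comment, in percentage
# points. All of them are approximate readings from the figure.
estimates = {
    "gap, control minus experimental": (8.9, 6.0),
    "women, experimental minus control": (5.9, 5.2),
    "men, experimental minus control": (-3.3, 3.8),
    "control men minus experimental women": (5.6, 4.0),
    "control men minus control women": (11.3, 4.6),
}

for label, (value, err) in estimates.items():
    z = value / err
    print(f"{label}: z = {z:+.2f}, two-sided p = {two_sided_p(z):.2f}")
```

For reference, a z of 1.5 corresponds to a two-sided p of about 0.13, well short of the usual 0.05 threshold, which is why the fluke explanation is hard to rule out from these numbers alone.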
An alternative is that most of these "gap-closing" mechanisms actually impede the superior group and actually help the inferior group.
That was my first thought. If a physics teacher made me waste 15 minutes on such a stupid, non-physics-related exercise, I'd likely do very badly in the class (more likely, walk out and drop the class immediately).
15-minute writing exercise closes the gender gap in university-level physics:
The article cites a paper, but it's behind a paywall:
http://www.sciencemag.org/content/330/6008/1234