I think that, in this case, the underlying problem was not caused by the way frequentist statistics are commonly taught and practiced by working scientists:
In the present case, the null hypothesis is that the old method and the new method produce data from the same distribution; the authors would like to see data that do not lead to rejection of the null hypothesis.
I'm no statistician, but I'm pretty sure you're not supposed to make your favored hypothesis the null hypothesis. That's a pretty simple rule and I think it's drilled into students and enforced in peer review.
I see that as the underlying problem because it reverses the burden of proof. If they had done it the right way around, six data points would have been too few to support their method, rather than too few to reject it. In the extreme, making your favored hypothesis the null hypothesis lets you rely on a single data point, since one observation can almost never reject anything.
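To make the burden-of-proof point concrete, here is a rough sketch in Python. It is not the test the authors used; I picked an exact permutation test on the difference in means because it's easy to check by hand. The function name, the 3-and-3 split of the six points, and the numbers are all made up for illustration.

```python
from itertools import combinations


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means.

    Hypothetical helper for illustration: enumerates every way of
    relabelling the pooled data into groups of the original sizes.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        # Small tolerance so float rounding does not drop exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Invented data: the two "methods" could hardly disagree more.
old = [1.0, 2.0, 3.0]
new = [101.0, 102.0, 103.0]

p_small = permutation_p_value(old, new)          # six points, split 3 and 3
p_single = permutation_p_value([1.0], [101.0])   # one point per method

print(p_small, p_single)
```

With three points per group there are only 20 possible relabellings, and the observed split and its mirror image are the only two as extreme as the data, so the p-value cannot go below 2/20 = 0.1 no matter how different the methods are. With one point per group it is always 1. If "same distribution" is the null and your favored hypothesis, a sample that small can never count against you at the usual 0.05 level.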




I would suggest the example of someone not getting the evil bit joke (RFC 3514).
It's good because it works both ways. You only need common sense to understand it, but lay people can be intimidated by the context into not applying common sense, and you'll sometimes see domain experts try to implement essentially the same thing because they switch off common sense while working in their own domain.