(By the way, the "active control" group practiced vocab and trivia, which should have no overlap with what's tested by SPM and TONI, which are completely nonverbal.)
You're right. I didn't actually locate and compare the unsplit numbers from Table 1; I just visually estimated (from the pretty bar chart, Fig. 4) the average of the two n-back subgroups, since they're equal-sized. It looks like the n-backers (compared to the trivia/vocab studiers) showed a non-significantly superior improvement short term, and a non-significantly worse improvement long term.
I'm also puzzled as to why there's no passive control. Even though there's no obvious overlap between vocabulary/trivia learning and SPM/TONI, I'd expect some generalized training effect, at least in motivation/focus.
I guess my overall view of the evidence is: don't expect single n-back to do much better than any other form of same-effort mental exercise, for any purpose except the exact task trained.
There's no passive control because there are only 62 kids. Only spend as many kids as it takes to publish.
I would not expect a generalized training effect. Almost nothing exhibits cross-test training. People are excited about n-back because it is the only task that is said to do so.
Following up on the 2010 study, Jaeggi and University of Michigan people have run a Single N-back study on 60 or so children.
The abstract is confident, and the mainstream coverage does not question the basic claim. But when I read the paper, the data did not seem very solid at all. I will forbear from describing my reservations exactly, however: I have been accused of being biased against n-backing, and I'd appreciate outside opinions, especially from people with expertise in the area.
(Background: Jaeggi 2011 in my DNB FAQ. Don't read it unless you can't otherwise render the opinion requested above, since it includes my criticisms.)