I also believe that this particular type of test measures something of value for AI.
Insofar as they showcase generally applicable methods, I would agree. Their use of deep learning seems encouraging, though I cannot tell from the abstract how domain-specific their methods are, and thus to what extent similar techniques could figure into an architecture for general intelligence. If the techniques don't robustly generalise, then you'd have to tailor the approach to whatever particular domain you're working in. Thus the society of mind remark - Minsky's thesis, as I understand it, is that the mind is a kludge of tailor-made components that perform well in their own domains but are basically useless outside them (which seems to me incompatible with the phenomenon of neuroplasticity). Anybody advocating novel, domain-specific tailoring of general algorithms is then adhering to Minsky's approach.
To take seriously the idea that some system represents a concrete step towards general intelligence, I'd have to see its performance on a battery of "agi-hard" metrics. I can't give a precise definition of what such a battery would look like, but IQ subtests that drastically restrict the scope of the NLP techniques needed seem obviously not to qualify.
A much more compelling demonstration would be a system that can, say, read a textbook on topology and then pass an exam paper on the subject, with neither having been pre-formatted into a convenient representation.
Thus the society of mind remark - Minsky's thesis, as I understand it, is that the mind is a kludge of tailor-made components that perform well in their own domains but are basically useless outside them (which seems to me incompatible with the phenomenon of neuroplasticity).
In a complex ANN or a brain, you start with a really simple hierarchical prior over the network and a general-purpose optimizer. After training you may get a 'kludge of tailor-made components' that perform really well on the domain you trained on. The result may be specific, but the p...
A research team in China has created a system for answering verbal analogy questions of the type found on the GRE and on IQ tests; it scores slightly above the average human, perhaps corresponding to an IQ of around 105. This substantially improves on the reported state of the art in AI for these types of problems.
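For flavour, the classic baseline these systems build on is the word2vec-style "vector offset" trick: answer "a is to b as c is to ?" by finding the vocabulary word closest to b - a + c. The toy 4-d vectors below are hand-made for illustration only and are not from the paper:

```python
import numpy as np

# Hand-made toy embeddings (assumption: real systems learn these from large
# corpora with word2vec/GloVe; the dimensions here have no real meaning).
vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9, 0.0]),
}

def solve_analogy(a, b, c, vocab):
    """Answer 'a is to b as c is to ?' via the offset vector b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    best, best_sim = None, -1.0
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue  # the question words themselves are excluded
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(solve_analogy("man", "woman", "king", vocab))  # → queen
```

The offset trick alone handles only the easiest analogy questions; the point of the paper's improvements is to get past its limitations.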
This work builds on deep word-vector embeddings, which have driven large gains in translation and many other NLP tasks. One of their key improvements is learning multiple vectors per word, with the number of distinct word meanings taken directly from a dictionary. This matters because verbal analogy questions often rely on rarer word meanings. They also employ modules specialized for the different question types.
I vaguely remember reading that AI systems are already fairly strong at solving visual Raven's-matrices-style IQ questions, although I haven't looked into that in detail.
The multi-vector technique is probably the most important takeaway for future work.
Even if subsequent follow-up work reaches superhuman verbal IQ in a few years, this of course doesn't immediately imply AGI. These IQ tests measure specific abilities that correlate with general intelligence in humans, but those abilities are only a small subset of the systems required for general intelligence, and probably rely on a smallish subset of the brain's circuitry.