Another interesting fact is that without any NN, but using the rest of the approach from the paper, their method gets 18/30 correct. The NN boosts this to 25/30. The prior SOTA was 10/30 (also without a NN).
So arguably about half of the action is just improvements in the non-AI components.
Hmm, it might be questionable to call it "non-AI" though? It's based on symbolic and algebraic deduction engines, and afaict it sounds like the sort of thing that used to be very much mainstream "AI", i.e. symbolic AI plus some hard-coded human heuristics?
Sure, it just seems like a very non-central example of AI from the typical perspective of LW readers.
This is another example of how matching specialized human reasoning skill seems routinely feasible with search guided by 100M-scale networks trained for a task a human would spend years mastering. These tasks seem specialized, but it's plausible that the full breadth of human activity can be covered by a reasonable number of such areas of specialization. What's currently missing is automation of the formulation and training of systems specialized in any given skill.
The often-touted surprisingly good human sample efficiency might just mean that, when training is set up correctly, it's sufficient to train models whose size is comparable to the amount of external text data a human would need to master a skill, rather than models comparable in size to a human brain. This doesn't currently work for training systems that autonomously do research and produce new AI experiments and papers, and in practice the technology might take a very different route. But once it does work, surpassing human performance might not require millions of GPUs, even for training.
Though perhaps it is only a subset of humans:
Chapter 8: Geometry for Americans
We call this chapter "Geometry for Americans" instead of "Geometry for Dummies" so as not to offend. The sad truth is that most mathematically inclined Americans know very little geometry, in contrast to their luckier peers in Eastern Europe and Asia. But it is never too late to learn. Geometry is a particularly fun topic to study, because you are compelled to draw lots and lots of pictures.
(30% serious here)
https://arxiv.org/abs/2404.06405
"Essentially, this classic method solves just 4 problems less than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist."
[Published today by DeepMind]
Links to the paper appear broken, but here is a link:
https://www.nature.com/articles/s41586-023-06747-5
Interesting that the transformer used is tiny. From the paper: