It's important to remember that o3's score on ARC-AGI is "tuned," while previous AIs' scores are not. Being explicitly trained on example test questions gives it a major advantage.
According to François Chollet (designer of ARC-AGI):
Note on "tuned": OpenAI shared they trained the o3 we tested on 75% of the Public Training set. They have not shared more details. We have not yet tested the ARC-untrained model to understand how much of the performance is due to ARC-AGI data.
It's interesting that OpenAI did not test how well o3 would have done before it was "tuned."
EDIT: People at OpenAI deny having "fine-tuned" o3 for the ARC (see this comment by Zach Stein-Perlman). But to me, the denials sound like "we didn't use a separate derivative of o3 (fine-tuned just for the test) to take the test, but we may still have done reinforcement learning on the public training set." (See my reply)
people were sentenced to death for saying "I."
Thank you for the help :)
By the way, how did you find this message? I thought I had already edited the post to use spoiler blocks, and I hid this message by clicking "remove from Frontpage" and "retract comment" (after someone else informed me via PM).
EDIT: dang it, I still see this comment despite removing it from the Frontpage. It's confusing.