Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025. In honor of OpenAI's post Solving (Some) Formal Math Problems, it seems good to publicly state and clarify our predictions, have a final chance to adjust them, and say a bit in advance about how we'd update.
The predictions
Eliezer and I had an exchange in November 2021.[1] My final prediction (made after significantly revising my guesses once I looked up IMO questions and medal thresholds) was:
I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)
Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
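The problem-selection procedure in the first prediction is mechanical, so it can be written down as a small function. This is a minimal Python sketch. It assumes each problem has already been tagged with one of the four standard IMO subject letters (A, C, G, N). The function name `hardest_problem` and its input format are hypothetical, chosen only for illustration.

```python
from typing import Dict

# Subject codes: A = algebra, C = combinatorics, G = geometry, N = number theory.
# hardest_problem is a hypothetical helper illustrating the committed rule.

def hardest_problem(subjects: Dict[int, str]) -> int:
    """Return the problem number the committed rule designates as "hardest".

    `subjects` maps each problem number (1-6) to its subject code.
    """
    p3, p6 = subjects[3], subjects[6]
    if p6 == "G":                # exception (i): problem 6 is geometry
        return 3
    if p3 == "C" and p6 == "A":  # exception (ii): 3 is combinatorics, 6 is algebra
        return 3
    return 6                     # default: problem 6


# A paper ordered C, N, G on day 1 and G, A, C on day 2 triggers no exception.
print(hardest_problem({1: "C", 2: "N", 3: "G", 4: "G", 5: "A", 6: "C"}))
```

Committing to a function like this before the test removes any after-the-fact judgment about which problem counts as hardest.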
Eliezer spent less time revising his prediction, but said (earlier in the discussion):
My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025
So I think we have Paul at <8% and Eliezer at >16% that an AI built before the IMO is able to get a gold (under the grand challenge's time controls, etc.) in one of the 2022-2025 IMOs.
Separately, we have Paul at <4% that an AI is able to solve the "hardest" problem under the same conditions.
I don't plan to revise my predictions further, but I'd be happy if Eliezer wants to do so any time over the next few weeks.
Earlier in the thread I clarified that my predictions are specifically about gold medals (and become even sharper as we move to harder problems); I am not surprised by silver or bronze. My guess is that Eliezer has a broader distribution. The comments would be a good place for Eliezer to state other predictions, or to take a final chance to revise the main prediction.
How I'd update
The informative:
- I think an AI winning the IMO challenge would be significant direct evidence that powerful AI will come sooner, or at least will be technologically possible sooner. It might push my probability of TAI by 2040 up from 25% to something like 40%.
- I think this would be significant evidence that takeoff will be limited by sociological facts and engineering effort rather than by a slow march of smooth ML scaling. Maybe I'd move from a 30% to a 50% chance of hard takeoff.
- If Eliezer wins, he gets 1 bit of epistemic credit.[2][3] These kinds of updates are slow going, and it would be better if we had a bigger portfolio of bets, but I'll take what we can get.
- This would be some evidence for Eliezer's view that "the future is hard to predict." I think we have clear enough pictures of the future that we have the right to be surprised by an IMO challenge win; if I'm wrong about that, then it's general evidence that my error bars are too narrow.
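The "1 bit" figure comes from comparing the probability each forecaster assigned to the outcome that actually happens. The sketch below treats the stated bounds (Eliezer >16%, Paul <8% on gold) as point forecasts, which is an assumption, and the helper name `outcome_ratio` is hypothetical.

```python
import math

def outcome_ratio(p_a: float, p_b: float, happened: bool) -> float:
    """Ratio of the probability forecaster A gave the observed outcome to B's."""
    if happened:
        return p_a / p_b
    return (1 - p_a) / (1 - p_b)

# Assumption: treat the stated bounds (>16%, <8%) as point forecasts.
eliezer, paul = 0.16, 0.08

# AI gets gold: Eliezer gave that outcome twice as much probability as Paul.
win = outcome_ratio(eliezer, paul, happened=True)
# No gold: Paul gave that outcome 0.92 versus Eliezer's 0.84.
lose = outcome_ratio(paul, eliezer, happened=False)

for label, r in [("Eliezer wins", win), ("Paul wins", lose)]:
    print(f"{label}: weight ratio {r:.2f}x, {math.log2(r):.2f} bits")
```

Multiplying equal prior weights in a mixture of experts by these ratios gives Eliezer 2x the weight if an AI gets gold, and Paul roughly 1.1x the weight otherwise, the same figures quoted in the second footnote.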
The uninformative:
- This is mostly just a brute test of a particular intuition I have about a field I haven't ever worked in. It's still interesting (see above), but it doesn't bear much on deep facts about intelligence (my sense is that Eliezer and I are optimistic about similar methods for theorem proving), on heuristics about trend extrapolation (since we have ~no trend to extrapolate), on progress being continuous in crowded areas (since theorem-proving investment has historically been low), or on lots of pre-singularity investment in economically important areas (since theorem proving is relatively low-impact). I think there are lots of other questions that do bear on these things, but we weren't able to pick out a disagreement on any of them.
If an AI wins a gold in some but not all of those years, without being able to solve the hardest problems, then my update will be somewhat more limited but in the same direction. If an AI wins a bronze or silver medal, I'm not making any of these updates and don't think Eliezer gets any credit unless he wants to stake some predictions on those lower bars. (I consider them much more likely, maybe 20% for "bronze or silver" vs 8% for "gold," but I haven't really thought about that number, so it is much less considered than the bets above.)
[1] We also looked for claims that Eliezer thought were very unlikely, so that he'd also have an opportunity to make some extremely surprising predictions. But we weren't able to find any clean disagreements that would resolve before the end of days.
[2] I previously added the text: "So e.g. if Eliezer and I used to get equal weight in a mixture of experts, now Eliezer should get 2x my weight. Conversely, if I win then I should get 1.1x his weight." But I think that really depends on how you want to assign weights. That's a very natural algorithm that I endorse generally, but given that neither of us has really thought carefully about this question, it would be reasonable to just not update much one way or the other.
[3] More if he chooses to revise his prediction up from 16%, or if he wants to make a bet about the "hardest problem" claim where I'm at 4%.
I do not think the ratio of the "AI solves hardest problem" and "AI has Gold" probabilities is right here. Paul was at the IMO in 2008, but he might have forgotten some details...
(My qualifications here: high IMO Silver in 2016, but more importantly I was recently a Jury member at the Romanian Master of Mathematics. The RMM is considered a harder version of the IMO, and shares a good part of its Problem Selection Committee with it.)
The IMO Jury does not treat the "bashability" of a problem (whether it yields to brute-force computation, such as coordinate or complex-number calculation) as a decision factor when the bashing would take good contestants more than a few hours. But for a dedicated bashing program, the length of the computation makes no difference.
It is extremely likely that an "AI" solving most IMO geometry problems is possible today; the main difficulty is converting the text into an algebraic statement. Once that is done, polynomial system solvers should easily tackle such problems.
Say the order of the problems is Day 1: C, N, G and Day 2: G, A, C. The geometry solver gives you 14 points (two problems at 7 points each). For a chance at IMO Gold, you also have to solve the easiest combinatorics problem, plus either the algebra or the number theory problem.
Given the recent progress on coding problems, as with AlphaCode, I place over 50% probability on the IMO #1/#4 combinatorics problems being solvable by 2024. If that turns out to be true, then the "AI has Gold" event becomes "AI solves a medium N or a medium A problem, or both if contestants find them easy (and the gold cutoff is correspondingly high)".
Now, as noted elsewhere in the thread, there are various types of N and A problems that we might consider "easy" for an AI. Several IMOs in the last ten years contain those.
In 2015, the easiest five problems consisted of two bashable G problems (#3, #4), an easy C (#1), a diophantine equation N (#2), and a functional equation A (#5). Given such a problem set, a dedicated AI might be able to score 35 points without having capabilities remotely sufficient to tackle the combinatorics #6.
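The point totals in this comment are simple arithmetic: six problems, each marked out of 7, for a maximum of 42. A minimal sketch, assuming full marks on every solved problem and zero elsewhere (partial credit is ignored); the function name `full_marks_score` is hypothetical. The gold cutoff varies from year to year, so it is deliberately not modeled here.

```python
from typing import Iterable

POINTS_PER_PROBLEM = 7  # each of the six IMO problems is marked out of 7

def full_marks_score(solved: Iterable[int]) -> int:
    """Score if each listed problem gets full marks and every other gets 0.

    Ignoring partial credit is a simplifying assumption.
    """
    return POINTS_PER_PROBLEM * len(set(solved))

# Problem order from the comment: #1 C, #2 N, #3 G (day 1); #4 G, #5 A, #6 C (day 2).
scenarios = {
    "geometry solver only (#3, #4)": [3, 4],
    "+ easiest combinatorics (#1)": [1, 3, 4],
    "+ number theory (#2)": [1, 2, 3, 4],
    "+ algebra (#5), the 2015 example": [1, 2, 3, 4, 5],
}
for name, problems in scenarios.items():
    print(f"{name}: {full_marks_score(problems)} / 42")
```

Whether 28 or 35 points is enough for gold depends on that year's cutoff, which is why the comment speaks of "a chance" at gold rather than a guarantee.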
The only way the Gold probability could be comparable to the "hardest problem" probability is if the bet only takes general problem-solving models into account. Otherwise, the inductive bias one could build into such a model (e.g. falling back on a dedicated diophantine equation solver) helps much more with Gold than with the hardest problem.
I think this is quite plausible. Also see toner's comment in the other direction, though. Both probabilities are somewhat high because there are lots of easy IMO problems. Like you, I think "hardest problem" is quite a bit harder than a gold, though it seems you think the gap is larger (and it sounds like you most likely put a much higher probability on an IMO gold overall).
Overall I think that AI can solve most geometry problems and 3-variable inequalities for free, and many functional equations and diophantine equations seem easy. And I think the easiest ...