Another possibility is that only o3-mini has this knowledge cutoff and the full o3 has a later knowledge cutoff. This could happen if o3-mini is distilled into an older model (e.g., 4o-mini). If the full o3 turns out to have a knowledge cutoff later than 2023, I'd take that as convincing evidence 4o is not the base model.
What is o3's base model?
To create DeepSeek-R1, they:
Therefore, I roughly expect o1's training process was:
An important question for the near-term scaling picture is whether o3 uses 4o as ...
This is really cool research! I look forward to seeing what you do in future. I think you should consider running human baselines, if that becomes possible in the future. Those help me reason about and communicate timelines and takeoff a lot.
Great post! Glad to see more discussion of the implications of short timelines on impactful work prioritization on LW.
These last two categories, influencing policy discussions and introducing research agendas, rely on social diffusion of ideas, and this takes time. With shorter timelines in mind, this only makes sense if your work can actually shape what other researchers do before AI capabilities advance significantly.
Arguably this is not just true of those two avenues for impactful work, but rather all avenues. If your work doesn't cause someone in a ...
I'm fairly confident that this would be better than the current situation, primarily because of something that others haven't touched on here.
The reason is that, regardless of who develops them, the first (militarily and economically) transformative AIs will cause extreme geopolitical tension and instability that is challenging to resolve safely. Resolving such a situation safely requires a well-planned off-ramp, which must route through extremely major national- or international-level decisions. Only governments are equipped to make decisions like the...
Akash, your comment raises the good point that a short-timelines plan that doesn't realize governments are a really important lever here is missing a lot of opportunities for safety. Another piece of the puzzle that comes out when you consider what governance measures we'd want to include in the short timelines plan is the "off-ramps problem" that's sort of touched on in this post.
Basically, our short timelines plan needs to also include measures (mostly governance/policy, though also technical) that get us to a desirable off-ramp from geopolitical t...
I think it is much less clear that pluralism is good than you portray. I would not, for example, want other weapons of mass destruction to be pluralized.
Thanks for your comments!
On page 10, when describing the training process for R1, they write: "We then apply RL training on the fine-tuned model until it achieves convergence on reasoning tasks." I refer to this.
I basically agree with your analysis of GPT-5, which is worrying for short-term scaling, as I tried to argue.