Performance on what benchmarks? Do you mean better at practically everything, or just 'better in practice for what most people use it for'?
Also, what counts as the next frontier model? E.g., if Anthropic releases "Sonnet 3.5 New v1.1", does that count?
Sorry to be nitpicky here.
I expect there to be something better than o1 available within six months. OpenAI has said that they'll have an agentic assistant up in January IIRC; I expect it to be better than o1.
I would bet on approximately the same performance as o1.
What's your bet on the next frontier models (Orion, Gemini 2, Llama-4) vs o1?
Curious to hear your answers...
For OpenAI, the question is whether the increase in model size and training on synthetic data will beat the teacher model without test-time compute.