This is not o3; it is what they'd internally called Orion, a larger non-reasoning model.
They say this is their last fully non-reasoning model, but that research on both types will continue.
They say it's currently limited to Pro users, though the model hadn't yet shown up in the model picker when I checked (edit: it is available in the app). They say it will be shared with Plus and Enterprise users next week.
They claim it's more accurate on standard questions, with a lower hallucination rate than any previous OAI model (and presumably any others).
"Alignment" was done by both supervised fine-tuning from an unspecified dataset, and RLHF (this really only training refusals, which is pretty different from alignment in the classical sense, but could potentially help with real alignment if it's used that way - see System 2 Alignment).
The main claims are better world knowledge, better understanding of human intentions (it is modestly but distinctly preferred over 4o in their tests), and better writing. This suggests to me that their recent stealth upgrades of 4o might've been this model.
It supports web search, works with Canvas, and handles images.
Here's the start of the system card:
OpenAI GPT-4.5 System Card
OpenAI
February 27, 2025
1 Introduction
We’re releasing a research preview of OpenAI GPT-4.5, our largest and most knowledgeable model yet. Building on GPT-4o, GPT-4.5 scales pre-training further and is designed to be more general-purpose than our powerful STEM-focused reasoning models. We trained it using new supervision techniques combined with traditional methods like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), similar to those used for GPT-4o. We conducted extensive safety evaluations prior to deployment and did not find any significant increase in safety risk compared to existing models.
Early testing shows that interacting with GPT-4.5 feels more natural. Its broader knowledge base, stronger alignment with user intent, and improved emotional intelligence make it well-suited for tasks like writing, programming, and solving practical problems—with fewer hallucinations. We’re sharing GPT-4.5 as a research preview to better understand its strengths and limitations. We’re still exploring its capabilities and are eager to see how people use it in ways we might not have expected.
This system card outlines how we built and trained GPT-4.5, evaluated its capabilities, and strengthened safety, following OpenAI’s safety process and Preparedness Framework.
2 Model data and training
Pushing the frontier of unsupervised learning
We advance AI capabilities by scaling two paradigms: unsupervised learning and chain-of-thought reasoning. Scaling chain-of-thought reasoning teaches models to think before they respond, allowing them to tackle complex STEM or logic problems. In contrast, scaling unsupervised learning increases world model accuracy, decreases hallucination rates, and improves associative thinking. GPT-4.5 is our next step in scaling the unsupervised learning paradigm.
New alignment techniques lead to better human collaboration
As we scale our models, and they solve broader, more complex problems, it becomes increasingly important to teach them a greater understanding of human needs and intent. For GPT-4.5, we developed new, scalable alignment techniques that enable training larger and more powerful models with data derived from smaller models. These techniques allowed us to improve GPT-4.5’s steerability, understanding of nuance, and natural conversation.
Internal testers report GPT-4.5 is warm, intuitive, and natural. When tasked with emotionally charged queries, it knows when to offer advice, defuse frustration, or simply listen to the user. GPT-4.5 also shows stronger aesthetic intuition and creativity. It excels at helping users with their creative writing and design.
GPT-4.5 was pre-trained and post-trained on diverse datasets, including a mix of publicly available data, proprietary data from data partnerships, and custom datasets developed in-house, which collectively contribute to the model’s robust conversational capabilities and world knowledge.
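The "data derived from smaller models" line above sounds like some form of weak-to-strong supervision or distillation: a smaller, already-trained model produces data that's then used to fine-tune the bigger one. Here's a minimal sketch of that general idea using Hugging Face transformers; the model names, prompts, and hyperparameters are placeholders of mine, not anything OpenAI has described.

```python
# Minimal sketch: a small "teacher" model generates responses, and a larger
# "student" model is fine-tuned (SFT) on that data. Model names, prompts, and
# hyperparameters are placeholders, not OpenAI's actual setup.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

SMALL, LARGE = "gpt2", "gpt2-large"  # stand-ins for "smaller" and "larger"

# 1. The smaller model produces completions for a pool of prompts.
small_tok = AutoTokenizer.from_pretrained(SMALL)
small = AutoModelForCausalLM.from_pretrained(SMALL)
prompts = ["Explain why the sky is blue.", "Summarize the water cycle."]
records = []
for p in prompts:
    ids = small_tok(p, return_tensors="pt").input_ids
    out = small.generate(ids, max_new_tokens=64, do_sample=True, top_p=0.9,
                         pad_token_id=small_tok.eos_token_id)
    completion = small_tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    records.append({"text": p + completion})

# 2. The larger model is fine-tuned (SFT) on the smaller model's outputs.
large_tok = AutoTokenizer.from_pretrained(LARGE)
large_tok.pad_token = large_tok.eos_token
large = AutoModelForCausalLM.from_pretrained(LARGE)

def tokenize(batch):
    enc = large_tok(batch["text"], truncation=True, padding="max_length",
                    max_length=128)
    enc["labels"] = [list(ids) for ids in enc["input_ids"]]  # causal-LM labels
    return enc

train_ds = Dataset.from_list(records).map(tokenize, batched=True,
                                          remove_columns=["text"])
Trainer(
    model=large,
    args=TrainingArguments(output_dir="sft-from-small-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=train_ds,
).train()
```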
Safety work is limited to refusals, notably including refusals to give medical or legal advice. Have they deliberately restricted those abilities to avoid lawsuits, or to limit the public perception that AI is rapidly overtaking human expertise?
They report no real change from previous safety evaluations, which seems reasonable as far as it goes. We're not at the really scary models yet, although it will be interesting to see if this produces noticeably better tool use and the kind of recursive self-checking that's crucial for powering competent agents. They say it has those, along with improved planning and "execution":
Based on early testing, developers may find GPT‑4.5 particularly useful for applications that benefit from its higher emotional intelligence and creativity—such as writing help, communication, learning, coaching, and brainstorming. It also shows strong capabilities in agentic planning and execution, including multi-step coding workflows and complex task automation.
They also say it's compute-intensive, so it's not a replacement for 4o. This could be why they hadn't released Orion earlier. I wonder if this release is a response to Claude 3.7 taking the top spots for most non-reasoning-appropriate tasks.
GPT‑4.5 is a very large and compute-intensive model, making it more expensive than and not a replacement for GPT‑4o. Because of this, we’re evaluating whether to continue serving it in the API long-term as we balance supporting current capabilities with building future models.
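For anyone who wants to poke at it through the API while it's still being served, a minimal call with the OpenAI Python SDK would look something like the sketch below. I'm assuming the preview is exposed under a model ID like gpt-4.5-preview; check the current model list, since (per the quote above) they may not keep serving it.

```python
# Minimal sketch of calling the research preview through the OpenAI Python SDK
# (openai >= 1.0). The model ID below is my assumption about how the preview is
# exposed; check the model list before relying on it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed identifier for the preview model
    messages=[
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": "Tighten this paragraph for me: ..."},
    ],
)
print(response.choices[0].message.content)
```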
>Safety work is limited to refusals, notably including refusals to give medical or legal advice. Have they deliberately restricted those abilities to avoid lawsuits, or to limit the public perception that AI is rapidly overtaking human expertise?
I think it's been well over a year since I've had an issue with getting an LLM to give me medical advice, including GPT-4o and other SOTA models like Claude 3.5/7, Grok 3 and Gemini 2.0 Pro. I seem to recall that the original GPT-4 would occasionally refuse, but could be coaxed into it.
I am a doctor, and I tend to include that information either in model memory or in a prompt (mostly to encourage the LLM to assume background knowledge and ability to interpret facts). Even without it, my impression is that most models simply append a "consult a human doctor" boilerplate disclaimer instead of refusing.
I would be rather annoyed if GPT-4.5 were a reversion in that regard, as I find LLMs immensely useful for quick checks on topics I'm personally unfamiliar with (and while hallucinations happen, they're quite rare now, especially with search, reasoning, and grounding). I don't think OAI or other AI companies have faced any significant amount of litigation from either people who received bad advice or doctors afraid of losing their jobs.
I'm curious about whether anyone has had any issues in that regard, though I'd expect not.