OpenAI Launches GPT-5.2 After “Code Red” Push

Key Takeaway

OpenAI launched GPT-5.2 following increasing competition from Google, pitching it as its most advanced model yet for developers and professional use.

ChatGPT 5.2 focuses less on personality and more on being dependable at serious work. OpenAI says the GPT-5.2 family (Instant, Thinking, Pro) handles long, complicated tasks, such as building spreadsheets, writing code, and working through multi-step problems, with fewer errors. The release comes as competition with Google’s Gemini 3 increases.

The release of ChatGPT 5.2 was not just another routine model update. It landed in the middle of an intensifying AI arms race, as OpenAI moved quickly to respond to growing pressure from Google’s Gemini 3. Internal urgency, faster release cycles, and a sharper focus on professional reliability now define how frontier AI models reach the market. GPT-5.2 reflects a shift in strategy: shipping stronger reasoning, longer memory, and fewer errors sooner, rather than waiting for flashy new features.

OpenAI Launches GPT-5.2 – Key Points

Better at multi-step work and professional tasks (GDPval)
ChatGPT 5.2 runs on OpenAI’s GPT-5.2 models (Instant, Thinking, Pro). The goal is to plan and complete multi-step tasks rather than give one-off answers. OpenAI highlights GDPval, a test covering 44 jobs across nine major industries that contribute most to the US economy. The tasks involve producing real deliverables such as presentations, accounting spreadsheets, medical clinic schedules, manufacturing diagrams, and short videos.
OpenAI says GPT-5.2 Thinking beats or matches professionals in 70.9% of comparisons, and GPT-5.2 Pro reaches 74.1%. It also claims the work can be produced at more than 11× the speed and under 1% of the cost of experts, based on older cost and speed estimates (results in ChatGPT can vary).
Reporting around the launch also links it to an internal “code red” push by CEO Sam Altman earlier in December, aimed at moving faster in response to Google Gemini 3.
Much stronger at working with very long documents
GPT-5.2 Thinking is designed for large “project-sized” inputs such as contracts, reports, transcripts, research papers, and multiple files at once, where older versions could lose track. OpenAI points to results on OpenAI MRCRv2, a long-document test, saying GPT-5.2 Thinking reaches near 100% accuracy on one version of the test even when the input is extremely large, up to 256k tokens. The practical goal is staying consistent while pulling together details from long, messy material.
Improved understanding of images, charts, and software screens
OpenAI says GPT-5.2 is stronger at reading charts, diagrams, dashboards, and software screenshots, useful for jobs where visual information matters. It reports 88.7% on CharXiv Reasoning (up from 80.3% in GPT-5.1 Thinking) and 86.3% on ScreenSpot-Pro (up from 64.2%) when tool support is enabled. In everyday terms: better at interpreting visuals and understanding how items are laid out on a screen.
Fewer wrong answers and fewer made-up details
OpenAI reports a 30% relative drop in errors compared with GPT-5.1 Thinking on a set of anonymized ChatGPT questions when search is enabled. In the same testing approach, it shows responses with at least one error falling from 8.8% (GPT-5.1 Thinking) to 6.2% (GPT-5.2 Thinking). This is meant to reduce the time people spend checking and correcting outputs, while still needing extra care for high-stakes decisions.
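The two figures above describe the same change in different ways: one is an absolute drop in percentage points, the other a relative drop. A minimal sketch of the arithmetic, using only the 8.8% and 6.2% rates quoted in this article (the function name is illustrative, not anything OpenAI publishes):

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after`."""
    return (before - after) / before


# Share of responses with at least one error, as quoted in the article.
error_rate_51 = 8.8  # GPT-5.1 Thinking, in percent
error_rate_52 = 6.2  # GPT-5.2 Thinking, in percent

absolute_drop = error_rate_51 - error_rate_52
relative_drop = relative_reduction(error_rate_51, error_rate_52)

print(f"{absolute_drop:.1f} points absolute, {relative_drop:.1%} relative")
# prints "2.6 points absolute, 29.5% relative"
```

The 29.5% result rounds to the roughly 30% relative drop OpenAI reports, while the absolute change is only 2.6 percentage points.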
More reliable when it has to use tools and follow a messy process
GPT-5.2 is positioned as better at tasks that require several steps and repeated tool use, like customer support workflows. OpenAI uses a travel problem example (delayed flight, missed connection, overnight stay, medical seating request) to show end-to-end handling. It also reports benchmark results showing tool-use accuracy improving: 98.7% on Tau2-bench Telecom (vs 95.6%) and 82.0% on Tau2-bench Retail (vs 77.9%) for GPT-5.2 Thinking compared with GPT-5.1 Thinking.
Stronger coding ability for real software work
OpenAI says GPT-5.2 Thinking performs better on software engineering tests and is more useful for real coding tasks such as debugging, implementing new features, refactoring, and code review. It reports 55.6% on SWE-Bench Pro (up from 50.8%) and 80.0% on SWE-bench Verified (up from 76.3%).
Better spreadsheets and presentations, including finance-style models
OpenAI reports an internal test focused on junior investment banking spreadsheet work (for example, three-statement financial models for Fortune 500 companies and leveraged buyout models). On that benchmark, the score rises from 59.1% with GPT-5.1 to 68.4% with GPT-5.2 Thinking, a gain of 9.3 percentage points. OpenAI also says these advanced spreadsheet and presentation features in ChatGPT require a paid plan (Plus, Pro, Business, or Enterprise) and selecting GPT-5.2 Thinking or Pro, and that complex outputs may take many minutes to generate.
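To compare the before-and-after scores quoted so far on one scale, the point gains can be tabulated. This is a rough sketch that uses only the figures cited in this article; the dictionary layout and the label "IB spreadsheet test" are illustrative assumptions, and the benchmarks measure different things, so the ranking says nothing about which gain matters most.

```python
# (previous score, GPT-5.2 Thinking score) in percent, as quoted in the article.
REPORTED_SCORES = {
    "CharXiv Reasoning": (80.3, 88.7),
    "ScreenSpot-Pro": (64.2, 86.3),
    "Tau2-bench Telecom": (95.6, 98.7),
    "Tau2-bench Retail": (77.9, 82.0),
    "SWE-Bench Pro": (50.8, 55.6),
    "SWE-bench Verified": (76.3, 80.0),
    "IB spreadsheet test": (59.1, 68.4),  # hypothetical label for the internal test
}


def point_gains(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the gain in percentage points per benchmark, rounded to 0.1."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


gains = point_gains(REPORTED_SCORES)

# List benchmarks from the largest to the smallest reported gain.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22} +{gain:.1f} points")
```

On these numbers the largest jump is ScreenSpot-Pro, at 22.1 points, and the spreadsheet test's 9.3-point gain matches the figure stated above.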
Gains in math, science, and abstract reasoning
OpenAI reports higher scores on several technical tests, presenting them as evidence of stronger step-by-step thinking and better number handling: 92.4% on GPQA Diamond (science, no tools), 100.0% on AIME 2025 (math, no tools), 40.3% on FrontierMath Tier 1–3 and 14.6% on Tier 4 (with Python), plus 86.2% on ARC-AGI-1 and 52.9% on ARC-AGI-2 for GPT-5.2 Thinking.
Companies say it performs better on long-running, tool-heavy work
OpenAI’s launch materials cite companies such as Notion, Box, Shopify, Harvey, and Zoom as observing strong long-horizon reasoning and tool use. It also highlights Databricks, Hex, and Triple Whale for data science and document analysis tasks, and tools such as Cognition, Warp, JetBrains, and Augment Code for improvements in coding workflows. The common theme is reliability over long, multi-step work—not just quick answers.
More careful handling of sensitive topics and added protections for teens
OpenAI reports improved responses in sensitive situations such as mental health distress and emotional dependency. It shows example evaluation scores like “Mental health” improving from 0.883 to 0.995 for Instant, and from 0.684 to 0.915 for Thinking (along with separate rows for emotional reliance and self-harm). OpenAI also says it is rolling out an age prediction model to apply extra safeguards for users under 18.
Rollout timing, model options, and what changes for users and developers
ChatGPT 5.2 began rolling out on December 11, 2025, starting with paid plans (Plus, Pro, Go, Business, Enterprise). GPT-5.1 remains available in ChatGPT for three months under a legacy model option before being removed there. In the API, OpenAI says GPT-5.2 is available immediately with names such as gpt-5.2 (Thinking), gpt-5.2-chat-latest (Instant), and gpt-5.2-pro (Pro). OpenAI also says it has no current plans to remove GPT-5.1, GPT-5, or GPT-4.1 from its API.
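For developers, the tier-to-model-name mapping above can be kept in one place so application code refers to tiers rather than raw strings. A minimal sketch, assuming the three API names quoted in this article are accurate; the helper `api_model_for` is hypothetical and not part of any OpenAI SDK:

```python
# API model names as listed in the article, keyed by ChatGPT tier.
API_MODEL_NAMES = {
    "instant": "gpt-5.2-chat-latest",
    "thinking": "gpt-5.2",
    "pro": "gpt-5.2-pro",
}


def api_model_for(tier: str) -> str:
    """Map a tier label such as 'Thinking' to its API model name."""
    key = tier.strip().lower()
    if key not in API_MODEL_NAMES:
        raise ValueError(f"unknown GPT-5.2 tier: {tier!r}")
    return API_MODEL_NAMES[key]


print(api_model_for("Thinking"))  # prints "gpt-5.2"
```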
Pricing and infrastructure details for businesses building on it
OpenAI lists API pricing at $1.75 per 1M input tokens and $14 per 1M output tokens, with a 90% discount for cached inputs. GPT-5.2 Pro costs more: $21 per 1M input and $168 per 1M output. OpenAI argues that even when the per-token price is higher, better efficiency can reduce the cost to reach a good result on complicated tasks. It also says training and scaling rely on Microsoft Azure data centers and NVIDIA GPUs including H100, H200, and GB200-NVL72.
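The listed prices translate into a per-request cost estimate fairly directly. The sketch below uses only the figures quoted in this article. Two things are assumptions: the function and constant names are illustrative, and the 90% cached-input discount is applied only to gpt-5.2, because the article states it alongside that model's price and does not say whether it extends to Pro.

```python
# USD per 1M tokens, as quoted in the article.
PRICES_PER_MILLION = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "gpt-5.2-pro": {"input": 21.00, "output": 168.00},
}
CACHED_INPUT_DISCOUNT = 0.90
# Assumption: the article mentions the cached discount only for gpt-5.2.
DISCOUNTED_MODELS = {"gpt-5.2"}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from token counts."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES_PER_MILLION[model]
    discount = CACHED_INPUT_DISCOUNT if model in DISCOUNTED_MODELS else 0.0
    fresh_input_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_input_tokens * price["input"]
        + cached_input_tokens * price["input"] * (1 - discount)
        + output_tokens * price["output"]
    )
    return total / 1_000_000


standard = estimate_cost("gpt-5.2", 1_000_000, 100_000)
half_cached = estimate_cost("gpt-5.2", 1_000_000, 100_000, cached_input_tokens=500_000)
pro = estimate_cost("gpt-5.2-pro", 1_000_000, 100_000)
print(f"standard ${standard:.2f}, half cached ${half_cached:.2f}, pro ${pro:.2f}")
```

For a request with 1M input tokens and 100k output tokens, this comes to about $3.15 on gpt-5.2 and about $37.80 on gpt-5.2-pro, a twelvefold difference that follows from the listed per-token prices.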
Productivity message aimed at enterprise users
OpenAI claims the average ChatGPT Enterprise user saves 40–60 minutes per day, and heavy users save more than 10 hours per week. GPT-5.2 is presented as a way to increase those savings by improving performance on long documents, coding, spreadsheets, presentations, image understanding, and tool use.
Competitive pressure and a major partnership headline
Coverage of the launch ties it to pressure from Gemini 3 and highlights a major deal: Disney said it is investing $1 billion in OpenAI and allowing characters from Star Wars, Pixar, and Marvel to be used in Sora video generation. Reporting also mentions a CNBC appearance with Sam Altman and Disney CEO Bob Iger, showing how competition and high-profile partnerships are shaping product rollouts.
Some users say it feels less friendly than the previous version
Alongside performance improvements, early feedback highlighted in TechRadar includes complaints that 5.2 can feel more corporate or less engaging than 5.1. It is early feedback, but it reflects a possible tradeoff: pushing for structure and reliability may change how the conversation “feels” for some users.

Why This Matters

GPT-5.2 shows how competition is reshaping the pace and priorities of AI development. As rivals close performance gaps, the focus moves from novelty to execution—models that can handle long, messy workflows, work reliably with tools, and deliver consistent results at scale. In this environment, speed to market and operational dependability matter as much as raw intelligence. The race is no longer about who demos best, but who ships systems people can trust to do real work every day.

This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.
