OpenAI’s GPT-4.1 model family (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano) delivers major improvements in coding, instruction following, and long-context comprehension. With state-of-the-art benchmark results, lower latency and pricing, and a 1-million-token context window, the GPT-4.1 models are positioned to replace GPT-4.5 and drive next-generation agentic applications across industries.

OpenAI Launches GPT-4.1 – Key Points
Model Family Launched:
OpenAI released GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano via the API. All three models outperform GPT-4o and GPT-4.5 on coding, instruction-following, and long-context reasoning benchmarks. GPT-4.1 is now OpenAI’s flagship model, with the GPT-4.5 Preview scheduled for deprecation on July 14, 2025.
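All three models are addressable directly by their API model IDs. A minimal sketch using the official openai Python SDK (assumes an OPENAI_API_KEY environment variable; the prompt is purely illustrative):

```python
# Minimal sketch: calling the GPT-4.1 family via the openai Python SDK.
# API model IDs at launch: "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a one-line Python list comprehension that squares even numbers."},
    ],
)
print(completion.choices[0].message.content)
```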
Context Window Upgrade:
The models support up to 1 million tokens, a significant jump from GPT-4o’s 128K limit. This allows comprehensive ingestion of full codebases, legal corpora, or lengthy conversations. OpenAI trained GPT-4.1 to effectively ignore distractors and retrieve relevant information at any context depth.
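To make the 1-million-token figure concrete, here is a rough sketch of packing an entire repository into a single prompt. The repository path and the 4-characters-per-token estimate are illustrative assumptions; a real tokenizer (e.g., tiktoken's o200k_base encoding) would give precise counts.

```python
# Illustrative sketch: fit a whole codebase into one GPT-4.1 prompt.
# The 4-chars-per-token figure is a rough heuristic, not a tokenizer.
from pathlib import Path
from openai import OpenAI

APPROX_CHARS_PER_TOKEN = 4

def pack_repo(root: str, budget_tokens: int = 900_000) -> str:
    """Concatenate source files until the rough token budget is reached."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(errors="ignore")
        cost = len(text) // APPROX_CHARS_PER_TOKEN
        if used + cost > budget_tokens:
            break
        parts.append(f"### FILE: {path}\n{text}")
        used += cost
    return "\n\n".join(parts)

client = OpenAI()
prompt = pack_repo("./my-project") + "\n\nQuestion: where is the retry logic implemented?"
resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```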
Real-World Engineering Goals:
OpenAI continues to pursue the vision of an “agentic software engineer” capable of executing the full software cycle: frontend and backend development, bug resolution, QA, and documentation. This vision remains central to the model’s design.
Coding Improvements:
- GPT-4.1 scored 54.6% on SWE-bench Verified, a 21.4% absolute improvement over GPT-4o and 26.6% over GPT-4.5.
- Aider’s polyglot diff benchmark showed over 2× improvement compared to GPT-4o, with extraneous edits reduced from 9% to 2%.
- Human graders preferred GPT-4.1’s frontend app output over GPT-4o’s 80% of the time.
- Qodo found GPT-4.1 delivered better code reviews in 55% of PRs, while Windsurf noted a 60% improvement in first-pass acceptance rates.
Instruction Following Enhancements:
- GPT-4.1 scored 38.3% on Scale’s MultiChallenge and 87.4% on IFEval, significantly ahead of GPT-4o.
- Excelled at handling content constraints, response formats (e.g., YAML/Markdown), negative instructions, ranking tasks, and multi-turn conversations (see the prompt sketch after this list).
- Blue J reported 53% greater accuracy in legal/tax use cases. Hex saw near 2× improvement in SQL-based reasoning and schema comprehension.
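A quick way to see the formatting and negative-instruction behavior described above is to combine several constraint types in one prompt. A minimal sketch (the task and the required keys are made up):

```python
# Sketch of the instruction-following categories called out above:
# a strict output format (YAML), a negative instruction, and a ranking task.
from openai import OpenAI

client = OpenAI()

instructions = (
    "Rank the three databases below by write throughput.\n"
    "Respond ONLY with YAML using the keys `ranking` (a list) and `rationale` (one sentence).\n"
    "Do NOT use Markdown, code fences, or any prose outside the YAML."
)

resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "user", "content": instructions + "\n\nDatabases: PostgreSQL, Redis, SQLite"},
    ],
)
print(resp.choices[0].message.content)
```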
Long-Context Reliability:
- GPT-4.1 leads on the OpenAI-MRCR (multi-round coreference) and Graphwalks (multi-hop reasoning) benchmarks.
- Carlyle reported 50% better data extraction from long, dense financial documents.
- Thomson Reuters saw a 17% improvement in multi-document legal analysis using GPT-4.1 in CoCounsel.
Needle-in-Haystack & Multi-Hop Reasoning:
Successfully retrieves specific content buried across massive context windows, a critical feature for use cases like document comparison, cross-referencing clauses, or summarizing meeting transcripts.
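This capability is easy to spot-check with a toy needle-in-a-haystack test: bury one fact inside a long block of filler and ask the model to recover it. A sketch (the filler text and the “needle” fact are synthetic):

```python
# Toy needle-in-a-haystack check: hide one fact deep in filler text
# and ask GPT-4.1 to retrieve it verbatim.
from openai import OpenAI

client = OpenAI()

filler = "The quarterly report discusses routine operational matters. " * 20_000
needle = "The escrow release code for the Hudson contract is 7741-KX."
haystack = filler[: len(filler) // 2] + needle + " " + filler[len(filler) // 2 :]

resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "user", "content": haystack + "\n\nWhat is the escrow release code for the Hudson contract?"},
    ],
)
print(resp.choices[0].message.content)  # expected to quote the 7741-KX code
```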
Vision & Multimodal Understanding:
- GPT-4.1 mini outperformed GPT-4o on image and visual reasoning benchmarks including MMMU, MathVista, and CharXiv.
- On Video-MME (long videos, no subtitles), GPT-4.1 scored 72%, setting a new state of the art for long-form video QA.
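Image inputs go through the same chat interface as text. A minimal sketch sending a chart image to GPT-4.1 mini (the image URL is a placeholder):

```python
# Sketch: multimodal call mixing a text question with an image URL (placeholder).
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend shown in this chart in two sentences."},
                {"type": "image_url", "image_url": {"url": "https://example.com/quarterly-revenue.png"}},
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```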
Latency and Cost Efficiency:
- GPT-4.1 is 26% cheaper than GPT-4o for typical queries.
- Prompt caching discounts increased to 75%, up from 50%.
- GPT-4.1 nano delivers sub-5 second first-token latency (128K input) and is ideal for tasks like autocomplete or classification.
Pricing Table (per 1M tokens):

| Model | Input | Cached Input | Output | Blended Price |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $0.50 | $8.00 | $1.84 |
| GPT-4.1 mini | $0.40 | $0.10 | $1.60 | $0.42 |
| GPT-4.1 nano | $0.10 | $0.025 | $0.40 | $0.12 |
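The per-request arithmetic follows directly from the table. A small sketch that estimates cost from token counts (the example token counts are made up; the cached-input prices already reflect the 75% caching discount):

```python
# Cost estimate from the pricing table above (prices in USD per 1M tokens).
PRICES = {
    "gpt-4.1":      {"input": 2.00, "cached_input": 0.50,  "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "cached_input": 0.10,  "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "cached_input": 0.025, "output": 0.40},
}

def estimate_cost(model: str, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cached tokens are the portion of the input served from the prompt cache."""
    p = PRICES[model]
    return (
        (input_tokens - cached_tokens) * p["input"]
        + cached_tokens * p["cached_input"]
        + output_tokens * p["output"]
    ) / 1_000_000

# Example: 200K-token prompt, half served from the prompt cache, 2K-token reply.
print(f"${estimate_cost('gpt-4.1', 200_000, 100_000, 2_000):.4f}")  # -> $0.2660
```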
Developer Tools Expansion:
OpenAI released the Codex CLI for local terminal use and enhanced API primitives like the Responses API to support low-latency, high-accuracy agentic use cases.
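A minimal sketch of the Responses API surface as exposed in the openai Python SDK (the prompt is illustrative); the Codex CLI is a separate tool installed and run locally in the terminal:

```python
# Sketch: the Responses API, a newer single-call primitive alongside Chat Completions.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="List three failure modes to test for in a retry/backoff implementation.",
)
print(response.output_text)
```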
Release Schedule Updates:
- GPT-4.5 preview to be deprecated on July 14, 2025.
- GPT-4 will be retired from ChatGPT on April 30, 2025, fully replaced by GPT-4o.
- GPT-5 has been delayed beyond its previously expected May 2025 window, with Sam Altman citing integration complexity.
- OpenAI is preparing to launch the o3 reasoning model and o4-mini as part of its next wave of releases.
Why This Matters:
GPT-4.1 positions OpenAI at the forefront of developer-first AI. With major improvements across coding, instruction following, and context handling, it offers enterprise-grade reliability while cutting costs and latency. Its ability to power agentic systems, long-form reasoning, and multimodal tasks sets a new bar for what AI can accomplish across software, legal, financial, and customer support domains.
Google introduces Gemini Code Assist for Individuals, offering 180,000 free code completions monthly, challenging GitHub Copilot’s dominance in AI-powered developer tools.
Read a comprehensive monthly roundup of the latest AI news!