Claude Opus 4.8 Tops Artificial Analysis Intelligence Index

Key Takeaway

Anthropic has released Claude Opus 4.8, a new version of its flagship Claude model focused on better reasoning, agentic coding, tool use, and long-running work. The model is available now at the same standard API price as Opus 4.7, while Artificial Analysis ranks it as the new leader on its Intelligence Index.

Claude Opus 4.8 Launches – Key Points

The Story

Anthropic released Claude Opus 4.8 on May 28, 2026, positioning it as the company’s most capable generally available model to date. The upgrade builds on Claude Opus 4.7 with improvements across coding, long-horizon agentic tasks, reasoning calibration, tool triggering, and knowledge-work performance.

The release also adds new product and developer features: effort control in claude.ai and Cowork, dynamic workflows in Claude Code, fast mode for the API, mid-conversation system messages, and a lower prompt-caching minimum.

The Facts

Availability and Pricing

Claude Opus 4.8 is available now across Anthropic’s supported surfaces, including the Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry.
Developers can use it through the Claude API with the model ID claude-opus-4-8.
Standard API pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens.
Cache writes are priced at $6.25 per million tokens, with a 5-minute time to live, while cache hits are priced at $0.50 per million tokens.
Fast mode pricing for Opus 4.8 is $10 per million input tokens and $50 per million output tokens.
Anthropic says fast mode can deliver up to 2.5× higher output tokens per second.
Fast mode for Opus 4.8 costs one-third as much as fast mode did for previous Opus models.
The launch reflects growing customer pressure for more flexible AI spending, with effort settings and fast mode giving teams more control over cost, speed, and quality.
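The prices above are easier to compare as per-request costs. The sketch below turns them into dollar estimates, assuming simple per-token billing; batch discounts, other cache lifetimes, and rounding rules are not modeled, and the function names are illustrative only.

```python
# Sketch: estimate Claude Opus 4.8 request costs from the per-million-token
# prices quoted in this article. Assumes plain per-token billing; batch
# discounts, longer cache TTLs, and rounding rules are NOT modeled.

PER_MILLION = 1_000_000

# US dollars per million tokens.
STANDARD = {"input": 5.00, "output": 25.00}
FAST = {"input": 10.00, "output": 50.00}
CACHE_WRITE = 6.25  # cache write, 5-minute time to live
CACHE_HIT = 0.50    # cache read


def request_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Dollar cost of one request without prompt caching."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / PER_MILLION


def prefix_cost(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Dollar cost of sending the same prompt prefix on `requests` standard-mode calls."""
    if requests <= 0:
        return 0.0
    if not cached:
        # Every call pays the full input price for the prefix.
        return requests * prefix_tokens * STANDARD["input"] / PER_MILLION
    # First call writes the prefix to the cache; later calls within the TTL read it.
    write = prefix_tokens * CACHE_WRITE / PER_MILLION
    hits = (requests - 1) * prefix_tokens * CACHE_HIT / PER_MILLION
    return write + hits
```

With these prices, a request with 20,000 input and 2,000 output tokens costs $0.15 in standard mode and $0.30 in fast mode, so fast mode doubles the per-token bill in exchange for the quoted throughput gain of up to 2.5×. Reusing a 100,000-token prefix across 10 calls costs about $1.08 with caching versus $5.00 without. Because a write costs 1.25× a normal input token and a hit costs one-tenth, caching comes out ahead from the second call onward.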

Core Model Specs

Claude Opus 4.8 supports a 1-million-token context window by default on the Claude API, Amazon Bedrock, and Vertex AI.
On Microsoft Foundry, the context window is 200,000 tokens at launch.
The model supports up to 128,000 output tokens.
It keeps the same broad feature set as Opus 4.7, including adaptive thinking, prompt caching, batch processing, Files API, PDF support, vision, and tool use.
There are no breaking API changes for code already running on Claude Opus 4.7.

Benchmark Highlights

Artificial Analysis ranks Claude Opus 4.8 first on its Intelligence Index, with a score of 61.4, up 4.1 points from Opus 4.7 and 1.2 points ahead of GPT-5.5 xhigh.
On SWE-bench Pro, Claude Opus 4.8 scores 69.2%, up from 64.3% for Opus 4.7.
On SWE-bench Verified, it reaches 88.6%.
On Terminal-Bench 2.1, it scores 74.6%, compared with 66.1% for Opus 4.7.
On Humanity’s Last Exam, it scores 49.8% without tools and 57.9% with tools.
On the USA Mathematical Olympiad, it scores 96.7%, compared with 69.3% for Opus 4.7.
On GDPval-AA, Artificial Analysis reports a 1,890 Elo score, 137 points above Opus 4.7 and 121 points ahead of GPT-5.5 xhigh, implying an approximately 67% win rate against GPT-5.5 xhigh.
On GDPval-AA, Opus 4.8 used 15% fewer turns and 35% fewer output tokens than Opus 4.7, but still used about 30% more turns than GPT-5.5.
On OSWorld-Verified, the gain is small: 83.4% versus 82.8% for Opus 4.7.
On AutomationBench, it improves to 15.5%, compared with 9.9% for Opus 4.7.
On GraphWalks, it scores 85.9% on the 256K BFS subset and 68.1% on the full 1M subset, up from 76.9% and 40.3% for Opus 4.7.
On AA-Omniscience, Opus 4.8 ranks second with a score of 27.4, behind Gemini 3.1 Pro at 32.9. Its accuracy reached 46.6%, while its hallucination rate stayed roughly flat at 35.9%.
On Vending-Bench 2, Opus 4.8 performs worse than Opus 4.7, after Anthropic removed business-focused training that had introduced misaligned behavior in the previous model.
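The 67% win-rate figure follows from the Elo gap. Assuming Artificial Analysis uses the standard logistic Elo model on a 400-point scale (the article does not say which formula it uses), the expected win rate for a rating gap Δ is 1 / (1 + 10^(-Δ/400)). A minimal sketch of that check:

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard
    logistic Elo model with a 400-point scale (an assumption here)."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# Elo gaps reported for Opus 4.8 on GDPval-AA.
GAP_VS_GPT_5_5_XHIGH = 121
GAP_VS_OPUS_4_7 = 137
```

A 121-point gap gives about 66.7%, which rounds to the quoted 67%. Under the same assumption, the 137-point gap over Opus 4.7 works out to roughly 69%.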

What Is New

Claude Opus 4.8 is not presented as a radical redesign. It is a quality-focused release aimed at making Claude more dependable in complex work.

Anthropic highlights three main behavior improvements:

better long-horizon agentic coding, especially in long-context tasks and compaction recovery;
better reasoning effort calibration across different task types;
better tool triggering, with fewer missed tool calls when a task requires external actions.

For developers and teams building agents, the practical point is that Opus 4.8 is designed to stay on task for longer workflows, use tools more reliably, and recover better when context needs to be compressed.

Effort Control Gives Users More Choice

Claude Opus 4.8 defaults to high effort, which Anthropic describes as the best overall balance of quality and user experience.

Users can now adjust effort in claude.ai and Cowork. Higher effort settings make Claude think more deeply and use more tokens. Lower effort settings produce faster answers and consume rate limits more slowly.

For difficult coding tasks or long-running autonomous workflows, Anthropic recommends using extra effort, shown as xhigh in Claude Code. Developers who already set effort explicitly will keep their current settings.

This gives users and organizations a clearer cost-performance trade-off: spend more tokens on harder work, or accept lighter answers in exchange for speed and lower usage.
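For teams that route many kinds of requests, the guidance above can be written down as a small policy. This is a hypothetical sketch, not an Anthropic API: the task categories are invented for illustration, "high" and "xhigh" come from the article, and "low" is a placeholder for "a lower effort setting" that may not match real parameter values.

```python
from typing import Optional

# Hypothetical routing policy mirroring the effort guidance in this article.
# "high" (default) and "xhigh" are named in the article; "low" is a
# placeholder label and may not be a real API value.
DEFAULT_EFFORT = "high"

EFFORT_BY_TASK = {
    "hard_coding": "xhigh",         # difficult coding tasks
    "long_running_agent": "xhigh",  # long-running autonomous workflows
    "quick_question": "low",        # faster answers, slower rate-limit use
}


def choose_effort(task_kind: str, explicit: Optional[str] = None) -> str:
    """Pick an effort label for a task; an explicitly configured value always wins."""
    if explicit is not None:
        return explicit
    return EFFORT_BY_TASK.get(task_kind, DEFAULT_EFFORT)
```

Letting an explicit value win mirrors the note that developers who already set effort keep their current settings, while unknown task kinds fall back to the high-effort default.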

Dynamic Workflows Expand Claude Code

Claude Code is also getting a research-preview feature called dynamic workflows.

The feature lets Claude plan a large task, run hundreds of parallel subagents in one session, verify the results, and report back to the user. Anthropic gives codebase-scale migrations across hundreds of thousands of lines of code as one example.

Dynamic workflows are available in Claude Code for Enterprise, Team, and Max plans.
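The plan, fan-out, verify, and report loop described above is a common orchestration pattern. The sketch below shows that pattern with Python's asyncio. It is not Claude Code's implementation: `plan`, `run_subagent`, and `verify` are hypothetical stand-ins that do no real work, and the concurrency cap is an assumed parameter.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    name: str
    output: str
    ok: bool


def plan(task: str, files: list[str]) -> list[str]:
    # Hypothetical planner: split the large task into one subtask per file.
    return [f"{task}: {path}" for path in files]


async def run_subagent(subtask: str) -> SubtaskResult:
    # Stand-in for a real subagent; yields to the event loop, then echoes.
    await asyncio.sleep(0)
    return SubtaskResult(name=subtask, output=subtask.upper(), ok=True)


def verify(result: SubtaskResult) -> bool:
    # Hypothetical check; a real workflow might run tests or a linter here.
    return result.ok and bool(result.output)


async def dynamic_workflow(task: str, files: list[str], max_parallel: int = 8) -> dict:
    subtasks = plan(task, files)
    limit = asyncio.Semaphore(max_parallel)  # cap concurrent subagents

    async def bounded(subtask: str) -> SubtaskResult:
        async with limit:
            return await run_subagent(subtask)

    # Fan out all subtasks, then collect results in plan order.
    results = await asyncio.gather(*(bounded(s) for s in subtasks))
    failed = [r.name for r in results if not verify(r)]
    # Report back a summary rather than every raw output.
    return {"total": len(results), "passed": len(results) - len(failed), "failed": failed}
```

For example, `asyncio.run(dynamic_workflow("migrate", ["a.py", "b.py", "c.py"]))` returns `{'total': 3, 'passed': 3, 'failed': []}`. The semaphore keeps hundreds of subtasks from all running at once, and verification happens after fan-in so the final report can list exactly which pieces failed.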

Better Self-Checking, With One Safety Caveat

One of the most important changes in Claude Opus 4.8 is improved self-calibration. Anthropic says the model is around four times less likely than Opus 4.7 to let flaws in its own code pass unremarked.

The Claude Opus 4.8 system card also reports stronger alignment results, including lower rates of misaligned behavior than Opus 4.7 and prosocial-trait scores similar to Claude Mythos Preview, Anthropic’s more advanced model class.

What It Means for Everyday Users

For non-developers, the most visible change is effort control. Users can choose faster, lighter responses for simple tasks or deeper responses for more complex work.

For professionals using Claude for writing, analysis, research, planning, or document-heavy workflows, Opus 4.8 should feel more useful when the task requires judgment, uncertainty handling, or sustained context.

What Developers Should Check Before Switching

Developers already using Opus 4.7 should be able to move to Opus 4.8 without breaking existing code.

The main migration checks are:

update the model name to claude-opus-4-8;
re-test effort settings, because effort levels have been recalibrated;
remove any old context-window beta header on platforms where the 1-million-token context window is now the default;
check whether mid-conversation system messages can simplify instruction updates;
review refusal handling if the app needs to respond differently to different refusal categories;
test prompt-injection safeguards in agentic workflows;
re-baseline cost and latency, especially if using fast mode or high-effort agentic workflows.
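The first and third checks are mechanical and can be scripted. The sketch below assumes a request is stored as a plain dict with `model` and `headers` keys, which is a hypothetical shape. Because the article does not name the old beta value, the caller passes it in as `context_beta`. Effort, refusal, prompt-injection, and cost checks still need real testing.

```python
from typing import Optional

NEW_MODEL_ID = "claude-opus-4-8"


def migrate_request(request: dict, context_beta: Optional[str] = None) -> dict:
    """Return a migrated copy of a request dict; the input is left unchanged.

    Assumed (hypothetical) shape: {"model": str, "headers": {name: value}, ...}.
    `context_beta` is whatever old 1M-context beta value your code sends.
    """
    migrated = dict(request)
    migrated["model"] = NEW_MODEL_ID

    if "headers" in request:
        headers = dict(request["headers"])  # copy so the original is not mutated
        beta_value = headers.get("anthropic-beta")
        if context_beta and beta_value:
            # The header may list several comma-separated betas; drop only the context one.
            remaining = [b.strip() for b in beta_value.split(",")
                         if b.strip() and b.strip() != context_beta]
            if remaining:
                headers["anthropic-beta"] = ",".join(remaining)
            else:
                del headers["anthropic-beta"]
        migrated["headers"] = headers
    return migrated
```

Returning a copy rather than editing in place makes it easy to run old and new configurations side by side while re-baselining cost and latency.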

Why This Matters

Claude Opus 4.8 matters because AI competition is shifting from chatbot quality toward dependable autonomous work. Better long-context handling, more reliable tool use, finer effort control, and agentic workflows directly affect whether AI systems can handle complex coding, research, business analysis, and operational tasks without constant human correction; the benchmark gains are one measure of progress on those fronts.

This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.