MiniMax Releases M2.5 AI Models Promising Near Frontier Performance at a Fraction of the Cost

Key Takeaway

MiniMax has launched the M2.5 and M2.5-Lightning language models, claiming near state-of-the-art performance while cutting costs by up to 95% compared with leading models such as Claude Opus 4.6. New details also reinforce that the models are aimed at coding, search, tool use, and automation-heavy agent workflows.

MiniMax Releases M2.5 AI Models Promising Near Frontier Performance at a Fraction of the Cost (Credit - Midjourney, The AI Track)

MiniMax M2.5 and M2.5-Lightning models – Key Points

The Story

Shanghai-based AI startup MiniMax has introduced two new language models, M2.5 and M2.5-Lightning, designed to deliver high-end AI performance at dramatically lower cost. The company says the models approach the performance of top systems from Google and Anthropic while enabling large-scale agentic workflows for enterprise tasks. M2.5 is MiniMax’s third M2-series release in 108 days, and external benchmarking from OpenHands reinforces the case that the model is notable not just for raw capability but for how cheaply it can run long software-engineering workloads. Recent demonstrations also position M2.5 as a practical model for producing office documents, research tasks, presentations, and landing pages through MiniMax’s own agent product.

The Facts

  • Two model variants released

    MiniMax launched M2.5 and M2.5-Lightning, both available through API access and designed for large-scale production workloads. MiniMax says the two versions have the same core capability, with the main differences being speed and price.

  • MiniMax is iterating quickly

    M2.5 is the company’s third M2-series iteration in 108 days, following M2 and M2.1.

  • Near frontier-level performance claims

    The company states the models approach the performance of top-tier models such as Claude Opus 4.6, placing them in the current top tier of coding-oriented AI systems. The update also says MiniMax benchmarked M2.5 as competitive with Gemini 3 Pro and GPT-5.2, though not against GPT-5.3.

  • The model is positioned for coding and agentic work

    The new material describes M2.5 as built for high-throughput, low-latency production environments, especially tasks involving coding, automation, search, tool use, and multi-step office workflows.

  • Mixture-of-Experts architecture reduces compute cost

    M2.5 uses a Mixture-of-Experts (MoE) design with 230 billion parameters, but only 10 billion parameters activate per token, allowing high reasoning capacity with lower computational overhead.

  • Reinforcement learning framework called Forge

    MiniMax trained the model using a proprietary RL framework called Forge, which exposes the model to simulated work environments where it practices coding and tool use. The new material says MiniMax has built hundreds of thousands of training environments from tasks and workspaces used inside the company.

  • Two months of training reported

    MiniMax engineer Olive Song stated that the system was trained over approximately two months, focusing heavily on reinforcement learning across diverse environments.

  • CISPO method stabilizes training

    The training pipeline incorporates Clipping Importance Sampling Policy Optimization (CISPO), which clips importance-sampling weights to keep policy updates stable during reinforcement learning.

  • Training system reportedly speeds up RL

    MiniMax uses asynchronous scheduling and a tree-structured merging strategy to balance fresh and older experiences during training. According to the update, this delivers a claimed 40× training speedup over a simpler generate-then-train loop.

  • Architect-style reasoning approach

    According to MiniMax, M2.5 tends to plan project structures and features before writing code, which the company describes as an “Architect Mindset.” The new demonstrations also describe strong planning behavior when building structured outputs such as presentations and landing pages.

  • Internal deployment inside MiniMax

    The company reports that 30% of internal tasks are already handled by M2.5, and 80% of newly committed code at the company is generated by the model.

  • Strong benchmark results reported by MiniMax

    • SWE-Bench Verified: 80.2%
    • BrowseComp: 76.3%
    • Multi-SWE-Bench: 51.3%
    • BFCL tool-calling benchmark: 76.8%
  • Two performance tiers available

    • M2.5-Lightning: ~100 tokens per second
    • M2.5 standard: ~50 tokens per second
  • API pricing designed for high-volume usage

    • M2.5-Lightning: $0.30 per 1M input tokens / $2.40 per 1M output tokens
    • M2.5 standard: $0.15 per 1M input tokens / $1.20 per 1M output tokens
  • Major cost difference compared with leading models

    MiniMax claims typical tasks cost about $0.15 versus roughly $3.00 for Claude Opus 4.6, suggesting a potential 10–20× price difference versus competing proprietary models. The update also describes MiniMax’s marketing pitch as roughly $1 per hour to run the faster model continuously at about 100 tokens per second, versus roughly $15–$20 per hour for Claude Opus.
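
The Mixture-of-Experts figures reported above translate into a simple compute argument. As a back-of-the-envelope sketch (using only the parameter counts MiniMax reports; actual per-token FLOPs depend on architecture details the company has not published):

```python
# Rough sketch: in a Mixture-of-Experts model, per-token compute scales with
# the *active* parameters, not the total stored parameters. The counts below
# are the figures MiniMax reports for M2.5.
total_params = 230e9    # 230B parameters stored
active_params = 10e9    # ~10B parameters activated per token

active_fraction = active_params / total_params          # ~0.043
dense_equivalent_saving = total_params / active_params  # ~23

print(f"Active fraction per token: {active_fraction:.1%}")           # 4.3%
print(f"Compute vs. a dense 230B model: ~{dense_equivalent_saving:.0f}x less")
```

This is the core economic lever behind MoE designs: the model keeps the capacity of 230B parameters while paying roughly the inference cost of a 10B dense model per token.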
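
The CISPO technique mentioned above can be sketched in a few lines. This is a hedged illustration based on the method's name and public descriptions (capping the importance-sampling weight itself, so every token keeps contributing a bounded gradient), not MiniMax's actual training code; `eps_high` is an illustrative hyperparameter, not a published value.

```python
import math

def cispo_weight(logp_new: float, logp_old: float, eps_high: float = 2.0) -> float:
    """Truncated importance-sampling weight r = pi_new(a|s) / pi_old(a|s).

    Unlike PPO-style surrogate clipping, which can zero out the gradient for
    clipped tokens entirely, clipping the weight itself keeps every token
    contributing a (bounded) gradient to the policy update.
    """
    ratio = math.exp(logp_new - logp_old)
    return min(ratio, eps_high)  # one-sided upper clip against runaway updates

# Unchanged policy -> weight 1; a large likelihood jump is capped at eps_high.
assert cispo_weight(0.0, 0.0) == 1.0
assert cispo_weight(5.0, 0.0) == 2.0
```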
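
The "$1 per hour" pitch above is easy to sanity-check against the listed Lightning rates. This rough check counts output-token cost only and ignores input tokens and other overhead:

```python
# Sanity check of the ~$1/hour claim for M2.5-Lightning, using the listed
# rate of $2.40 per 1M output tokens at a sustained ~100 tokens/second.
tokens_per_second = 100
price_per_million_output = 2.40  # USD, output side only

tokens_per_hour = tokens_per_second * 3600                       # 360,000
cost_per_hour = tokens_per_hour / 1_000_000 * price_per_million_output

print(f"~${cost_per_hour:.2f}/hour in output tokens")  # ~$0.86/hour
```

The result, roughly $0.86 per hour before input-token costs, is consistent with the company's ~$1/hour framing.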

Benchmarks / Evidence Check

MiniMax reports strong results on several industry benchmarks:

  • SWE-Bench Verified: 80.2% (reported by MiniMax)
  • BrowseComp: 76.3% (reported by MiniMax)
  • Multi-SWE-Bench: 51.3% (reported by MiniMax)
  • BFCL tool-calling benchmark: 76.8% (reported by MiniMax)

Numbers that Matter

  • 230B parameters total model size
  • 10B active parameters per token (MoE activation)
  • Hundreds of thousands of RL environments used in training, according to the update
  • 40× training speedup claimed for Forge training orchestration
  • $1 per hour claimed continuous runtime for M2.5-Lightning at about 100 tokens per second
  • 30% of internal tasks automated at MiniMax
  • 80% of new code commits generated by the model

Risks / Limitations

  • Open-source status remains unclear

    MiniMax described the model as open source, but weights, code, and license terms have not yet been released. The update says the model is not open weights yet, even if some serving partners already appear to have access.

  • Benchmark claims are partly company-reported

    Performance results cited originate mainly from MiniMax benchmarks rather than independent third-party testing, though the OpenHands evaluation adds an external data point for coding-agent use cases.

Why This Matters

MiniMax’s release signals a potential shift in the economics of AI development. If high-performance models become dramatically cheaper to run, developers may move beyond simple chatbot applications and deploy long-running AI agents capable of coding, researching, generating office documents, building presentations, and managing workflows continuously without prohibitive costs.


This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.

