DeepSeek V4 Pro Price Cut Made Permanent At 75%

Key Takeaway

The DeepSeek V4 Pro price cut is now permanent, reducing API costs by 75% and strengthening DeepSeek’s position as one of the most aggressive low-cost providers for developers and AI agent builders.

DeepSeek V4 Pro Price Cut – Key Points

The Story

DeepSeek has permanently reduced API pricing for its flagship V4 Pro model to one quarter of its original level. The move makes permanent a temporary discount that had been set to expire at the end of May 2026.

The new pricing strengthens DeepSeek’s core market position: high-capability AI models offered at unusually low prices. It also places V4 Pro among the strongest models globally on an intelligence-per-dollar basis, according to third-party benchmark firm Artificial Analysis.

The Facts

DeepSeek has made a 75% V4 Pro price cut permanent.
V4 Pro API pricing now ranges from 0.025 to 6 yuan per million tokens, about $0.0035 to $0.83.
Standard V4 Pro pricing in U.S. dollars is listed at about $0.435 per million input tokens and $0.87 per million output tokens, with cached input priced as low as $0.003625 per million tokens; these listed figures differ slightly from the converted yuan amounts.
Previous pricing ranged from 0.1 to 24 yuan per million tokens.
The reduction keeps V4 Pro at one quarter of its original price.
The temporary discount had been expected to end on May 31, 2026.
DeepSeek launched the V4 model family in April 2026, including V4 Pro and the cheaper V4 Flash.
DeepSeek has promoted V4 around long-context and agentic AI use cases, including a 1-million-token context length.
V4 Pro uses a Mixture-of-Experts architecture with 1.6 trillion total parameters and about 49 billion active parameters per forward pass.
Both V4 Pro and V4 Flash are open-weight models released under an MIT license.
Artificial Analysis ranked V4 Pro near the global frontier for cost efficiency after the price cut.
V4 Pro now costs about $268 to run Artificial Analysis’ Intelligence Index benchmark tests.
OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7 cost about 12 and 19 times more, respectively, to complete the same benchmark run.
DeepSeek previously said V4 Pro pricing was constrained by high-end compute capacity and could fall once Huawei Ascend 950 supernodes launched in larger quantities in the second half of 2026.
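
The arithmetic behind these figures can be checked with a short sketch. The yuan ranges and the $268 benchmark cost come from the list above; the variable and function names are illustrative, and the sketch assumes "12 and 19 times more" means roughly 12x and 19x the V4 Pro cost, which the source does not spell out.

```python
# Yuan per million tokens, as quoted in the article (low end, high end).
OLD_RANGE_YUAN = (0.1, 24.0)
NEW_RANGE_YUAN = (0.025, 6.0)


def price_ratio(old: float, new: float) -> float:
    """Return the new price as a fraction of the old price."""
    return new / old


# Compare the low ends and the high ends of the two ranges.
ratios = [price_ratio(old, new) for old, new in zip(OLD_RANGE_YUAN, NEW_RANGE_YUAN)]
discounts = [1 - r for r in ratios]

# Benchmark run cost quoted for V4 Pro, in U.S. dollars.
V4_PRO_BENCHMARK_USD = 268

# Assumption: "12 and 19 times more" is read as 12x and 19x the V4 Pro cost.
implied_gpt_usd = V4_PRO_BENCHMARK_USD * 12
implied_opus_usd = V4_PRO_BENCHMARK_USD * 19

for ratio, discount in zip(ratios, discounts):
    print(f"new/old = {ratio:.2f}, discount = {discount:.0%}")
print(f"implied rival benchmark costs: ${implied_gpt_usd:,} and ${implied_opus_usd:,}")
```

Both ends of the range come out at one quarter of the earlier price, which matches the 75% figure. The implied rival costs are rough estimates derived from the quoted multiples, not numbers reported by Artificial Analysis.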

Why The Price Cut Matters For Developers

AI model pricing becomes critical when usage scales. A chatbot used occasionally by one person may not generate major costs. A production system used by thousands of users, or an internal enterprise agent processing large documents every day, can quickly become expensive.

The DeepSeek V4 Pro price cut could make several use cases more economical:

document analysis at scale
AI coding assistants
research and summarization workflows
customer support automation
long-context retrieval systems
multi-step AI agents
batch processing of large text datasets

The cache pricing is especially relevant for agentic and retrieval-based systems, where the same instructions, documents, or context blocks may be reused across many requests. Teams can also use V4 Flash for faster, cheaper routing or classification tasks and reserve V4 Pro for heavier reasoning and generation.
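
To illustrate why cache pricing matters, here is a minimal cost sketch using the per-million-token dollar prices listed in The Facts. The token counts describe a hypothetical agent step, and the sketch assumes every reused context token is billed at the cached rate; actual cache-hit behavior depends on the provider and is not described in the source.

```python
# U.S. dollars per million tokens for V4 Pro, as quoted in the article.
INPUT_USD = 0.435
CACHED_INPUT_USD = 0.003625
OUTPUT_USD = 0.87


def request_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    total = fresh_in * INPUT_USD + cached_in * CACHED_INPUT_USD + out * OUTPUT_USD
    return total / 1_000_000


# Hypothetical agent step: a 50,000-token shared context, 2,000 new input
# tokens, and 1,000 output tokens.
no_cache = request_cost(fresh_in=52_000, cached_in=0, out=1_000)
with_cache = request_cost(fresh_in=2_000, cached_in=50_000, out=1_000)

print(f"without cache: ${no_cache:.5f} per step")
print(f"with cache:    ${with_cache:.5f} per step")
print(f"ratio:         {no_cache / with_cache:.1f}x")
```

Under these assumptions the cached step costs roughly a twelfth of the uncached one, because the shared context dominates the input. The gap narrows as the share of fresh or output tokens grows.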

For smaller companies, the cut may reduce the barrier to building with a stronger model. For larger companies, it may create pressure to re-evaluate model routing, vendor contracts, and cost-per-task benchmarks.

The Huawei Chip Context

The price cut is also tied to the broader AI infrastructure race in China.

DeepSeek’s V4 model is optimized for Huawei’s Ascend AI chip ecosystem. U.S. export controls have limited China’s access to Nvidia’s most advanced AI chips, increasing the strategic importance of domestic alternatives such as Huawei Ascend.

DeepSeek did not disclose whether the permanent price reduction was caused by increased supply of Ascend 950 chips. Still, the company had previously said V4 Pro prices could fall sharply once Huawei Ascend 950 supernodes were launched in larger quantities.

DeepSeek’s architecture is designed to reduce compute and memory pressure through techniques such as Mixture-of-Experts routing and cache optimization. That makes the DeepSeek V4 Pro price cut more than a product update. It is also a signal about how Chinese AI companies are trying to compete under semiconductor constraints.

The Market Impact

DeepSeek’s pricing strategy adds pressure to the global AI model market.

OpenAI, Google, Anthropic, xAI, Meta, Mistral, and other model providers compete not only on intelligence, speed, multimodal capability, context length, safety, and developer experience, but also on cost. For many real-world applications, the winning model is not the most capable one. It is the one that delivers acceptable performance at the lowest cost.

The DeepSeek V4 Pro price cut could push more developers to compare models by cost per completed task rather than headline benchmark scores alone. The growing focus on intelligence per dollar also gives lower-cost providers a clearer way to challenge premium-priced frontier models.
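
One way to frame cost per completed task is to divide the cost of a single attempt by the success rate. The sketch below assumes failed attempts are simply retried and that each attempt succeeds independently with the same probability; the function name and all numbers are hypothetical, not measurements of any real model.

```python
def cost_per_completed_task(cost_per_attempt_usd: float, success_rate: float) -> float:
    """Expected spend per successful task when failures are retried.

    With independent attempts, the expected number of attempts until the
    first success is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt_usd / success_rate


# Hypothetical figures for illustration only.
cheaper_model = cost_per_completed_task(cost_per_attempt_usd=0.002, success_rate=0.80)
premium_model = cost_per_completed_task(cost_per_attempt_usd=0.030, success_rate=0.95)

print(f"cheaper model: ${cheaper_model:.4f} per completed task")
print(f"premium model: ${premium_model:.4f} per completed task")
```

In this made-up comparison the cheaper model still wins despite its lower success rate. The picture changes when failures are costly in ways retries cannot fix, such as errors that reach a customer.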

The pressure is likely to be strongest in high-volume background workloads, such as agent loops, batch processing, routing, code review, retrieval systems, and automated document analysis. Premium frontier models may remain preferred for highly sensitive or mission-critical tasks, but cheaper open-weight models can absorb more of the routine token volume.

What To Watch Next

The next test is whether DeepSeek can sustain the lower price while maintaining performance, reliability, and availability.

Three things matter now:

whether V4 Pro remains stable under higher demand
whether Huawei Ascend 950 infrastructure scales as expected
whether rival AI labs respond with lower pricing, cheaper model tiers, or more aggressive enterprise deals

If DeepSeek’s lower pricing holds, it could accelerate the AI industry’s shift toward cheaper inference, open-weight deployment, and more cost-sensitive model selection.

Why This Matters

Lower model prices can make advanced AI tools more accessible to developers, startups, and businesses that previously could not afford large-scale usage. The change also increases competitive pressure across the AI industry, where cost, deployment control, and model routing are becoming as important as raw model capability.

This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.