Alibaba Launches Qwen3 Max, a 1 Trillion Parameter AI Model Rivalling OpenAI and Google

Key Takeaway:

Alibaba Cloud has unveiled Qwen3 Max, its largest AI model to date, boasting over 1 trillion parameters and outperforming or matching global leaders like OpenAI’s GPT-5, Google’s Gemini 2.5 Pro, and Anthropic’s Claude Opus 4 across multiple benchmarks. The release reflects Alibaba’s intensified investment in AI as a central business pillar, supported by billions in infrastructure funding.

Alibaba Launches Qwen3 Max - Image Credit - Alibaba

Alibaba Launches Qwen3 Max – Key Points

  • Launch & Scale:

    On September 24, 2025, ahead of the annual Apsara Conference in Hangzhou, Alibaba Cloud introduced Qwen3 Max, its most powerful AI model. It was trained on 36 trillion tokens with a 1 million token context length, enabling it to process entire codebases or long research documents in a single pass.

  • Benchmarks & Rankings:

    • Ranked 3rd overall on LMArena, an AI evaluation platform created by UC Berkeley researchers, surpassing OpenAI GPT-5-Chat.
    • Outperformed Claude Opus 4 and DeepSeek V3.1 on Tau2-Bench with a score of 74.8 for agent tool-calling.
    • Scored 69.6 on SWE-Bench Verified for solving real-world coding problems, ahead of DeepSeek V3.1 but slightly below Claude Opus 4.
    • Self-reported benchmarks show parity with, or superiority over, GPT-5 Pro, Gemini 2.5 Pro, and Grok 4.
  • Capabilities:

    Qwen3-Max shows strength in code generation and autonomous agent capabilities, requiring less human prompting than chatbots such as ChatGPT. It demonstrates advanced reasoning, instruction following, multilingual comprehension (notably in English and Chinese), and domain expertise. The model also hallucinates less and responds more reliably to open-ended questions.

  • Preview & Evolution:

    A preview version released in early September ranked above OpenAI's GPT-5-Chat on LMArena. Alibaba is also training Qwen3-Max-Thinking. When augmented with tool use and parallel test-time compute (including a code interpreter), internal tests report 100% on AIME 25 and HMMT.

  • Corporate Investment & Strategy:

    Alibaba has positioned AI as a core business priority, committing 380 billion yuan (US$53.4 billion) to AI-related infrastructure over the next three years. At the Apsara Conference, CEO Eddie Wu pledged even greater spending to meet rising demand, noting: “The speed of AI industry development has far exceeded our expectations.”

  • Access & Availability:

    Qwen3-Max is not open source; it is accessible through Alibaba Cloud and freely available on the Qwen app (iOS and Android) and the web platform.

    The qwen3-max API is available in Alibaba Cloud Model Studio and is OpenAI-API compatible; developers can use standard OpenAI client patterns after creating an API key. Qwen3-Max is also directly accessible in Qwen Chat.
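    The OpenAI-compatible pattern described above can be sketched as follows. This is a minimal illustration using only the standard library; the endpoint URL and the `DASHSCOPE_API_KEY` environment-variable name follow Alibaba Cloud's compatible-mode convention but should be confirmed against the current Model Studio documentation before use.

    ```python
    # Sketch: calling qwen3-max through Model Studio's OpenAI-compatible
    # endpoint. The payload shape mirrors the standard OpenAI chat-completions
    # format; verify the URL and model name against current Alibaba Cloud docs.
    import json
    import os
    import urllib.request

    BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"  # assumed endpoint

    def build_request(prompt: str, model: str = "qwen3-max") -> dict:
        """Build an OpenAI-style chat-completions payload."""
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }

    def send(payload: dict) -> dict:
        """POST the payload; requires a valid API key in DASHSCOPE_API_KEY."""
        req = urllib.request.Request(
            f"{BASE_URL}/chat/completions",
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # Build (but do not send) a sample request:
    payload = build_request("Summarize this codebase in one paragraph.")
    ```

    Developers using the official `openai` Python client can achieve the same thing by passing the compatible-mode URL as `base_url` when constructing the client.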

  • Additional Products (Omni & Ecosystem Updates):

    Alongside Qwen3 Max, Alibaba unveiled Qwen3-Omni, a natively end-to-end omni-modal system (text, image, audio, video inputs; text and audio outputs) released open source under Apache 2.0 (downloadable on Hugging Face/GitHub and via an API “Flash” variant).

    • Architecture: Thinker–Talker design using Mixture-of-Experts; Talker conditions directly on audio/visual features for more natural speech.
    • Latency: Theoretical first-packet latencies of ~234 ms (audio) and ~547 ms (video) in streaming.
    • Languages: 119 languages for text, 19 for speech input, 10 for speech output (incl. Cantonese).
    • Context/limits: 65,536 tokens (Thinking Mode) / 49,152 (Non-Thinking); max input 16,384, max output 16,384; longest reasoning chain 32,768.
    • Training pipeline: AuT audio encoder (~0.6B params) trained on ~20M hours (≈80% EN/ZH ASR, 20% other); ~2T multimodal tokens across text (~0.57T), audio (~0.77T), images (~0.82T), plus video/audio-video; post-training via SFT, strong-to-weak distillation, GSPO, and multi-stage speech training to reduce hallucinations.
    • Benchmarks (select): AIME25 65.0 (vs GPT-4o 26.7), ZebraLogic 76.0 (vs Gemini 2.5 Flash 57.9), WritingBench 82.6 (vs GPT-4o 75.5); speech WER: Wenetspeech 4.69/5.89 (vs GPT-4o 15.30/32.27); vision/multimodal: HallusionBench 59.7, MMMU_pro 57.0, MathVision_full 56.3; video: MLVU 75.2 (vs Gemini 2.0 Flash 71.0, GPT-4o 64.6). Reported SOTA on 22/36 benchmarks and open-source lead on 32/36.
    • API pricing (Flash variant): Inputs per 1K tokens—text $0.00025, audio $0.00221, image/video $0.00046; outputs—text $0.00096 (text-only input) or $0.00178 (if input includes image/audio); text+audio output $0.00876 per 1K (audio portion only). Free quota: 1M tokens for 90 days.
    • Variants: Qwen3-Omni-30B-A3B in Instruct (Thinker+Talker), Thinking (reasoning, text output), and Captioner (audio captioning).
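    As a sanity check on the Flash rates listed above, the arithmetic for a simple text-in, text-out call can be sketched as follows. The helper function and the example token counts are illustrative; the per-1K rates are the ones quoted for text input and text-only-input output.

    ```python
    # Illustrative cost estimate for the Qwen3-Omni Flash API using the
    # quoted per-1K-token rates: text input $0.00025, text output $0.00096
    # (the text-only-input case). All figures are USD per 1K tokens.
    TEXT_IN_PER_1K = 0.00025
    TEXT_OUT_PER_1K = 0.00096

    def estimate_text_cost(input_tokens: int, output_tokens: int) -> float:
        """Estimated USD cost of a text-in, text-out call."""
        return (input_tokens / 1000) * TEXT_IN_PER_1K \
             + (output_tokens / 1000) * TEXT_OUT_PER_1K

    # e.g. an 8,000-token prompt with a 1,000-token reply:
    cost = estimate_text_cost(8000, 1000)  # ≈ $0.00296
    ```

    At these rates, even the 1M-token free quota covers a substantial volume of text-only experimentation before billing begins.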
  • Strategic Positioning:

    Alibaba positions Qwen3 Max as a direct challenger to U.S. AI leaders OpenAI, Google, and Anthropic, while bolstering China’s domestic AI ecosystem. The open-source Qwen3-Omni broadens developer adoption and enterprise integration, complementing the closed, cloud-served Qwen3-Max. This aligns with Beijing’s broader push for technological independence and leadership in global AI standards.

Why This Matters:

The launch of Qwen3-Max demonstrates China’s escalating commitment to AI supremacy. With trillion-parameter scale, cutting-edge benchmarks, and tens of billions in infrastructure investment, Alibaba is establishing itself as a counterweight to U.S. AI leaders. Its autonomous agent capabilities point toward next-generation automation that may reshape industries, while free public access via Alibaba Cloud accelerates adoption. The open-source Qwen3-Omni, combining real-time multimodality, competitive benchmarks, permissive licensing, and clear API economics, signals a pragmatic ecosystem strategy spanning closed high-end models and widely deployable open systems.


This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.
