Alibaba’s Qwen Announces QVQ-72B-Preview

QVQ-72B-Preview is the latest AI model in a wave of releases that compare their performance against OpenAI’s GPT-4o. While Qwen claims superior benchmark results in math and visual reasoning, it notably omits comparisons to OpenAI’s more advanced o3 model, which is currently in testing.

QVQ-72B-Preview – Key Points

Overview of QVQ-72B-Preview:
- Developed by Qwen, the model builds on Qwen2-VL-72B and integrates vision and language processing for multimodal reasoning.
- Focused on solving visual math and science problems, but the reported benchmarks compare only against earlier OpenAI models such as GPT-4o and o1, not o3.
Performance Metrics (As Reported by Qwen):
- MathVista: Scored 71.4, surpassing OpenAI’s o1 but leaving o3 unaddressed.
- MMMU: Claimed a score of 70.3 on multidisciplinary tasks.
- MathVision and OlympiadBench: Reported strong results, but again with no evaluation against OpenAI’s latest models.
Features and Claims:
- Designed for high academic rigor, reportedly excelling in interpreting graphs, diagrams, and equations for math and science problems.
- Focuses on step-by-step analytical reasoning, leveraging datasets from real-world math competitions.
Known Limitations (Acknowledged by Qwen):
- Language Mixing: Tendency to switch or mix languages in responses.
- Recursive Reasoning Errors: Risk of verbose or circular logic without reaching conclusions.
- Visual Reasoning Deficiencies: Gradual loss of focus on images during multi-step reasoning, leading to hallucinations.
- Safety and Ethics: Requires stronger safeguards to address safety and ethical concerns.
Datasets Used for Evaluation:
- MMMU: Tests multimodal reasoning across disciplines.
- MathVista: Measures logical and scientific reasoning through visual aids.
- MathVision: Includes diverse problems from math competitions.
- OlympiadBench: Features bilingual problems with expert annotations from global science Olympiads.
What’s Next for QVQ-72B-Preview:
- Plans to expand into a unified AI system integrating more modalities.
- Aspires to enhance capabilities for deep thinking and reasoning, with visual information as a foundation.

Why This Matters:

The release of QVQ-72B-Preview highlights the ongoing race among AI developers to show superiority over OpenAI’s earlier GPT-4o model. However, by avoiding direct comparisons with OpenAI’s upcoming o3 reasoning models, Qwen leaves open questions about where QVQ-72B-Preview stands in the competitive AI landscape. While the advances in multimodal reasoning are evident, the real test will be head-to-head evaluations against OpenAI’s latest models.

Sources

QVQ: To See the World with Wisdom