Alibaba’s Qwen Announces QVQ-72B-Preview: Beats GPT-4o on MathVista (OpenAI o3 Still Untested)

QVQ-72B-Preview is the latest in a wave of AI model releases that measure themselves against OpenAI’s GPT-4o. Qwen reports stronger benchmark results in math and visual reasoning, but it notably omits any comparison with OpenAI’s more advanced o3 model, which is currently in testing.

An AI scratching its head over a math puzzle diagram, illustrating that QVQ beats GPT-4o on MathVista. Credit: The AI Track, made with Freepik-Flux

QVQ-72B-Preview – Key Points

  • Overview of QVQ-72B-Preview:
    • Developed by Qwen, the model builds on Qwen2-VL-72B and integrates vision and language processing for multimodal reasoning.
    • Focuses on solving visual math and science problems, but the published comparisons stop at earlier OpenAI models such as GPT-4o and o1.
  • Performance Metrics (As Reported by Qwen):
    • MathVista: Scored 71.4, surpassing OpenAI’s o1, with no comparison against o3.
    • MMMU: Reported score of 70.3 on multidisciplinary tasks.
    • MathVision and OlympiadBench: Strong results reported, but again with no evaluation against OpenAI’s latest models.
  • Features and Claims:
    • Designed for high academic rigor, reportedly excelling in interpreting graphs, diagrams, and equations for math and science problems.
    • Focuses on step-by-step analytical reasoning, leveraging datasets from real-world math competitions.
  • Known Limitations (Acknowledged by Qwen):
    • Language Mixing: Tendency to switch or mix languages in responses.
    • Recursive Reasoning Errors: Risk of verbose or circular logic without reaching conclusions.
    • Visual Reasoning Deficiencies: Gradual loss of focus on the image during multi-step reasoning, which can lead to hallucinations.
    • Safety and Ethics: Stronger safeguards are still required.
  • Datasets Used for Evaluation:
    • MMMU: Tests multimodal reasoning across disciplines.
    • MathVista: Measures mathematical, logical, and scientific reasoning over visual inputs such as graphs and figures.
    • MathVision: Includes diverse problems from math competitions.
    • OlympiadBench: Features bilingual problems with expert annotations from global science Olympiads.
  • What’s Next for QVQ-72B-Preview:
    • Qwen plans to expand the model into a unified AI system that integrates more modalities.
    • The team aims to strengthen deep thinking and reasoning capabilities, with visual information as the foundation.

Why This Matters:

The release of QVQ-72B-Preview highlights the ongoing race among AI developers to demonstrate superiority over OpenAI’s earlier GPT-4o model. However, by avoiding direct comparisons with OpenAI’s upcoming o3 reasoning models, Qwen leaves open the question of where QVQ-72B-Preview actually stands in the competitive AI landscape. The advances in multimodal reasoning are evident, but the real test is a head-to-head evaluation against OpenAI’s latest models.
