OpenAI Launches o3-pro: Its Most Advanced Reasoning Model Yet

OpenAI’s o3-pro model now sets a new standard in advanced AI reasoning, delivering top-tier performance in science, math, and coding benchmarks, while significantly reducing API costs. Despite these gains, o3-pro still relies on simulated reasoning, which can lead to confident but incorrect outputs, highlighting the limitations of current AI reasoning paradigms.

OpenAI Launches o3-pro - Credit OpenAI

OpenAI Launches o3-pro – Key Points

  • Launch Date and Availability:

    o3-pro launched on June 10, 2025, and is now available for ChatGPT Pro and Team users, replacing o1-pro. Enterprise and Edu access begins the following week. The model is also accessible through the OpenAI developer API.

  • Pricing Update:

    OpenAI cut o3-pro’s API prices by 87% compared to o1-pro, now charging $20 per million input tokens and $80 per million output tokens. The standard o3 model’s API pricing was also reduced by 80%.

    • For reference:
      • o1-pro was $150 input / $600 output per million tokens.
      • o3-mini is $1.10 input / $4.40 output per million tokens.
      • 1M input tokens ≈ 750,000 words (roughly War and Peace in length).
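As a quick sanity check on these figures, per-request cost follows directly from token counts. A minimal sketch (rates copied from the pricing above; the helper name is hypothetical):

```python
# Per-request cost at the quoted o3-pro rates:
# $20 per 1M input tokens, $80 per 1M output tokens.
O3_PRO_INPUT_RATE = 20.00   # USD per million input tokens
O3_PRO_OUTPUT_RATE = 80.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one o3-pro API request."""
    return (input_tokens * O3_PRO_INPUT_RATE
            + output_tokens * O3_PRO_OUTPUT_RATE) / 1_000_000

# A 10,000-token prompt with a 2,000-token response:
print(f"${request_cost(10_000, 2_000):.2f}")  # → $0.36

# The quoted 87% cut checks out against o1-pro's $150 input rate:
print(round((150 - 20) / 150 * 100))  # → 87
```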
  • Benchmark Superiority (Expanded):

    OpenAI published detailed benchmark results showcasing o3-pro’s performance:

    • AIME 2024 (Math): o3-pro: 93%, o3: 90%, o1-pro: 86%.
    • GPQA Diamond (PhD-level Science): o3-pro: 84%, o3: 81%, o1-pro: 79%.
    • Codeforces (Competitive Coding, Elo): o3-pro: 2748, o3: 2517, o1-pro: 1707.
  • 4/4 Reliability Benchmarks:

    In the stricter 4/4 setting, where a model must answer the same question correctly on all four independent attempts, o3-pro led both o3 and o1-pro:

    • AIME 2024: o3-pro – 90%, o3 – 80%, o1-pro – 80%.
    • GPQA Diamond: o3-pro – 76%, o3 – 67%, o1-pro – 74%.
    • Codeforces: o3-pro – 2301, o3 – 2011, o1-pro – 1423.
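The 4/4 metric is harsher than a single-attempt score because per-attempt accuracy compounds across four tries. A toy simulation of the idea (all names hypothetical):

```python
import random

def four_of_four(solve, grade, problem, attempts=4):
    """True only if all `attempts` independent samples are graded correct."""
    return all(grade(solve(problem)) for _ in range(attempts))

# Toy solver that is right 90% of the time on any single attempt,
# roughly o3-pro's single-pass AIME figure:
random.seed(0)
toy_solve = lambda _p: "correct" if random.random() < 0.9 else "wrong"
grade = lambda answer: answer == "correct"

trials = 10_000
rate = sum(four_of_four(toy_solve, grade, None) for _ in range(trials)) / trials
print(rate)  # close to 0.9**4 ≈ 0.656
```

This is why 4/4 numbers sit below the headline benchmarks: 90% per attempt implies only about 66% across four in a row, assuming independent attempts.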
  • Human Comparative Evaluations:

    In blind testing by human reviewers:

    • Overall preference: 64% for o3-pro
    • Scientific analysis: 64.9%
    • Personal writing: 66.7%
    • Programming: 62.7%
    • Data analysis: 64.3%
  • Advanced Capabilities:

    o3-pro integrates a wide set of tools and functions:

    • Web search
    • Python code execution
    • Image and file analysis
    • Memory-based personalization
    • Chain-of-thought (CoT) processing for reasoning-like output
  • Limitations and Technical Notes:

    • Slower responses than o1-pro due to expanded tool use and token output.
    • No image generation—for visual tasks, GPT-4o or o4-mini are recommended.
    • Canvas (OpenAI’s workspace) is not supported.
    • Temporary chats are currently disabled due to technical issues.
  • Simulated Reasoning: What It Actually Means:

    Ars Technica and academic studies clarify that “reasoning” in o3-pro does not reflect true logical thinking:

    • Simulated reasoning = more inference-time compute + chain-of-thought token planning.
    • Outputs are pattern-matched from training data, not built from logical inference.
    • Models can still confabulate (produce factual errors with confidence).
    • o3-pro often “thinks out loud” in tokens, offering clearer intermediate steps—but this doesn’t mean it can self-correct or recognize logical contradictions.
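In practice, this "thinking out loud" means spending output tokens on intermediate steps before the final answer. A minimal illustration of the idea as a prompt wrapper (hypothetical; not OpenAI's internal mechanism):

```python
def cot_prompt(question: str) -> str:
    """Wrap a question so the model is asked to emit intermediate
    steps before committing to a final answer (chain-of-thought style)."""
    return (
        f"Question: {question}\n"
        "Reason through this step by step, showing each intermediate "
        "result, then give the final answer on a new line starting "
        "with 'Answer:'."
    )

print(cot_prompt("What is 17 * 24?"))
```

The visible steps make errors easier for a human to spot, but, as noted above, they do not guarantee the model itself will notice a contradiction in them.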
  • Known Weaknesses from Research:

    • Fails at novelty: Studies show performance collapses on logic puzzles such as Tower of Hanoi once instances grow beyond the patterns seen in training.
    • Fails with contradiction: Continues flawed approaches even when output is illogical.
    • Scaling paradox: Some models reduce reasoning effort as problems become more complex.
    • Even when armed with known algorithms, models don’t reliably apply them.
  • Future Directions and Mitigation Strategies:

    Researchers are working on:

    • Self-consistency sampling: Generate multiple solution paths to check for agreement.
    • Self-critique prompts: Encourage models to assess their own outputs.
    • Tool augmentation: Use external symbolic math engines or calculators to improve output fidelity.
    • These are early-stage fixes, not full solutions to the reasoning gap.
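The first of these mitigations fits in a few lines: sample several independent reasoning paths and keep the answer most of them agree on. A sketch of self-consistency voting (names hypothetical; `sample_answer` stands in for a model call):

```python
from collections import Counter

def self_consistent_answer(sample_answer, question, n=5):
    """Self-consistency sampling: draw n independent answers and
    return the majority answer plus its agreement ratio."""
    answers = [sample_answer(question) for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n

# Stand-in for a model that reaches "12" on 3 of 5 reasoning paths:
paths = iter(["12", "12", "13", "12", "11"])
answer, agreement = self_consistent_answer(lambda _q: next(paths), "?")
print(answer, agreement)  # → 12 0.6
```

A low agreement ratio is a cheap signal that the model may be confabulating and that the output deserves extra scrutiny.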
  • Model Philosophy & User Use Case:

    As The Neuron’s Corey Noles puts it:

    “o3‑Pro isn’t your everyday chat buddy—it’s the brainiac you summon when accuracy trumps speed.”

    o3-pro is suited for technical tasks, analysis, and problem-solving where clarity and logic structure matter more than quick response or friendly tone.

  • Safety and Transparency:

    o3-pro shares its safety and interpretability documentation with o3. Full disclosure is provided via the OpenAI o3 system card.


Why This Matters:

OpenAI’s o3-pro offers improved accuracy, lower pricing, and advanced tool use, marking it as a reliable choice for math, coding, and science tasks. However, the model’s reasoning is still simulated—not logical—and remains prone to patterned confabulation under novel conditions. While a strong tool for structured problems, it should be used with caution in high-stakes or unpredictable environments.
