AI systems from Google DeepMind and OpenAI reached gold-medal performance at the 2025 International Mathematical Olympiad (IMO), correctly solving 5 of 6 problems in natural language within the competition’s standard time limits (two 4.5-hour sessions). This is the first time any AI has matched the top human scores at the world’s premier mathematics competition, signaling a major leap in abstract reasoning, problem-solving, and the potential for AI–human collaboration.
AI Scores at the International Mathematical Olympiad – Key Points
First Gold Medal-Level Performance by AI in IMO History
For the first time in IMO history, AI systems achieved gold-medal status, placing them among roughly the top 11% of human competitors. DeepMind’s AI scored 35 of 42 points, a confirmed gold-medal score, while OpenAI’s model reached a comparable result under independently replicated conditions.
Official Certification for DeepMind
The 2025 competition took place in Queensland, Australia. An advanced version of Google’s Gemini Deep Think was officially assessed by IMO graders. According to IMO President Dr. Gregor Dolinar, the AI’s solutions were “clear, precise, and easy to follow.” It became the first AI ever formally certified at gold-medal level.
OpenAI’s Independently Validated System
OpenAI’s system was not an official entrant; instead, its solutions were graded by three former IMO gold medallists. The evaluation mirrored official competition conditions, with the model solving five of six problems within the standard time limits and demonstrating deep reasoning entirely in natural language.
Subjects Covered and Problem Difficulty
Both models tackled questions spanning algebra, geometry, combinatorics, and number theory, the four core IMO disciplines. These problems are designed to demand multi-step logic, originality, and abstract reasoning, areas where earlier AI systems either failed outright or required days of computation.
Natural Language Reasoning Over Formal Code
Earlier DeepMind systems relied on formal languages: AlphaProof produced machine-checkable proofs in the Lean proof assistant, while AlphaGeometry worked in a domain-specific geometry language. This year, both Gemini Deep Think and OpenAI’s model solved problems end-to-end in natural language, improving accessibility, speed, and interpretability. The shift represents a move toward explainable AI reasoning.
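To see the contrast, here is a trivial formally verified statement written in Lean 4. The theorem is our own toy example, not an IMO problem: the point is that every step of a Lean proof is checked mechanically by the kernel, whereas a natural-language proof must be read and graded by humans.

```lean
-- A trivial machine-checkable statement in Lean 4.
-- Unlike a natural-language argument, the proof term below is
-- verified mechanically by Lean's kernel; nothing is taken on trust.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```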
DeepMind’s Architecture: Reinforcement Learning and Parallel Thinking
Gemini Deep Think was built with novel reinforcement learning techniques and trained on data rich in multi-step reasoning and theorem-proving examples. It used parallel thinking, exploring multiple solution strategies simultaneously before settling on one. The model completed all of its solutions within the allotted IMO time, a massive speed improvement over the 2024 AlphaProof and AlphaGeometry system, which needed two to three days of computation on some problems.
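DeepMind has not published the details of this mechanism, but the core idea of parallel thinking can be sketched at toy scale: launch several independent strategies on the same problem, verify each candidate answer without trusting the strategy that produced it, and keep one that passes. Everything below (the divisor-listing problem, the two strategies, the verifier) is invented for illustration and is not DeepMind’s implementation.

```python
# Toy sketch of "parallel thinking": run several solution strategies
# concurrently, verify each candidate independently, keep one that passes.
from concurrent.futures import ThreadPoolExecutor

def strategy_scan(n: int) -> list[int]:
    # Naive strategy: test every integer up to n.
    return [d for d in range(1, n + 1) if n % d == 0]

def strategy_sqrt(n: int) -> list[int]:
    # Faster strategy: test only up to sqrt(n), emitting divisor pairs.
    divisors = set()
    d = 1
    while d * d <= n:
        if n % d == 0:
            divisors.update({d, n // d})
        d += 1
    return sorted(divisors)

def verify(candidate: list[int], n: int) -> bool:
    # Check a candidate without trusting the strategy that produced it.
    return bool(candidate) and all(n % d == 0 for d in candidate)

if __name__ == "__main__":
    n = 360
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(s, n) for s in (strategy_scan, strategy_sqrt)]
        candidates = [f.result() for f in futures]
    verified = [c for c in candidates if verify(c, n)]
    print(verified[0])  # first candidate that passed verification
```

The design point is that checking an answer is usually cheaper than producing one, so many parallel attempts can be filtered by a single trusted verifier.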
Development Team and Future Release Plans
The DeepMind team was led by researchers including Thang Luong and Junehyuk Jung (a visiting researcher from Brown University). CEO Demis Hassabis announced a staged release: the model will first be tested by professional mathematicians before being rolled out to subscribers of Google AI Ultra, Google’s premium AI subscription tier.
Breakthrough in Creative and Logical Cognition
The systems achieved this performance not through memorization or symbolic manipulation alone, but by demonstrating general-purpose reasoning, abstraction, and the ability to communicate logic in plain language, a key requirement for collaborative scientific work.
Caution from the Academic Community
Leading mathematicians such as Terence Tao (UCLA) and Geordie Williamson (University of Sydney) called for transparency, peer-reviewed documentation, and reproducibility before the results are treated as established. Without open access to the models and evaluation methods, they caution, such achievements remain effectively unverifiable.
Verification and Alignment Risks
IMO organizer Joseph Myers stressed that without formal proofs or other verifiability mechanisms, even minor flaws in long-form AI outputs could propagate false assumptions through mathematics. Trustworthy AI in science must therefore be paired with formal logic systems or transparent validation protocols.
Implications for Education and Collaboration
The news drew significant global attention, raising questions about the future of AI in education, scientific discovery, and human–AI collaboration. Commentators speculate that these systems could become tools to assist students, educators, and researchers in solving open-ended problems.
Why This Matters:
This achievement marks a major threshold in AI’s evolution. IMO problems are intentionally designed to test human creativity and logic at their limits. Matching top human scores in natural language suggests AI is now capable of reasoning abstractly, solving hard problems under time pressure, and articulating insights in ways humans can follow. These models could accelerate breakthroughs in research and education, provided concerns around transparency and verification are addressed.