An experimental large language model (LLM) developed by OpenAI has achieved gold medal-level performance at the 2025 International Mathematical Olympiad (IMO), a significant milestone for artificial intelligence. The achievement, announced by OpenAI research scientist Alexander Wei, demonstrates the model's ability to solve complex mathematical problems that demand human-like reasoning and intricate proof construction. It has also prompted discussion within the mathematical community about how directly AI and human performance in such demanding intellectual competitions can be compared.
OpenAI's model solved five of the six problems on the IMO 2025, scoring 35 of a possible 42 points; each problem is worth up to seven points, so five complete solutions account for the full 35. The evaluation was conducted under strict competition conditions mirroring those faced by human contestants: the model operated without internet access or external tools, read the official problem statements, and produced natural-language proofs. These proofs were then graded by a panel of former IMO medalists, who reached unanimous agreement on the scores.
The result is particularly notable because earlier AI models struggled to reach even bronze-medal level at the IMO, which demands creative problem-solving beyond rote computation. Alexander Wei stated on X that the model "can craft intricate, watertight arguments at the level of human mathematicians," highlighting a significant leap in AI reasoning. The success underscores AI's growing capacity to tackle challenges traditionally considered exclusive to human intellect.
However, the achievement has drawn a nuanced response from prominent mathematicians, including Fields Medalist Terence Tao. As noted in a tweet by Rota, Tao has weighed in on the "supposed Gold from OpenAI at IMO," urging caution about drawing direct "apples-to-apples comparisons" between AI and human performance. In Tao's view, impressive as the result is, the AI's problem-solving process and the testing methodology may differ fundamentally from human competition conditions, making direct equivalence difficult to establish without further controlled tests.
OpenAI has clarified that the gold medal-performing LLM is an experimental research model and will not be publicly released in its current form for several months. Even so, the result signals rapid progress in AI's ability to engage in high-level mathematical reasoning, potentially paving the way for applications in scientific research and complex problem-solving.