Alibaba Qwen Team's New 2507 Thinking Model Achieves 92.3% on AIME25 Benchmark

Junyang Lin, a prominent researcher and engineer at Alibaba Group's DAMO Academy and head of development for the Qwen team, recently announced new releases in the team's model lineup. In a social media post, Lin highlighted the release of "instruct and thinking models of our smaller variant of the 2507 series, 30a3-2507," shorthand for the Qwen3-30B-A3B models, and described them as faster and smarter than their predecessors. He added, "I like this size, it is just something that i can easily play with, which is also somehow smart enough."

The 2507 series marks a strategic shift for Alibaba's Qwen team, away from hybrid reasoning models and toward specialized variants. The larger models in the series follow the same split: Qwen3-235B-A22B-Instruct-2507 is designed for direct, efficient answers, while Qwen3-235B-A22B-Thinking-2507 is built for deep reasoning. The separation aims to prevent "token bloat," letting users choose between an immediate response and longer, step-by-step reasoning.

The Qwen3-235B-A22B-Thinking-2507 model has posted strong results in core reasoning areas. It scored 92.3% on AIME25, a benchmark built from the 2025 American Invitational Mathematics Examination that tests competition-level mathematical problem solving, surpassing models such as OpenAI's o4-mini and Google's Gemini 2.5 Pro. The instruct variant, Qwen3-235B-A22B-Instruct-2507, has also shown strong results on non-reasoning tasks, outperforming models such as Claude Opus 4.

Lin also alluded to the upcoming release of a specialized coding model, stating, "btw, i hope we can shoot coder 30a3 tmr." This aligns with the Qwen3-Thinking model's strong coding capabilities, evidenced by its performance on LiveCodeBench. The development of these models, particularly the "30a3" variant, relied on reinforcement learning at scale, using more than 20,000 parallel sandbox environments to run continuous code-write-test-learn cycles. In each cycle the model generates code, executes it inside an environment, and learns from the outcome.
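
To make the code-write-test-learn idea concrete, here is a minimal sketch of one scoring step in such a loop. It is not the Qwen team's pipeline, whose details are not public in this article: the function names (run_in_sandbox, score_candidates), the pass/fail reward, and the use of a temporary directory plus a subprocess as a stand-in "sandbox" are all assumptions made for illustration. A real system would use properly isolated environments and would feed the rewards into a policy update.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_sandbox(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> float:
    """Run a candidate and its tests in a fresh temp dir and subprocess.

    Returns a reward of 1.0 if the tests pass (exit code 0), else 0.0.
    A temp dir plus subprocess is NOT real isolation; it only stands in
    for a sandbox in this sketch.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "episode.py"
        script.write_text(candidate_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # runaway candidates earn no reward
        return 1.0 if result.returncode == 0 else 0.0


def score_candidates(candidates, test_code, max_workers=4):
    """Score many candidates in parallel; rewards keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: run_in_sandbox(c, test_code), candidates))


# Hypothetical task: the "model" proposed three versions of add().
TESTS = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
CANDIDATES = [
    "def add(a, b):\n    return a + b",  # correct
    "def add(a, b):\n    return a - b",  # wrong result
    "def add(a, b)\n    return a + b",   # syntax error
]

if __name__ == "__main__":
    rewards = score_candidates(CANDIDATES, TESTS)
    for code, reward in zip(CANDIDATES, rewards):
        print(reward, repr(code))
```

In a training setup, the rewards returned by score_candidates would be the learning signal: candidates that pass their tests are reinforced, and those that fail or time out are not. Scaling max_workers and replacing the subprocess with isolated environments is where a figure like 20,000 parallel sandboxes would come in.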

These new open-source models from Alibaba lower the barrier to enterprise and local deployment and offer an attractive alternative to commercial APIs. Their improved efficiency and benchmark performance are expected to have a significant impact on the open-source AI ecosystem, giving developers capable tools for applications ranging from complex reasoning to efficient instruction following.