
Andon Labs has announced that Moonshot AI's Kimi K2 is now the leading open-source model on Vending-Bench, following a recent rerun of the benchmark. The result places Kimi K2 among the strongest openly available models for long-horizon agentic tasks.
"Kimi K2 is now the best open source model on Vending-Bench," wrote Lisan al Gaib, who shared the evaluation results. The assessment comes from Andon Labs, an AI safety evaluation company known for its rigorous testing of frontier AI models.
Kimi K2, developed by Chinese AI startup Moonshot AI, is a Mixture-of-Experts (MoE) language model with one trillion total parameters, of which 32 billion are activated per token. Recent reports often refer to Kimi K2 Thinking, its reasoning-focused variant. The model has been optimized for agentic use and performs strongly on reasoning, coding, and tool-use tasks. Moonshot AI has open-sourced the model, so researchers and developers can download and run it themselves.
Vending-Bench is a simulated environment developed by Andon Labs to test how well AI agents can run a vending machine business over long operational horizons. The benchmark assesses complex problem-solving, inventory management, pricing strategy, and sustained, coherent decision-making, and measures performance by the average net worth an agent achieves. In Andon Labs' rerun, which accessed the model through Moonshot's native API, Kimi K2 outperformed the other open-source models tested.
Other results point the same way: reports from early November 2025 indicated that Kimi K2 outperformed several proprietary models, including OpenAI's GPT-5 and Anthropic's Claude Sonnet 4.5, on several third-party benchmarks. These results suggest that open-source models can now rival, and in some cases surpass, closed systems on demanding reasoning and coding tasks. Kimi K2 is available through an API and can also be self-hosted, giving developers flexibility in how they integrate it into their applications.