
Google's latest large language model, Gemini 3 Pro, has reportedly taken a significant lead in internal testing, scoring 72% on the Kilo code platform following its global rollout. That score places it well ahead of competing models: Anthropic's Claude 4.5 Sonnet scored 54% and OpenAI's GPT-5.1 Codex registered 18% in the same internal evaluations. The stark difference prompted developer Ashutosh Shrivastava to comment on social media, "LMAO, no competition at all.."
The availability of Gemini 3 Pro on Kilo code coincides with Google's official launch of the model on November 18, 2025, which emphasized its advanced reasoning, multimodal understanding, and agentic capabilities. Google aims for Gemini 3 Pro to enable developers to "bring any idea to life" through its enhanced ability to process and synthesize information across text, images, video, audio, and code. The model is also integrated into Google AI Studio, Vertex AI, and various third-party platforms.
This internal benchmark from Kilo code aligns with broader industry discussions and leaked performance data suggesting Gemini 3 Pro's strong competitive standing. On other benchmarks, such as GPQA Diamond, AIME 2025, and Video-MMMU, Gemini 3 Pro has been shown outperforming GPT-5.1 in several key areas. While Claude 4.5 Sonnet has demonstrated strong coding capabilities in some tests, such as SWE-Bench Verified, reports indicate it has been less consistent than Gemini 3 Pro in code generation tasks.
The substantial performance gap in these internal scores underscores an intensifying race among AI developers to deliver the most capable and versatile models. Gemini 3 Pro's reported dominance in these coding-centric benchmarks could significantly influence its adoption among developers and its strategic position in the rapidly evolving AI landscape. The model's expanded capabilities are expected to drive innovation in complex workflow automation and real-time application development.