Anthropic's Claude Opus 4.5 Achieves 80.9% on SWE-bench Verified, Redefining AI Coding Standards

Anthropic has launched Claude Opus 4.5, positioning it as a significant leap forward in AI capabilities, particularly for programming and agentic tasks. The new model, released in late November 2025, is lauded for its enhanced logical reasoning, superior code generation, and remarkable autonomy, according to early reviews and company announcements. This release marks a pivotal moment in the competitive landscape of large language models.

A user identified as Cooper, testing the model, stated enthusiastically on social media: "This is the best model release in a long long time when it comes to programming. It blows my mind how good it is." Cooper highlighted that Opus 4.5 no longer exhibits the "gruesome logic errors" common in previous models and consistently reasons correctly when handling code. According to his testing, this improvement generalizes across the logical aspects of coding, all but eliminating such mistakes.

The model also addresses a critical challenge in AI-generated code: quality. "It no longer writes slop code! This is huge," Cooper noted. Opus 4.5 is praised for producing elegant code and for its ability to refactor suboptimal code into high-quality solutions, suggesting an understanding of codebase structure that goes beyond mechanical rewriting. This capability could prevent the long-term maintenance problems associated with poorly structured AI-generated code.

Opus 4.5 also demonstrates exceptional autonomy: early testers, including Cooper, report that it independently creates minimal reproducible examples, bisects to isolate errors, and fixes issues without getting sidetracked. It tackles problems directly and, in Cooper's words, "DOES EXACTLY WHAT YOU SAY, WITHOUT CUTTING CORNERS!" This contrasts with previous models, which often settled for easier, less desirable solutions. The model's long-context understanding is likewise described as "pretty much perfect," maintaining coherence over extended conversations.

Anthropic's official statements confirm Opus 4.5's leadership on SWE-bench Verified with an 80.9% score, surpassing competitors such as Google's Gemini 3 Pro. The model is now available across Anthropic's apps, API, and major cloud platforms, with significantly reduced API pricing of $5 per million input tokens and $25 per million output tokens. This strategic pricing aims to make frontier AI capabilities accessible to a broader range of users and enterprises.
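For developers weighing the new pricing, here is a minimal sketch of calling the model through Anthropic's Python SDK and estimating request cost at the rates quoted above. The model identifier `claude-opus-4-5` is an assumption based on Anthropic's naming pattern, not something confirmed by this article; check the official model list for the exact string.

```python
# Minimal sketch: calling Claude Opus 4.5 via Anthropic's Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
# The model ID "claude-opus-4-5" is an assumption following Anthropic's
# naming convention; verify it against the official model documentation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Refactor this function to remove duplicated logic: ..."}
    ],
)
print(message.content[0].text)

# Rough cost estimate at the article's quoted rates:
# $5 per million input tokens, $25 per million output tokens.
usage = message.usage
cost = usage.input_tokens * 5 / 1_000_000 + usage.output_tokens * 25 / 1_000_000
print(f"Approximate request cost: ${cost:.6f}")
```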