GPT-5 Achieves 74.9% on SWE-bench Verified, Redefining AI Coding Accuracy

San Francisco – OpenAI's newly released GPT-5, particularly when run at its high reasoning-effort setting and paired with the Codex CLI, is drawing praise as a powerful tool for coding tasks, with users reporting fewer hallucinations than they see with competing models. The model, launched in August 2025, sets new highs on coding benchmarks, according to OpenAI. Early reports highlight its ability to tackle complex software engineering challenges accurately.

A user identified as Haider recently commented on the model's capabilities, stating, "GPT-5-high + Codex CLI is extremely powerful for coding." The comment echoes other early developer feedback on the model's coding performance. Haider further contrasted GPT-5 with competitor models, noting that "the biggest issue with Claude models and Claude Code is hallucinations," and adding that when errors are pointed out, Claude models often respond with, "You're absolutely right."

OpenAI reports that GPT-5 has achieved state-of-the-art scores in key coding benchmarks, including 74.9% on SWE-bench Verified and 88% on Aider polyglot, showcasing its prowess in real-world software engineering tasks and multi-language code editing. The company emphasized that GPT-5 meaningfully reduced sycophantic replies and hallucinations, a critical improvement for reliable code generation and problem-solving. This enhanced accuracy is attributed to improved training methodologies and a unified system that intelligently determines when to engage in deeper reasoning.

GPT-5 is available in platforms like GitHub Copilot and Visual Studio, as well as through the Codex CLI, letting developers apply it to tasks ranging from bug fixing and code editing to understanding complex codebases. Developers using GPT-5 have noted its intelligence and ease of steering, with some reporting that it achieves the best performance they have seen on internal benchmarks. This marks a significant step forward in AI-assisted software development, promising more efficient and reliable coding workflows.