OpenAI's latest large language model, GPT-5, has completed the classic video game Pokémon Red in approximately seven days. This marks a significant milestone in AI's ability to navigate complex, goal-oriented environments. The result highlights the model's decision-making and planning capabilities: it finished the game in far fewer steps and less time than previous AI attempts. Tech observer Haider. shared the accomplishment on social media, remarking, "damn... impressive."
The released performance figures indicate a substantial leap in efficiency. GPT-5 finished the game in just 6,470 steps, compared with 18,184 steps for its predecessor, o3. That is roughly 2.8 times fewer steps. GPT-5's completion time of about seven days is also less than half of o3's 15-day run, and according to Haider.'s post, it "beat Claude and Gemini by a big margin."
This feat places GPT-5 at the forefront of AI game-playing, a field increasingly used to benchmark the general intelligence and agentic capabilities of large language models. Other prominent AI models have also taken on the challenge. Google's Gemini 2.5 Pro, for instance, completed Pokémon Blue in May 2025, though with acknowledged "dev interventions" and an "agent harness" to facilitate gameplay. Anthropic's Claude models have also made notable progress in Pokémon Red, aided by extended thinking and agent training.
Playing Pokémon Red serves as a robust test for AI due to its requirements for spatial reasoning, long-term planning, and adaptability within a dynamic world. While not a formal academic benchmark, such challenges provide a public and relatable demonstration of an AI's capacity for complex problem-solving. Following this success, discussions have already begun regarding GPT-5's potential to tackle future challenges, with a "Pokemon Crystal Run" anticipated.