DeepSeek Enthusiast Reports 92% GPQA Diamond Score Without Tools, Signaling Potential AI Leap

A social media post from a prominent DeepSeek enthusiast has sparked considerable interest in the artificial intelligence community, claiming a remarkable achievement in AI reasoning. On November 18, 2025, "Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)" wrote: "GPQA diamond ... 92% no tools, about as I expected... it is a small step change for the industry." This reported score, if officially confirmed for a DeepSeek model, would represent a significant advancement in AI's ability to tackle complex problems autonomously.

The Graduate-Level Google-Proof Q&A (GPQA) Diamond benchmark is widely recognized for its rigorous evaluation of AI systems on graduate-level scientific questions across biology, chemistry, and physics. Designed to be "Google-proof," the benchmark challenges models to demonstrate deep understanding and reasoning without relying on simple information retrieval. Human experts typically achieve around 70% accuracy on this demanding test.

The tweet draws a distinction between "92% no tools" and a "90.8% - with Python" score. The no-tools figure reflects the model's own reasoning, without external computational aids such as code execution. DeepSeek models have previously performed well on GPQA Diamond, though below the claimed figure: DeepSeek V3.1 scored approximately 79.3% and DeepSeek R1-0528 around 81.3% in recent independent evaluations by the National Institute of Standards and Technology (NIST).

Although a 92.4% score on GPQA Diamond was reported for Autopoiesis Sciences' Aristotle-X1 model in mid-2025, the enthusiast's claim, if it refers to a DeepSeek model, would mark a new high for DeepSeek and place it among the top performers on the benchmark. The enthusiast's anticipation, "I expect we'll be shocked by its vision," underscores the perceived transformative potential of this development.

This reported performance, if verified, could accelerate advancements in various AI applications requiring sophisticated problem-solving and autonomous decision-making. The AI industry will be closely watching for further official announcements and technical details regarding this claimed achievement, as it could reshape expectations for future AI capabilities.