GPT-5 (Reasoning, Medium) Reportedly Achieves 45% on New 'Hieroglyph' Benchmark, Leading Competitors by 20 Percentage Points

A recent post by X user BLCNYY claims that OpenAI's "GPT-5 (Reasoning, Medium)" has set a new state of the art (SOTA) on a newly introduced benchmark named Hieroglyph. According to the post, this variant of GPT-5 scored 45% on the benchmark, well ahead of its closest competitor.

The Hieroglyph benchmark is described as measuring "a model's ability to identify the link between seemingly unrelated and often niche subjects." Although the post announces the benchmark, specific details and official documentation for it are not yet widely available in public AI research or news coverage. This suggests it may be a very recent, internal, or highly specialized evaluation.

The reported result for "GPT-5 (Reasoning, Medium)" arrives amid ongoing industry anticipation of GPT-5, which is generally expected to be released in mid to late 2025. Discussions of GPT-5 often center on its potential for advanced, multi-step reasoning and fewer hallucinations, possibly through modular or adaptive reasoning components.

The post further noted that "o4-mini-high" secured the second-best score on the Hieroglyph benchmark, at 25%. The o4-mini model is part of OpenAI's "o-series," which includes o3 and o4-mini. These models are often described as focusing on improved reasoning and efficiency and as stepping stones toward the full GPT-5 release.

If confirmed, GPT-5's claimed 45% score would represent a lead of 20 percentage points over the second-best model's 25% on this new reasoning test, or a score 80% higher in relative terms. The claim remains unverified, but it reflects both the rapid pace of progress in complex, abstract problem-solving and the intense competition to develop next-generation large language models.