GPT-3-Generated Text Indistinguishable to Humans, Outperforms in Specific Tasks, Research Indicates


A July 11, 2025 tweet from user "kache" sparked discussion by asserting that large language models (LLMs) such as GPT-3 had, even four years earlier, already surpassed humans at tasks such as email routing. The tweet stated, "This is what people don't get about LLM performance. They have never measured it against human. Gpt3 from 4 years ago is better than humans at email routing." It reflects a growing perception that AI is already highly capable at text-based tasks.

GPT-3, initially released by OpenAI in mid-2020, marked a significant leap in natural language processing with its vast parameter count and ability to generate highly coherent and contextually relevant text. Its introduction spurred extensive research into its performance across various linguistic tasks, often comparing its output directly against human benchmarks.

In support of the idea that LLM-generated text is hard to tell apart from human writing, a 2023 study published in Science Advances titled "AI model GPT-3 (dis)informs us better than humans" found that human participants struggled to differentiate between tweets generated by GPT-3 and those written by real Twitter users. The research indicated that GPT-3 could produce accurate information that was "easier to understand" and even "more compelling disinformation" than human-authored content, suggesting a qualitative edge in certain aspects of text generation.

Beyond indistinguishability, later LLMs have demonstrated measurable superiority in specific domains. A November 2023 study in Scientific Reports showed GPT-4 outperforming human participants on medical multiple-choice questions, achieving 82.4% accuracy compared with 75.7% for humans. Similarly, a 2023 Scientific Reports analysis found that ChatGPT-generated argumentative essays were rated significantly higher in quality than those written by human high school students. Some studies, such as a 2024 comparison in Scientific Reports on coding, still show human students outperforming GPT-4, but the overall trend points toward increasing AI proficiency.

The claim about email routing, a form of text classification, is consistent with these broader findings on text-based tasks, although none of the studies cited here measured email routing directly. GPT-3's ability to generate text indistinguishable from human writing and to produce more effective informational content suggests a foundational capability that could plausibly extend to sorting and prioritizing emails by content and intent, tasks traditionally handled by human judgment.

These studies collectively underscore the rapid evolution of LLM capabilities, challenging traditional benchmarks of human performance in various text-centric applications. The assertion made by "kache" reflects a reality where AI, even in its earlier forms, began to demonstrate advanced proficiency in nuanced linguistic tasks, prompting a re-evaluation of human-AI collaborative roles.