San Francisco, California – OpenAI has introduced GDPval, an evaluation that measures how artificial intelligence models perform on economically valuable, real-world tasks. Srinivas Narayanan, Vice President of Engineering at OpenAI, made the announcement, which signals a shift toward grounding AI progress in economic relevance rather than speculative capabilities.
"GDPVal - a new eval for how AI performs on real-world tasks," Narayanan wrote in a recent tweet. He also shared an OpenAI announcement describing GDPval as "a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most."
The initial version of GDPval covers tasks from 9 sectors that contribute significantly to the U.S. Gross Domestic Product, spanning 44 high-earning knowledge work occupations. Each task is based on actual work products created by expert professionals. The aim is to tie the measure of AI's abilities directly to work in economically critical domains.
Unlike traditional AI benchmarks, GDPval uses head-to-head comparison against human experts as its primary evaluation metric, because these tasks are difficult to grade automatically. This methodology allows model outputs to be evaluated continuously against a human baseline, and increasingly sophisticated AI models could serve as new baselines in the future. The initiative underscores OpenAI's commitment to developing AI that delivers measurable economic value and practical utility across industries, in addition to advanced capabilities.
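To make the head-to-head idea concrete, here is a minimal sketch of how pairwise verdicts could be summarized. It is not OpenAI's implementation: the article does not say how comparisons are recorded or aggregated, so the verdict labels ("model", "human", "tie"), the `summarize` function, and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical verdict labels: which deliverable an expert grader preferred.
# The article does not specify GDPval's actual grading format.
VERDICTS = ("model", "human", "tie")


def summarize(verdicts):
    """Return the model's win, tie, and loss rates from pairwise verdicts."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    n = len(verdicts)
    return {
        "win_rate": counts["model"] / n,
        "tie_rate": counts["tie"] / n,
        "loss_rate": counts["human"] / n,
        # Share of tasks where the model output was judged at least as good.
        "win_or_tie_rate": (counts["model"] + counts["tie"]) / n,
    }


# Made-up verdicts for eight tasks, for illustration only.
example = ["model", "human", "tie", "human", "model", "human", "human", "tie"]
summary = summarize(example)
print(summary)
```

Reporting ties separately from wins keeps the summary honest about cases where graders saw no clear difference, which matters when the baseline is a human expert rather than an answer key.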