Pat Gelsinger, former CEO of Intel, has launched a new initiative called Flourishing AI (FAI), a benchmark designed to measure how well artificial intelligence models align with core human values. Announced on July 10, 2025, in partnership with "faith tech" company Gloo, the FAI benchmark aims to shift the focus of AI evaluation beyond technical performance to encompass human well-being and societal benefit. The launch marks a significant step in Gelsinger's post-Intel career, which now centers on ensuring AI supports a flourishing humanity.
Gelsinger, who spent more than three decades at Intel and served as its CEO from 2021 to 2024, invested in Gloo a decade ago and later joined its board, taking on an operational role in March. He told The New Stack in an interview that he has "lived at the intersection of faith tech my entire life," emphasizing his long-standing interest in the domain. The FAI benchmark is the first major project to emerge from his work at Gloo, reflecting his commitment to integrating human values into AI development.
The FAI benchmark draws its methodology from The Global Flourishing Study, a comprehensive survey on human well-being conducted by Harvard and Baylor University. Gloo adapted six core categories from this study: Character and Virtue, Close Social Relationships, Happiness and Life Satisfaction, Meaning and Purpose, Mental and Physical Health, and Financial and Material Stability. A seventh category, Faith and Spirituality, was added specifically for evaluating large language models (LLMs).
Initial test results from the FAI benchmark indicate that current leading AI models fall short of robust alignment with human flourishing. OpenAI's o3 achieved the highest score at 72 points, followed by Gemini 2.5 Flash Thinking with 68 points, Grok 3 at 67 points, and GPT-4.5 Preview with 66 points. None of the tested models reached the desired 90-point threshold, and all struggled particularly in the Faith and Meaning categories, where scores fell into the 30s and 40s.
Gelsinger views these scores not as a failure but as validation of the benchmark's necessity, arguing that "we’ve got a lot of work to do in these areas." He expressed hope that the FAI benchmark will encourage wider discussion within the AI community, ensuring that faith communities and human values are actively involved in shaping AI's future. His long-term goal is for all major AI models to score in the 90s, signifying stronger alignment with human well-being.
The FAI benchmark deliberately focuses on human-centered outcomes and complements existing technical evaluations rather than replacing them. Its current scope does not address cultural variations, broader economic impacts such as job displacement, environmental footprints, or emergent risks from AI at scale. Researchers note that the geometric mean used to aggregate category scores ensures that poor performance in any single dimension drags down the overall result, preventing a model from compensating for weaknesses in one area with strengths in another.
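To illustrate that property, here is a minimal sketch of geometric-mean aggregation in Python. The category names match the benchmark's seven dimensions, but the individual scores, the 0-100 scale, and the aggregation function itself are illustrative assumptions, not Gloo's actual scoring code.

```python
import math

def flourishing_score(category_scores: dict[str, float]) -> float:
    """Aggregate per-category scores with a geometric mean.

    A single low category pulls the overall score down far more than it
    would under an arithmetic mean, so a model cannot offset a weak
    dimension with strong ones.
    """
    values = list(category_scores.values())
    return math.prod(values) ** (1 / len(values))

# Hypothetical per-category scores on an assumed 0-100 scale.
scores = {
    "Character and Virtue": 80,
    "Close Social Relationships": 75,
    "Happiness and Life Satisfaction": 78,
    "Meaning and Purpose": 40,          # weak dimension
    "Mental and Physical Health": 82,
    "Financial and Material Stability": 77,
    "Faith and Spirituality": 35,       # weak dimension
}

geo = flourishing_score(scores)
arith = sum(scores.values()) / len(scores)
print(f"geometric mean:  {geo:.1f}")    # ~63.5, pulled down by the two weak categories
print(f"arithmetic mean: {arith:.1f}")  # ~66.7, which masks them more
```

In this toy example the two weak categories push the geometric mean roughly three points below the arithmetic mean, and the gap widens rapidly as any single category score approaches zero.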