Cambridge, MA – Researchers Benjamin S. Manning and John J. Horton of the Massachusetts Institute of Technology (MIT) have unveiled a novel approach to developing "general social agents" capable of predicting human behavior in complex, novel settings with significantly improved accuracy. Their findings, detailed in a recently published arXiv paper titled "General Social Agents" (arXiv:2508.17407), demonstrate a 53% to 73% reduction in prediction error compared to baseline AI models. The research was initially highlighted by Rohan Paul on social media.
The method involves constructing AI agents that combine theory-grounded natural language instructions, existing empirical data, and the extensive knowledge large language models (LLMs) acquire during training. This allows the agents to apply social science theories flexibly across diverse scenarios without ad hoc modifications for each new setting. To rigorously test their agents' predictive capabilities, the team designed a highly heterogeneous population of 883,320 novel games.
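The construction can be pictured concretely. Below is a minimal sketch, assuming an OpenAI-style chat API: a single agent whose system prompt combines a behavioral theory with a summary of seed-game data before it is asked about a novel game. The theory text, seed-game summary, and prompt wording here are illustrative assumptions, not the authors' actual materials.

```python
# A minimal sketch of a theory-grounded social agent, assuming an
# OpenAI-style chat API. The theory statement, seed-game summary, and
# prompt wording are illustrative assumptions, not the paper's prompts.
from openai import OpenAI

client = OpenAI()

THEORY = (
    "Level-k reasoning: players best-respond to a belief that others "
    "reason one level below them, anchored at a naive level-0 player."
)

SEED_DATA = (
    "In prior 'guess 2/3 of the average' seed games (range 0-100), "
    "median human guesses clustered near 33, not the equilibrium of 0."
)

def predict_human_play(game_description: str) -> str:
    """Ask the LLM to predict human behavior in a novel game,
    conditioning on a social-science theory and seed-game data."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": ("You predict human play in games.\n"
                         f"Theory: {THEORY}\n"
                         f"Empirical anchor: {SEED_DATA}")},
            {"role": "user",
             "content": (f"Game: {game_description}\n"
                         "Predict the distribution of human choices.")},
        ],
    )
    return response.choices[0].message.content

print(predict_human_play(
    "Each of 5 players picks a number in [0, 200]; the player closest "
    "to half the average wins a fixed prize."
))
```

Because the theory and the empirical anchor travel with the agent as natural language, the same prompt scaffolding can, in principle, be reused across many structurally different games without per-game tuning.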
In preregistered experiments, these AI agents, built using human data from a small set of conceptually related "seed" games, consistently outperformed both traditional game-theoretic equilibrium predictions and standard "out-of-the-box" AI agents. Evaluated on a random sample of 1,500 games drawn from the full population, the general agents predicted human play more accurately than either baseline. Notably, on a separate subset of novel games, the simulations predicted responses from new human subjects more accurately than the most relevant published human data.
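The headline error-reduction figures can be made concrete with a toy calculation. The sketch below assumes prediction error is measured as mean squared error between predicted and observed choice frequencies; the paper's exact metric may differ, and all numbers in it are hypothetical.

```python
# A toy illustration of an error-reduction metric, assuming prediction
# error is mean squared error between predicted and observed human
# choice frequencies. All data below are hypothetical.
import numpy as np

def mse(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Mean squared error between two choice-frequency vectors."""
    return float(np.mean((predicted - observed) ** 2))

# Hypothetical per-game choice frequencies (each row is one game).
human = np.array([[0.60, 0.30, 0.10],
                  [0.20, 0.50, 0.30]])
general_agent = np.array([[0.55, 0.35, 0.10],
                          [0.25, 0.45, 0.30]])
baseline = np.array([[0.33, 0.34, 0.33],
                     [0.33, 0.34, 0.33]])

# Average prediction error across games for each predictor.
err_general = np.mean([mse(p, h) for p, h in zip(general_agent, human)])
err_baseline = np.mean([mse(p, h) for p, h in zip(baseline, human)])

# Fractional reduction in prediction error relative to the baseline;
# a value of 0.53 would correspond to the paper's lower-bound figure.
reduction = 1 - err_general / err_baseline
print(f"error reduction vs. baseline: {reduction:.0%}")
```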
The researchers emphasize that this breakthrough could transform social science research and public policy by providing a reliable, scalable platform for testing theories before costly real-world implementation. "Useful social science theories predict behavior across settings," the authors state in their abstract, highlighting the core challenge their work addresses. The simulations were conducted using GPT-4o, demonstrating the potential of advanced LLMs when guided by robust theoretical frameworks.
The study underscores the importance of grounding AI subjects in established social science theories and validating them across distinct yet related datasets to ensure generalizability. The work, which received funding from Tyler Cowen and the Mercatus Center, points to a future where AI-driven simulations can offer accurate insights into human decision-making across a wide array of complex social interactions.