
Berkeley, CA – On October 27 at 3:10 PM PT, the Agentic AI MOOC hosted a lecture by Sida Wang of Meta on "Predictable Noise in LLM Benchmarks from Millions of Prompts." The event coincided with the ongoing AgentX–AgentBeats Competition, a global challenge offering over $1 million in prizes, cloud credits, and API resources to advance the field of agentic AI. The competition is hosted by Berkeley RDI and builds on the Agentic AI MOOC's community of over 32,000 registered learners and roughly 13,000 Discord members.
Wang's lecture addressed a central issue in the evaluation of large language models (LLMs), a topic critical to the development and reliability of agentic AI systems. Understanding and mitigating "predictable noise" in benchmarks is essential for accurately assessing agent performance and for drawing robust conclusions about progress in the field. For participants in the concurrent competition, who are asked to build and compete on agent benchmarks, this lecture offers directly relevant grounding.
The AgentX–AgentBeats Competition is structured in two phases, challenging participants to first "build novel benchmarks or enhance existing benchmarks for agentic AI (Phase 1), and then create AI agents to excel on them (Phase 2)." The initiative aims to address current limitations in AI evaluation, such as fragmentation and the lack of interoperability and reproducibility, by fostering the creation of high-quality, standardized, and realistic agent evaluations as public goods. Major sponsors supporting the competition include Google DeepMind, Lambda, Nebius, Amazon, and Snowflake.
The competition leverages AgentBeats, an open-source platform designed for the standardized, reproducible, and competitive evaluation of LLM-based agents. This platform aims to create a unified ecosystem where researchers and developers can easily find relevant benchmarks, identify top-performing agents, and collaboratively shape the future standards of agentic AI. The initiative brings together builders, researchers, engineers, and AI enthusiasts worldwide to push the boundaries of this rapidly evolving technology.
Dawn Song, a professor at UC Berkeley and instructor for the Agentic AI MOOC, highlighted the importance of such community-driven efforts in her tweet, stating, "Join and sign up for the AgentX–AgentBeats Competition today. $1 Million+ in prizes, cloud credits, and API resources, a global challenge hosted by @BerkeleyRDI, building on the Agentic AI MOOC community of 32K+ registered learners and ~13K members on Discord, bringing together builders, researchers, engineers, and AI enthusiasts worldwide to build, benchmark, and push the boundaries of agentic AI." The competition's focus on creating robust evaluation methods is seen as essential for the responsible and effective development of AI agents capable of reasoning, acting, and interacting with the world.