Karpathy Unveils "LLM Council" Project for Multi-Model AI Consensus


Andrej Karpathy, a prominent figure in artificial intelligence, has introduced a novel project dubbed "LLM Council," designed to leverage multiple large language models (LLMs) and a "Chairman" model to synthesize comprehensive responses. The initiative, shared via a tweet from OpenRouter, aims to enhance the reliability and depth of AI interactions by fostering a collaborative evaluation process among different models.

The "LLM Council" operates as a local web application, utilizing OpenRouter to dispatch user queries to a selection of leading LLMs, including those from OpenAI, Google, Anthropic, and xAI. Each model initially provides an independent response. Subsequently, these individual outputs are cross-reviewed by each LLM, with identities anonymized to ensure objective ranking based on accuracy and insight.

Following the review stage, a designated "Chairman" LLM aggregates and synthesizes the evaluated responses into a single, comprehensive final answer presented to the user. Karpathy described the project as a "fun Saturday hack," emphasizing its "vibe coded" nature and his intent to explore side-by-side LLM evaluation, particularly for tasks like reading books. He noted, "Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like."
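The three-stage flow described above can be sketched against OpenRouter's OpenAI-compatible chat-completions endpoint. This is a minimal illustration, not the project's actual code: the model identifiers, prompt wording, and ranking format are assumptions made for the example.

```python
# Sketch of the "LLM Council" three-stage flow (illustrative, not the
# project's real implementation): independent drafts, anonymized
# cross-review, then a Chairman synthesis, all via OpenRouter.
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
# Assumed council roster and Chairman; the real app makes these configurable.
COUNCIL = ["openai/gpt-4o", "google/gemini-pro", "anthropic/claude-3.5-sonnet"]
CHAIRMAN = "openai/gpt-4o"


def ask(model: str, prompt: str) -> str:
    """Send one chat-completion request through OpenRouter."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(
            {"model": model, "messages": [{"role": "user", "content": prompt}]}
        ).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


def anonymize(responses: list[str]) -> str:
    """Label drafts 'Response A', 'Response B', ... so reviewers can't
    tell which model wrote which answer."""
    return "\n\n".join(
        f"Response {chr(65 + i)}:\n{text}" for i, text in enumerate(responses)
    )


def council_round(query: str) -> str:
    # Stage 1: every council member answers independently.
    drafts = [ask(model, query) for model in COUNCIL]

    # Stage 2: each member ranks the anonymized drafts by accuracy and insight.
    review_prompt = (
        f"Question: {query}\n\n{anonymize(drafts)}\n\n"
        "Rank these responses by accuracy and insight, with brief reasons."
    )
    reviews = [ask(model, review_prompt) for model in COUNCIL]

    # Stage 3: the Chairman synthesizes drafts and rankings into one answer.
    final_prompt = (
        f"Question: {query}\n\nDrafts:\n{anonymize(drafts)}\n\n"
        "Rankings:\n" + "\n\n".join(reviews)
        + "\n\nSynthesize a single, comprehensive final answer."
    )
    return ask(CHAIRMAN, final_prompt)
```

Keeping the draft and review stages anonymized before the Chairman step is what lets the ranking reflect answer quality rather than model reputation.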

OpenRouter, a platform that facilitates access to various LLMs, plays a crucial role by routing the queries and managing the interactions between the council members. Users configure the participating LLMs and the Chairman model through a simple setup, requiring an OpenRouter API key. This approach allows for direct comparison and consensus-building among diverse AI perspectives.

Karpathy, known for his previous work as Director of AI at Tesla and projects like llm.c, continues to push the boundaries of LLM application and evaluation. The "LLM Council" offers a practical framework for users to gain more nuanced and thoroughly vetted information from the rapidly evolving landscape of large language models.