A new prompt optimizer named GEPA (Genetic-Pareto) has emerged, demonstrating significant gains in the efficiency and performance of Large Language Model (LLM) systems. Announced on social media by Shangyin Tan, who credited the "great work" of Lakshya A Agrawal and their team, GEPA is designed to learn from its environment and from feedback, and it promises to change how LLM prompts are optimized.
GEPA distinguishes itself by leveraging natural language reflection as a primary learning medium, a departure from traditional reinforcement learning (RL) methods that rely on sparse, scalar rewards. Instead, GEPA analyzes system-level trajectories, including reasoning steps and tool usage, to diagnose issues and refine prompts through natural language feedback. This reflective approach allows the system to understand why a prompt succeeded or failed and encode that insight into improvements.
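To make the idea concrete, the sketch below outlines what such a reflective optimization loop might look like in Python. It is a minimal illustration under assumed interfaces, not GEPA's actual implementation: `run_task`, `reflect`, the scoring logic, and the prompt-rewriting step are hypothetical placeholders standing in for the LLM pipeline being optimized, the reflection model, and the task metric.

```python
import random

def run_task(prompt: str, example: dict) -> tuple[float, str]:
    """Run the LLM system on one example and return (score, feedback).

    The feedback is natural-language diagnostic text, e.g. a failed test,
    an error trace, or an evaluator's written explanation.
    """
    # Placeholder: in practice this would execute the real LLM pipeline.
    score = random.random()
    feedback = f"Scored {score:.2f}; the answer missed a stated constraint."
    return score, feedback

def reflect(prompt: str, feedback: list[str]) -> str:
    """Ask a reflection LLM to rewrite the prompt in light of the feedback."""
    # Placeholder: in practice this is an LLM call with a reflection template.
    return prompt + "\n# Revision note: " + feedback[0]

def optimize(seed_prompt: str, examples: list[dict], iterations: int = 10) -> dict:
    # Track the best candidate per example (a simple Pareto-style frontier)
    # rather than one global winner, so complementary prompts survive.
    frontier = {i: (seed_prompt, 0.0) for i in range(len(examples))}

    for _ in range(iterations):
        # Pick a parent from the frontier and collect textual feedback on it.
        parent = random.choice([prompt for prompt, _ in frontier.values()])
        feedback = [run_task(parent, ex)[1] for ex in examples]

        # Reflective mutation: rewrite the prompt using the feedback.
        child = reflect(parent, feedback)

        # Keep the child wherever it beats the current per-example best.
        for i, ex in enumerate(examples):
            score, _ = run_task(child, ex)
            if score > frontier[i][1]:
                frontier[i] = (child, score)

    return frontier

if __name__ == "__main__":
    examples = [{"question": f"task {i}"} for i in range(3)]
    result = optimize("You are a careful assistant.", examples)
    for i, (prompt, score) in result.items():
        print(f"example {i}: best score {score:.2f}")
```

The per-example frontier in this sketch echoes the "Genetic-Pareto" name: instead of collapsing search onto a single best prompt, candidates that excel on different examples are retained as parents for further reflective mutation.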
GEPA's reported performance compares favorably with existing optimization techniques. Across four tasks, it outperformed Group Relative Policy Optimization (GRPO) by 10% on average and by up to 20%, while requiring up to 35 times fewer rollouts. It also surpassed MIPROv2, a leading prompt optimizer, by over 10% across two LLMs.
Developed by a collaborative team of researchers from UC Berkeley, Stanford, and Databricks, GEPA marks a notable step for the field: its ability to achieve substantial performance gains with considerably fewer computational resources suggests a more efficient and scalable path for LLM development and application. The method also shows promising results as an inference-time search strategy for code optimization.