The field of Artificial Intelligence is witnessing a significant shift: AI evaluations, or "evals," are rapidly becoming both a crucial part of the development process and a burgeoning industry. As prominent tech observer Lenny Rachitsky quipped, "Evals so hot right now" — a line that underscores their escalating importance in the development and refinement of AI systems. This trend reflects a growing recognition that effective evaluation is paramount for the advancement and reliability of AI.
AI evaluations serve as structured methods to measure an AI system's performance on specific tasks, encompassing aspects like correctness, safety, and coherence. Rachitsky emphasizes that evals are "how you measure the quality and effectiveness of your AI system," acting as benchmarks that define what "good" looks like beyond simple latency checks. This contrasts with traditional software testing, as AI systems often exhibit non-deterministic behavior, requiring more qualitative and open-ended metrics.
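The distinction from traditional testing can be sketched in a few lines: rather than a deterministic pass/fail assertion, an eval scores a model across a set of graded cases, using a grader that tolerates variation in wording. The model stub, cases, and substring grader below are illustrative assumptions for the sketch, not any particular lab's harness.

```python
# Minimal sketch of an AI eval: graded cases run against a model,
# producing a score rather than a single pass/fail result.

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM API call (assumption for this sketch).
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Summarize: cats sleep a lot.": "Cats spend much of the day sleeping.",
    }
    return canned.get(prompt, "I'm not sure.")

def grade_contains(output: str, expected: str) -> bool:
    # A simple correctness grader: unlike an exact-match unit test,
    # it tolerates non-deterministic phrasing around the key fact.
    return expected.lower() in output.lower()

EVAL_CASES = [
    {"prompt": "What is the capital of France?", "expect": "Paris"},
    {"prompt": "Summarize: cats sleep a lot.", "expect": "sleep"},
]

def run_eval(model) -> float:
    passed = sum(
        grade_contains(model(case["prompt"]), case["expect"])
        for case in EVAL_CASES
    )
    return passed / len(EVAL_CASES)  # fraction of cases passed, in [0, 1]

if __name__ == "__main__":
    print(f"eval score: {run_eval(fake_model):.2f}")
```

Real eval suites replace the substring check with richer graders (rubrics, model-based judges, safety classifiers), but the structure — cases, grader, aggregate score — is the same.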
The rising demand for robust AI evaluations has created a new market, with companies like Mercor experiencing rapid growth. Mercor, co-founded by Brendan Foody, has reportedly grown from $1 million to $500 million in revenue in just 17 months by providing expert-written AI evaluations and training data. This rapid expansion illustrates the critical need for specialized services to help AI labs and companies refine their models.
This trend also signifies a change in the skill sets required for product builders in the AI space. Rachitsky has noted that "writing evals is going to become a core skill for product managers," indicating that understanding and implementing effective evaluation methodologies is no longer niche expertise but a fundamental requirement. The shift suggests that evals are evolving into "the new PRDs" (Product Requirements Documents), guiding the continuous improvement and strategic direction of AI products.