
Alex L. Zhang, a PhD student at MIT CSAIL, recently shared insights into his research trajectory, tracing his path from the AI benchmarks VideoGameBench and KernelBench to his current work on Recursive Language Models (RLMs). Zhang, who previously studied at Princeton, conveyed his enthusiasm for the opportunity, stating, "Happy to have done this! Definitely don’t have the same level of credentials as some of the other guests but hopefully some of the insight about how I went from stuff like VideoGameBench and KernelBench to RLMs is useful for newer researchers!!!"
Zhang's contributions include VideoGameBench, a 2025 benchmark designed to evaluate Vision-Language Models (VLMs) on their ability to complete popular video games. The benchmark challenges models to interact with games directly from raw visual inputs, stressing VLM capabilities in perception, spatial navigation, and memory. Its introduction was a notable step toward testing whether models can exercise the kind of intuitive, real-time skills that come naturally to human players.
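The interaction pattern behind such a benchmark is a perceive-and-act loop. The sketch below is a toy illustration, not VideoGameBench's actual harness: the "game" is a hypothetical one-row grid and the "VLM" a hard-coded policy, whereas a real harness would feed actual screenshots to a model and translate its replies into keyboard and mouse events.

```python
class GridGame:
    """A toy 1-D 'video game': move the '@' from the left edge to the goal 'G'."""

    def __init__(self, size=5):
        self.pos, self.goal = 0, size - 1

    def frame(self):
        # The raw visual observation the agent sees, e.g. "@...G".
        cells = ["."] * (self.goal + 1)
        cells[self.goal] = "G"
        cells[self.pos] = "@"
        return "".join(cells)

    def step(self, action):
        if action == "right":
            self.pos = min(self.pos + 1, self.goal)
        elif action == "left":
            self.pos = max(self.pos - 1, 0)
        return self.pos == self.goal  # True once the goal is reached

def stub_vlm(frame):
    # Hypothetical stand-in for a VLM: decide from the rendered frame alone.
    return "right" if frame.index("@") < frame.index("G") else "left"

def play(game, policy, max_steps=50):
    # The harness loop: render a frame, ask the model for an action, apply it.
    for step_count in range(1, max_steps + 1):
        if game.step(policy(game.frame())):
            return step_count  # number of actions needed to finish
    return None  # ran out of steps without finishing

steps = play(GridGame(), stub_vlm)
```

The same loop structure scales up: replace the grid with an emulator, the string frame with a screenshot, and the stub with a model call.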
Another key development from Zhang's research is KernelBench, an open-source framework released in 2025. The benchmark assesses Language Models' (LMs) proficiency in generating efficient and correct GPU kernels, which are critical for high-performance machine learning systems. It is a step toward automating the writing of specialized GPU programs, a task that traditionally requires deep expertise.
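Scoring a generated kernel comes down to two checks: does it match a trusted reference implementation, and is it faster? The harness below is a minimal sketch of that idea under simplifying assumptions, not KernelBench's actual code; the "kernels" are plain Python functions standing in for GPU code, and the function names are hypothetical.

```python
import time

def evaluate(reference, candidate, inputs, tol=1e-6, repeats=100):
    """Return (correct, speedup) for a candidate against a reference."""
    # Correctness: the candidate must match the reference on every test input.
    correct = all(abs(reference(x) - candidate(x)) <= tol for x in inputs)

    def timeit(fn):
        start = time.perf_counter()
        for _ in range(repeats):
            for x in inputs:
                fn(x)
        return time.perf_counter() - start

    # Efficiency: wall-clock speedup of the candidate over the reference.
    speedup = timeit(reference) / timeit(candidate)
    return correct, speedup

# Reference "kernel": naive sum of squares up to n.
def ref_kernel(n):
    return sum(i * i for i in range(n + 1))

# Candidate "kernel": the closed-form equivalent, n(n+1)(2n+1)/6.
def fast_kernel(n):
    return n * (n + 1) * (2 * n + 1) // 6

correct, speedup = evaluate(ref_kernel, fast_kernel, inputs=[10, 100, 1000])
```

A real harness would additionally compile the generated CUDA source, run it on device tensors, and average timings over warmed-up runs.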
His latest work focuses on Recursive Language Models (RLMs), a general strategy for how language models process long or complex context. Rather than ingesting an entire context in a single forward pass, an RLM lets the LM programmatically interact with the context and recursively call itself on sub-queries, then combine the partial answers. This progression from specific benchmarks to more general model architectures underscores his evolving research interests.
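The recursive pattern can be illustrated with a toy sketch; this is not the paper's implementation, and `stub_lm` is a hypothetical stand-in for a real LM call. If the context fits the model's window it answers directly; otherwise the context is split, the function recurses on each piece, and a final call combines the partial answers.

```python
def recursive_answer(query, context, lm_call, split, max_chars=100):
    """Answer `query` over `context`, recursing when the context is too long."""
    if len(context) <= max_chars:
        return lm_call(query + "\n---\n" + context)  # fits: answer directly
    partials = [recursive_answer(query, part, lm_call, split, max_chars)
                for part in split(context)]
    # A final call aggregates the sub-answers (a map-reduce over the context).
    return lm_call(query + "\n---\n" + "\n".join(partials))

def split_halves(context):
    # Split the context into two halves along line boundaries.
    lines = context.splitlines()
    mid = len(lines) // 2
    return ["\n".join(lines[:mid]), "\n".join(lines[mid:])]

def stub_lm(prompt):
    """Toy 'model': counts 'cat' in the context, or sums numeric sub-answers."""
    body = prompt.split("\n---\n", 1)[1]
    lines = body.splitlines()
    if lines and all(l.strip().isdigit() for l in lines):
        return str(sum(int(l) for l in lines))  # aggregation call
    return str(body.count("cat"))               # leaf call

context = "\n".join(["the cat sat on the mat"] * 12)  # too long for one call
answer = recursive_answer("How many times does 'cat' appear?",
                          context, stub_lm, split_halves)
```

With a real LM in place of the stub, the same shape lets the model decompose a question over a context far larger than its window.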
In his presentation, Zhang framed this journey as practical guidance for newer AI researchers. His work spans evaluating language model capabilities, systems and GPU programming for machine learning, and AI for code generation, positioning him among the next generation of AI researchers to watch.