
University of California, Berkeley Professor Ben Recht has published a new blog post titled "Digitally Twinning" on his "argmin.net" platform, offering a fresh perspective on how generative modeling justifies the principle of maximum likelihood. Recht, known for his critical analysis of machine learning foundations, asserts that generative models provide the "strongest argument" for maximum likelihood, albeit through a "brain-dead motivation." The post is a live blog of a lecture from his 2025 graduate machine learning class.
Generative models are defined as probabilistic simulators of data, designed to mimic observed behaviors. Recht explains that if the sole objective is to maximize the probability of the collected data under a model, the resulting optimization problem is precisely maximum likelihood. In other words, the approach seeks the parameters under which the observed data is most probable as a sample from the model.
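A minimal sketch of this reading of maximum likelihood, assuming a one-dimensional Gaussian as the model and synthetic data standing in for observations (neither the model choice nor the data comes from the post):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # stand-in for observed data

# Maximum likelihood: pick the parameters that make the collected data
# most probable under the model. For a Gaussian, the closed-form answer
# is the sample mean and the (biased) sample standard deviation.
mu_hat = data.mean()
sigma_hat = data.std()

# These parameters maximize the summed log-likelihood of the data.
log_likelihood = norm.logpdf(data, loc=mu_hat, scale=sigma_hat).sum()

# The fitted model is now a probabilistic simulator: sampling from it
# produces data intended to mimic what was observed.
simulated = rng.normal(loc=mu_hat, scale=sigma_hat, size=1000)
```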
Historically, Ronald Fisher, who introduced maximum likelihood, envisioned it as a method for summarizing data, not for generating simulations. Recht notes that statisticians spent a century seeking justifications for Fisher's idea, often linking it to concepts like KL divergence or entropy. However, he suggests that these justifications were driven by a desire to use models for analysis and inference about the world, leading to a "murky mess" in validating probabilistic models.
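For background, the standard form of the KL-divergence link (stated here as context, not quoted from the post): maximizing the average log-likelihood over the data is the same optimization as minimizing the KL divergence from the empirical distribution $\hat{p}$ to the model $p_\theta$,

$$\arg\max_\theta \frac{1}{n}\sum_{i=1}^{n} \log p_\theta(x_i) \;=\; \arg\min_\theta \mathrm{KL}\!\left(\hat{p} \,\middle\|\, p_\theta\right),$$

since $\mathrm{KL}(\hat{p}\,\|\,p_\theta) = -H(\hat{p}) - \tfrac{1}{n}\sum_i \log p_\theta(x_i)$ and the entropy term $H(\hat{p})$ does not depend on $\theta$.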
In contrast, Recht argues that generative modeling simplifies the rationale for maximum likelihood. "You just want to mimic data convincingly," he states, highlighting that if the goal is simulation, then maximizing the probability of observing the data becomes a straightforward objective. He maintains his skepticism about statistical models used merely as summaries, particularly for complex phenomena like human behavior, but acknowledges the practical utility of probabilistic simulations that "pass the interocular trauma test."