Leading AI researcher François Fleuret has voiced strong criticism regarding the fundamental design of auto-regressive models, commonly used in large language models. In a recent social media post, Fleuret stated, "And BTW I do think auto-regressive models are bad models, as they fit data in a very one-size-fits-all way that makes the economy of meaningful conditioning / factorization of latents and IMO misses the 'real' structures of distributions." This highlights a significant debate within the artificial intelligence community concerning the limitations of current generative AI architectures.
Auto-regressive models, which predict the next element in a sequence based on the preceding ones, have been foundational to the success of models like GPT. However, their strictly sequential processing and fixed left-to-right generation order can make it difficult to capture complex, non-linear dependencies and the underlying "real" structures within data. Critics argue that this "one-size-fits-all" approach leads to inefficiencies and a superficial model of data distributions.
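The left-to-right factorization these models rely on, p(x) = ∏ₜ p(xₜ | x₍ₜ₎), can be sketched in a few lines. The `next_token_probs` function below is a toy placeholder standing in for a trained model, not part of any real system:

```python
import random

def next_token_probs(prefix, vocab_size=4):
    """Toy stand-in for a trained model: returns a probability
    distribution over the next token given the tokens so far."""
    # Deterministic pseudo-probabilities derived from the prefix length.
    raw = [(i + len(prefix)) % vocab_size + 1 for i in range(vocab_size)]
    total = sum(raw)
    return [r / total for r in raw]

def generate(length, vocab_size=4, seed=0):
    """Sample a sequence one token at a time, each token conditioned
    on all preceding ones: p(x) = prod_t p(x_t | x_<t)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(length):
        probs = next_token_probs(seq, vocab_size)
        seq.append(rng.choices(range(vocab_size), weights=probs)[0])
    return seq

print(generate(8))
```

The fixed loop order is the point of contention: every token is generated in one predetermined sequence, which is exactly the constraint σ-GPT relaxes.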
In response to these challenges, Fleuret, along with co-authors, has recently introduced a novel approach in an arXiv paper titled "𝜎-GPTs: A New Approach to Autoregressive Models." This research proposes breaking away from the traditional fixed, left-to-right generation order by employing "shuffled autoregression" and "double positional encodings." The new methodology allows for on-the-fly modulation of the generation order, enabling models to condition on arbitrary subsets of tokens.
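One way to read "double positional encodings" (a sketch of the idea; the helper names and shapes here are illustrative assumptions, not the authors' code) is that when tokens are processed in a shuffled order, each input step carries two position signals: the original position of the token itself, and the original position of the token the model must predict next:

```python
import numpy as np

def sinusoidal_encoding(pos, dim):
    """Standard sinusoidal positional encoding for a single position."""
    enc = np.zeros(dim)
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc[i] = np.sin(pos * freq)
        if i + 1 < dim:
            enc[i + 1] = np.cos(pos * freq)
    return enc

def double_encoded_inputs(token_embs, order, dim):
    """Shuffled-autoregression sketch: tokens are consumed in the order
    given by `order` (a permutation of positions). Each step's input adds
    TWO positional encodings to the token embedding: the token's own
    original position, and the position of the token to predict next."""
    steps = []
    for step, pos in enumerate(order[:-1]):
        target_pos = order[step + 1]  # position the model must predict
        x = (token_embs[pos]
             + sinusoidal_encoding(pos, dim)          # where this token sits
             + sinusoidal_encoding(target_pos, dim))  # where to predict next
        steps.append(x)
    return np.stack(steps)

# Toy usage: 5 tokens, dim-8 embeddings, one arbitrary generation order.
rng = np.random.default_rng(0)
embs = rng.normal(size=(5, 8))
order = [2, 0, 4, 1, 3]
inputs = double_encoded_inputs(embs, order, dim=8)
print(inputs.shape)  # (4, 8): one input per prediction step
```

Because the target position is supplied explicitly at every step, the permutation can be chosen differently for every sequence, which is what makes on-the-fly modulation of the generation order possible.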
The 𝜎-GPTs framework aims to facilitate more "meaningful conditioning" and "factorization of latents," directly addressing Fleuret's concerns about current models missing true data structures. This flexibility allows for advanced capabilities such as conditional density estimation, infilling, and token-based rejection sampling for burst generation. Such innovations could significantly improve efficiency by reducing the number of steps required for sequence generation.
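The infilling capability follows directly from the arbitrary generation order. A minimal sketch of the idea (not the authors' implementation; `sample_fn` is a hypothetical stand-in for a trained order-agnostic model): put the known positions first in the permutation so every missing token is generated conditioned on them:

```python
import random

def order_for_infilling(length, known_positions):
    """Build a generation order in which known positions come first
    (as context) and unknown positions follow, so each missing token
    is conditioned on everything fixed or generated before it."""
    known = sorted(known_positions)
    unknown = [p for p in range(length) if p not in known_positions]
    return known + unknown

def infill(sequence, known_positions, sample_fn, seed=0):
    """Fill the None entries of `sequence` in an order that conditions
    on the known tokens first. `sample_fn(context, pos, rng)` is a
    placeholder for a trained model's conditional sampler."""
    rng = random.Random(seed)
    seq = list(sequence)
    for pos in order_for_infilling(len(seq), set(known_positions)):
        if seq[pos] is None:
            context = {p: seq[p] for p in range(len(seq)) if seq[p] is not None}
            seq[pos] = sample_fn(context, pos, rng)
    return seq

# Toy usage: fill the holes in [7, None, 3, None] with a dummy sampler.
dummy = lambda ctx, pos, rng: rng.randrange(10)
print(infill([7, None, 3, None], known_positions={0, 2}, sample_fn=dummy))
```

A fixed left-to-right model cannot reorder its conditioning this way, which is why infilling and conditional density estimation on arbitrary token subsets are awkward for standard GPT-style architectures.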
This development underscores the ongoing effort to evolve AI architectures beyond the current limitations of Transformer-based autoregressive models, which often face high computational costs and difficulties with long-range dependencies. By enabling models to adapt their generation order and better understand underlying data structures, Fleuret's work contributes to the pursuit of more robust, efficient, and structurally aware artificial intelligence systems.