Zurich, Switzerland – Large Language Model (LLM) coding agents show a marked preference for regular expressions (regex) when generating code, a habit that is convenient for pattern matching but can make the resulting code run dramatically slower. The observation, made by software engineer Marcelo Calbucci in a recent social media post, points to a broader inefficiency in how these advanced AI tools approach certain programming tasks.
"LLM coding agents love regular expressions. It doesn't matter if it'll make your code 1000X slower," stated Marcelo Calbucci on social media.
This preference for regex, despite its performance drawbacks, stems from the models' training on vast datasets in which regex is a common idiom for string manipulation and validation, and LLMs are adept at recognizing and reproducing such patterns. But the convenience has a cost: every regex match runs through a general-purpose matching engine, and in backtracking engines such as Python's, some patterns degrade catastrophically on certain inputs. For jobs that a plain string operation handles directly, that overhead can produce the "1000X slower" outcome Calbucci describes.
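To see the trade-off in miniature, consider the following illustrative Python sketch. The log data and the prefix check are invented for this example, not drawn from Calbucci's post; it simply times a regex match against the built-in string method that does the same job.

```python
import re
import timeit

# Illustrative data: 100,000 identical log lines (invented for this example).
lines = ["INFO: request handled"] * 100_000

# The regex version an LLM agent might reach for first.
pattern = re.compile(r"^INFO:")

def with_regex():
    # Count lines matching the compiled pattern.
    return sum(1 for line in lines if pattern.match(line))

def with_str_method():
    # The same check using the built-in prefix test.
    return sum(1 for line in lines if line.startswith("INFO:"))

# On CPython, the str.startswith version typically runs several times
# faster; the exact ratio varies by machine and workload.
print("regex:", timeit.timeit(with_regex, number=10))
print("str  :", timeit.timeit(with_str_method, number=10))
```

The gap here is a modest constant factor, not a thousandfold slowdown, but it compounds quickly in hot loops and grows much worse with pathological patterns.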
Recent research underscores this performance disparity, though along a different axis. A preprint posted to medRxiv in June 2025, comparing regex- and LLM-based approaches for extracting structured data from medical reports, found that both achieved similar accuracy but the regex method ran up to 28,120 times faster over the full dataset. That comparison pits regex against LLM inference itself rather than against simpler string code, and its lesson is complementary: for highly structured, standardized data extraction, hand-written regex remains far more efficient than calling a model.
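For a sense of what rule-based extraction of this kind looks like, here is a hypothetical Python sketch; the report text, field names, and pattern are invented for illustration and are not taken from the medRxiv study.

```python
import re

# Invented example of a structured medical-report snippet.
report = "Hemoglobin: 13.5 g/dL. Creatinine: 1.1 mg/dL."

# Hypothetical pattern: analyte name, numeric value, unit.
value_re = re.compile(
    r"(?P<analyte>[A-Za-z]+):\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[\w/]+)"
)

for m in value_re.finditer(report):
    # Each match yields one structured record.
    print(m.group("analyte"), m.group("value"), m.group("unit"))
# Hemoglobin 13.5 g/dL
# Creatinine 1.1 mg/dL
```

On uniform, well-formatted input like this, a single compiled pattern processes thousands of reports in the time a model takes to answer one prompt.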
Experts suggest that LLMs favor regex because it expresses complex pattern-matching logic concisely, playing to the models' strength at generating compact, semantically dense snippets. That ease of generation, however, does not guarantee good runtime behavior. The challenge lies in guiding LLMs to weigh performance implications alongside functional correctness.
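The gap between compactness and cost can be extreme. The pattern below is a classic, deliberately pathological example, not one attributed to any particular agent: it is one line long, yet on a near-matching input Python's backtracking engine takes exponential time to reject it.

```python
import re

# Nested quantifiers: compact to write, dangerous to run.
pattern = re.compile(r"^(a+)+$")

# A matching input returns almost instantly.
assert pattern.match("a" * 30)

# A near-miss ("aaa...ab") forces the engine to explore an exponential
# number of ways to split the run of 'a's before it can fail. Uncomment
# to observe the stall; even this short string can take many seconds.
# pattern.match("a" * 30 + "b")
```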
The ongoing development of LLM coding agents aims to address such trade-offs. Future advancements may involve integrating more sophisticated performance evaluation mechanisms into the LLM's code generation process or training models with a stronger emphasis on computational efficiency. This would enable LLMs to generate code that is not only functionally correct but also optimized for speed, reducing reliance on potentially slow regex for performance-critical applications.
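What such a performance evaluation mechanism might look like in its simplest form is easy to sketch. The harness below is hypothetical, not a feature of any existing agent: it times functionally equivalent candidate implementations of the same check and reports the faster one.

```python
import re
import timeit

# Hypothetical performance gate: given functionally equivalent candidates,
# keep the one that benchmarks fastest. Names and data are invented.
data = ["ERROR 42"] * 10_000
rx = re.compile(r"^ERROR \d+$")

candidates = {
    "regex": lambda: [bool(rx.match(s)) for s in data],
    "plain": lambda: [s.startswith("ERROR ") and s[6:].isdigit() for s in data],
}

timings = {name: timeit.timeit(fn, number=20) for name, fn in candidates.items()}
winner = min(timings, key=timings.get)
print("fastest:", winner, timings)
```

Even a crude gate of this kind, run before generated code is accepted, would surface the regex-versus-string-method trade-off that currently goes unexamined.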