Henry Ndubuaku announced a notable step for on-device artificial intelligence: LiquidAI's LFM2 350m-i8 model has been ported to Cactus (YC S25) and reaches 188 tokens per second on an Apple M4 running CPU-only. That figure edges out Gemma3 270m-i8, which runs at 170 tokens per second under the same conditions, underscoring LFM2's efficiency for local inference.
The tweet also covered mobile hardware: "On an old iPhone 13 Pro, it should near 100 tokens/sec, no NPU or GPU!" In other words, LFM2 is built for efficient inference on edge devices using the CPU alone, with no dedicated AI accelerator or graphics processor required. LiquidAI positions LFM2 as a recommended model for phones, alongside Qwen, Gemma, and Smol.
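For readers who want to sanity-check numbers like these, decode throughput is simply tokens generated divided by wall-clock time. The harness below is a generic sketch, not Cactus's API; `generate` and `dummy_generate` are hypothetical stand-ins for whatever inference call a runtime actually exposes.

```python
import time

def tokens_per_second(generate, prompt: str, max_tokens: int = 128) -> float:
    """Time one generation call and return decode throughput.

    `generate` is a hypothetical stand-in for a runtime's inference
    call (Cactus's real API may look different); it must return the
    number of tokens it actually produced.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Dummy backend so the harness runs stand-alone:
def dummy_generate(prompt: str, max_tokens: int) -> int:
    time.sleep(0.68)   # pretend decode time (~128 tokens at ~188 tok/s)
    return max_tokens

print(f"{tokens_per_second(dummy_generate, 'Hello'):.0f} tokens/sec")
```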
LiquidAI has consistently focused on Liquid Foundation Models (LFMs) built for efficient on-device deployment, aiming for state-of-the-art quality with low latency and a small memory footprint. The LFM2 series targets generative AI on edge devices such as smartphones, laptops, and vehicles, optimizing for millisecond latency, on-device resilience, and data privacy. LFM2 models use a hybrid architecture that combines multiplicative gates with short convolutions, which keeps per-token compute cheap and local.
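To make those two ingredients concrete, here is a minimal PyTorch sketch of a block pairing a depthwise short convolution with a multiplicative (sigmoid) gate. The layer names, kernel size, and wiring are illustrative assumptions, not LFM2's actual implementation.

```python
import torch
import torch.nn as nn

class GatedShortConv(nn.Module):
    """Illustrative gated short-convolution block.

    A sketch of the two ingredients the article names (multiplicative
    gates, short convolutions); layer names, kernel size, and wiring
    are assumptions, not LFM2's actual architecture.
    """

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)  # value and gate branches
        # Depthwise causal conv with a short kernel: local receptive
        # field, low FLOPs, and no quadratic attention cost.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        # Convolve along the sequence axis, trimming to keep causality.
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(g))  # multiplicative gate

block = GatedShortConv(dim=64)
print(block(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])
```

Every operation above is a small matrix multiply or a length-3 depthwise convolution, which maps well onto ordinary CPU vector units and is consistent with the CPU-only throughput quoted earlier.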
The compact size of these models is also a key factor for mobile integration. Ndubuaku noted that "Gemma weights compress well; 270m at i8 in Cactus format is 170mb," and projected that "LFM2 350m should compress below 250mb, so you can ship them in your apps without bloat!" This small memory footprint is crucial for embedding advanced AI functionalities directly into applications without significantly increasing their size or resource demands.
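The arithmetic behind those footprints is straightforward: int8 quantization stores roughly one byte per parameter, so the raw parameter count sets an upper bound, and the quoted Cactus-format sizes sit below it. A back-of-envelope estimate:

```python
def weight_mb(n_params_millions: float, bytes_per_param: float = 1.0) -> float:
    """Naive weight-size estimate: parameters x bytes per parameter.

    int8 is ~1 byte/param, fp16 ~2, fp32 ~4. Real files add metadata
    and, per the tweet, the Cactus format compresses well below this.
    """
    return n_params_millions * bytes_per_param  # millions of bytes = MB

print(f"Gemma3 270M, i8 upper bound: ~{weight_mb(270):.0f} MB (ships at ~170 MB)")
print(f"LFM2 350M, i8 upper bound: ~{weight_mb(350):.0f} MB (projected < 250 MB)")
```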
The collaboration between Google and Qualcomm has likewise been accelerating adoption of Gemma models for on-device execution across Snapdragon platforms, broadening access to generative AI on edge hardware. While the tweet highlights LFM2's strong showing on Apple's M4 and an older iPhone, the broader industry trend points toward optimizing AI models for local processing across hardware ecosystems, reducing reliance on cloud infrastructure.