Inception Labs' New Diffusion LLM API Delivers Over 1,100 Tokens/Sec for Code Editing


Inception Labs, a company developing a new generation of large language models, has announced the release of its first-ever API for "next-edit" suggestions, built on its diffusion LLM technology. Co-founder Aditya Grover shared the news in a tweet on August 20, 2025, highlighting the API's ability to provide "instant editing powered by diffusion LLMs." Code editing platform Continue (@continuedev) is the launch partner and is integrating the API to power its own next-edit feature.

"Instant editing powered by diffusion LLMs. At @InceptionAILabs, we are excited to release the first-ever API for next-edit suggestions. Developers and builders, go check it out! Thank you @continuedev for being our launch partner and using our API to power your next-edit feature," Grover stated in the announcement.

At the core of this advancement is Inception Labs' Mercury Coder, a diffusion-based large language model (dLLM) that operates fundamentally differently from traditional autoregressive LLMs. Unlike models that generate text token by token, diffusion LLMs refine entire passages iteratively, starting from a noisy version and progressively denoising it. This approach enables Mercury Coder to achieve remarkable speeds, with the Mercury Coder Mini model reportedly delivering up to 1,109 tokens per second on NVIDIA H100 GPUs, significantly outpacing many speed-optimized frontier models.
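To make the contrast concrete, here is a toy Python sketch of the two decoding styles. It is not Mercury Coder's algorithm, which the announcement does not describe in detail. It assumes a masked-token form of "noise" and copies tokens from a known target in place of real model predictions. The function names and the `num_steps` parameter are hypothetical. The only point is to show why filling many positions per pass needs fewer sequential steps than emitting one token at a time.

```python
import math
import random

MASK = "<mask>"  # stand-in for a "noisy" position (an assumption of this sketch)


def autoregressive_decode(target):
    """Emit one token per sequential step, left to right."""
    out = []
    steps = 0
    for tok in target:
        out.append(tok)  # stands in for one forward pass of a model
        steps += 1
    return out, steps


def diffusion_style_decode(target, num_steps, seed=0):
    """Start fully masked and fill in a batch of positions on each step."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    masked = list(range(len(target)))
    rng.shuffle(masked)  # positions are resolved in no particular order
    per_step = math.ceil(len(target) / num_steps)
    steps = 0
    while masked:
        batch, masked = masked[:per_step], masked[per_step:]
        for i in batch:
            seq[i] = target[i]  # stands in for the model's prediction at i
        steps += 1
    return seq, steps


if __name__ == "__main__":
    target = "def add(a, b): return a + b".split()
    ar_out, ar_steps = autoregressive_decode(target)
    df_out, df_steps = diffusion_style_decode(target, num_steps=3)
    print("autoregressive steps:", ar_steps)
    print("diffusion-style steps:", df_steps)
```

For the seven whitespace-separated tokens in this example, the autoregressive loop takes seven sequential steps while the diffusion-style loop takes three. A real diffusion LLM would also be able to revise positions it had already filled, which this sketch omits. As a rough sense of scale, at the reported peak of 1,109 tokens per second, a hypothetical 200-token edit would take about 0.18 seconds of generation time, ignoring network and queuing overhead.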

The company asserts that this speed does not compromise quality. On Copilot Arena, a benchmark for code completion, Mercury Coder ranks as the fastest model and second in overall quality, demonstrating its effectiveness in real-world coding environments. The new API aims to provide developers and builders with real-time, multi-line suggestions, a step toward more responsive and efficient AI-powered coding assistance. The move points to a potential paradigm shift in AI, promising faster and more cost-effective solutions for enterprise applications.

Inception Labs, founded by researchers from Stanford, UCLA, and Cornell, positions diffusion models as a unified framework for generative AI, capable of advanced reasoning and built-in error correction. This release extends the application of diffusion models, previously prominent in image and video generation, to text and code. The company's vision is to make high-quality AI solutions more accessible by addressing the latency and cost challenges associated with current LLM inference, particularly for complex tasks like code generation.