Qwen's Flagship AI Models Achieve 1 Million Token Context, Boosting Inference Speed by Up to 3x

Qwen, a leading large language model developer, has announced a significant advancement in its Qwen3 series, enabling ultra-long context support of up to 1 million tokens for its Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 models. This breakthrough positions the models to process vast amounts of information, equivalent to entire books or extensive datasets, in a single interaction. As Qwen announced in a recent social media post:

> Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens!

The enhanced capabilities are powered by two core innovations: Dual Chunk Attention (DCA) and MInference. Dual Chunk Attention is a length extrapolation method that splits long sequences into manageable chunks while preserving global coherence, which is crucial for maintaining understanding across extended texts. MInference, a sparse attention mechanism, further optimizes performance by reducing computational overhead and focusing on key token interactions.
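To make the chunking idea concrete, here is a minimal NumPy sketch of the position-remapping intuition behind a dual-chunk scheme: distances inside a chunk are kept exact, while distances to tokens in earlier chunks are capped so they never exceed the range the model saw during training. The function name, parameters, and the simple capping rule are illustrative assumptions, not Qwen's actual DCA implementation.

```python
import numpy as np

def dca_relative_positions(seq_len: int, chunk_size: int, max_distance: int) -> np.ndarray:
    """Illustrative sketch (not Qwen's implementation): build the matrix of
    relative positions a causal attention layer would use under a
    dual-chunk scheme.

    - Intra-chunk pairs keep their true relative distance.
    - Inter-chunk pairs have their distance capped at `max_distance`,
      so the model never sees a position offset beyond its trained range.
    """
    rel = np.zeros((seq_len, seq_len), dtype=int)
    for i in range(seq_len):
        for j in range(i + 1):  # causal mask: query i attends only to keys j <= i
            if i // chunk_size == j // chunk_size:
                rel[i, j] = i - j                      # intra-chunk: exact distance
            else:
                rel[i, j] = min(i - j, max_distance)   # inter-chunk: capped distance
    return rel

# Example: an 8-token sequence split into chunks of 4, with distances
# capped at 5 for cross-chunk attention.
rel = dca_relative_positions(seq_len=8, chunk_size=4, max_distance=5)
```

The key property is that no entry in the matrix exceeds `max_distance`, regardless of how long the sequence grows, which is the essence of length extrapolation by chunking.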

These technological advancements are reported to significantly boost both generation quality and inference speed. Qwen stated that the models deliver "up to 3× faster performance on near-1M token sequences," a critical improvement for real-world applications requiring rapid processing of large inputs. The extended context and speed enhancements are expected to facilitate more complex reasoning, summarization, and data analysis tasks.

The new models are also designed for seamless integration into existing AI development workflows; Qwen describes them as "fully compatible with vLLM and SGLang for efficient deployment." This compatibility ensures that developers can readily leverage these advanced capabilities within popular inference frameworks, accelerating the adoption of ultra-long context models. The Qwen3-235B-A22B model, in particular, stands out as a flagship offering within the Qwen series, recognized for its competitive performance in various benchmarks.
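Since vLLM ships an OpenAI-compatible server, deploying such a model might look like the sketch below. The model identifier and flag values are assumptions based on the announcement; consult Qwen's model card for the exact settings, and note that enabling the full 1M-token window (and DCA/MInference) may require additional configuration.

```shell
# Illustrative deployment via vLLM's OpenAI-compatible server.
# Model name, context length, and GPU count are assumptions, not
# confirmed settings from Qwen's documentation.
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 \
  --max-model-len 1000000 \
  --tensor-parallel-size 4
```

The `--max-model-len` flag bounds the context window the server will accept, and `--tensor-parallel-size` shards the model across GPUs, which is typically necessary to fit the key-value cache for near-1M-token sequences.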

This development builds upon Qwen's previous efforts in extending context windows, including the earlier Qwen2.5-1M series, which also offered 1 million token support. The continued push for longer context lengths underscores an industry trend towards more capable and versatile large language models. This innovation is poised to open new avenues for applications in fields like legal research, scientific discovery, and enterprise knowledge management, where processing extensive documents is paramount.