Google's Ironwood TPU Achieves 10X Peak Performance, Ushering in New Era of AI Inference

Google has announced the general availability (GA) of its 7th generation Tensor Processing Unit (TPU), Ironwood, marking a significant leap in AI acceleration technology. The new TPU delivers a 10X peak performance improvement over TPU v5p, and more than 4X better performance per chip for both training and inference workloads compared to its immediate predecessor, TPU v6e (Trillium). This positions Ironwood as Google's most powerful and energy-efficient custom silicon to date, designed to power "thinking, inferential AI models at scale."

Sundar Pichai, CEO of Google and Alphabet, stated in a tweet:

> Our 7th gen TPU Ironwood is coming to GA! It’s our most powerful TPU yet: 10X peak performance improvement vs. TPU v5p, and more than 4X better performance per chip for both training + inference workloads vs. TPU v6e (Trillium). We use TPUs to train + serve our own frontier models, including Gemini, and we’re excited to make the latest generation available to @googlecloud customers.

This underscores Google's commitment to leveraging its own hardware for frontier AI models like Gemini.

Ironwood is purpose-built for the "age of inference," a shift towards AI models that proactively generate insights and interpretations rather than just providing real-time information. It scales up to 9,216 liquid-cooled chips within a superpod, offering a staggering 42.5 Exaflops of compute power, which is more than 24 times the compute power of the world's largest supercomputer, El Capitan. This massive parallel processing capability is crucial for handling complex Large Language Models (LLMs) and Mixture of Experts (MoEs).
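
As a rough sanity check on these headline numbers (assuming the 42.5 exaflops figure is the aggregate low-precision peak of a full 9,216-chip superpod, as Google's announcement implies), the per-chip peak works out to roughly 4.6 petaflops:

```python
# Back-of-the-envelope check on the Ironwood pod figures quoted above.
# Assumption: 42.5 exaflops is the aggregate peak across one full
# 9,216-chip superpod (Google quotes peak figures in low precision).

POD_EXAFLOPS = 42.5
CHIPS_PER_POD = 9_216

# 1 exaflop = 1,000 petaflops
per_chip_petaflops = POD_EXAFLOPS * 1_000 / CHIPS_PER_POD
print(f"Implied peak per chip: ~{per_chip_petaflops:.2f} PFLOPS")  # ~4.61 PFLOPS
```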

The new TPU features an enhanced SparseCore, a High Bandwidth Memory (HBM) capacity of 192 GB per chip (6x that of Trillium), and dramatically improved HBM bandwidth of 7.37 TB/s. Its Inter-Chip Interconnect (ICI) bandwidth has also been increased to 1.2 TBps bidirectional, facilitating efficient distributed training and inference at scale. These innovations aim to minimize data movement and latency, both crucial for demanding AI workloads.
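
These memory figures are what matter most for the "age of inference": LLM decoding is typically memory-bound, since generating each token streams the model weights out of HBM. A roofline-style sketch, using a hypothetical model that fills the 192 GB of HBM and the ~4.6 PFLOPS per-chip peak derived above, illustrates the bound:

```python
# Rough roofline-style estimate of why HBM bandwidth dominates LLM decoding.
# Hypothetical workload: a model whose weights occupy the full 192 GB of
# per-chip HBM; the ~4.6 PFLOPS per-chip peak is derived earlier and is an
# assumption, not an official single-chip spec quoted in this article.

HBM_CAPACITY_GB = 192
HBM_BANDWIDTH_TBS = 7.37
PEAK_PFLOPS = 4.6

# Decoding one token touches every weight once, so the latency floor is
# the time to stream all resident weights out of HBM.
stream_time_ms = HBM_CAPACITY_GB / (HBM_BANDWIDTH_TBS * 1_000) * 1_000
print(f"Min time to read 192 GB of weights: ~{stream_time_ms:.1f} ms "
      f"(~{1_000 / stream_time_ms:.0f} tokens/s upper bound per chip)")

# Arithmetic intensity needed before compute, not bandwidth, is the limit.
flops_per_byte = PEAK_PFLOPS * 1e15 / (HBM_BANDWIDTH_TBS * 1e12)
print(f"Compute/bandwidth break-even: ~{flops_per_byte:.0f} FLOPs per byte")
```

Real deployments shard models across many chips over the ICI links, so these single-chip numbers are only an intuition for where the bottleneck sits; they show why Google raised HBM capacity and bandwidth in step with raw compute.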

Leading AI companies are already adopting Ironwood. Anthropic, the developer of the Claude model family, plans to utilize up to one million TPUs, citing significant cost-to-performance gains. James Bradbury, Head of Compute at Anthropic, noted that Ironwood's improvements in inference performance and training scalability will help them "scale efficiently while maintaining the speed and reliability our customers expect." Lightricks is also deploying Ironwood to train and serve its LTX-2 multimodal system.

Alongside Ironwood, Google Cloud introduced new Arm-based Axion CPUs, designed to complement the TPUs by handling general-purpose computing tasks and data preparation for AI workloads. This integrated "AI Hypercomputer" architecture aims to optimize hardware and software together for the most demanding AI tasks, offering customers a comprehensive solution for their evolving AI needs.