
Amazon Web Services (AWS) has publicly launched Trainium3, its custom artificial intelligence (AI) chip, which the company states is "four times as fast as its previous generation of artificial-intelligence chips," according to The Wall Street Journal. Unveiled at re:Invent 2025, Trainium3 is built on a 3-nanometer process and is now generally available in Amazon EC2 Trn3 UltraServers, aiming to give customers faster and more cost-effective AI model training and inference.
The new Trn3 UltraServers deliver significant performance gains over the Trainium2 generation: up to 4.4 times more compute performance, four times greater energy efficiency, and nearly four times more memory bandwidth. Each Trainium3 chip provides 2.52 petaflops (PFLOPS) of FP8 compute and features 144 GB of HBM3e memory with 4.9 TB/s of bandwidth, optimized for advanced data types and real-time AI tasks. This efficiency is crucial for managing the escalating energy demands of large-scale AI operations.
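As a rough back-of-envelope check, the per-chip figures above can be aggregated for a fully populated 144-chip UltraServer (the maximum chip count AWS cites for Trn3 UltraServers). The totals below are derived arithmetic for illustration, not AWS-published specifications:

```python
# Back-of-envelope aggregates for one fully populated Trn3 UltraServer,
# derived from the per-chip figures quoted in the article.
# Illustrative arithmetic only; not AWS-published system specs.

CHIPS_PER_ULTRASERVER = 144     # max Trainium3 chips per Trn3 UltraServer
FP8_PFLOPS_PER_CHIP = 2.52      # petaflops of FP8 compute per chip
HBM_GB_PER_CHIP = 144           # GB of HBM3e memory per chip
HBM_BW_TBPS_PER_CHIP = 4.9      # TB/s of memory bandwidth per chip

total_fp8_pflops = CHIPS_PER_ULTRASERVER * FP8_PFLOPS_PER_CHIP
total_hbm_tb = CHIPS_PER_ULTRASERVER * HBM_GB_PER_CHIP / 1024
total_bw_tbps = CHIPS_PER_ULTRASERVER * HBM_BW_TBPS_PER_CHIP

print(f"FP8 compute:      {total_fp8_pflops:.1f} PFLOPS")  # ~362.9 PFLOPS
print(f"HBM3e capacity:   {total_hbm_tb:.2f} TB")          # ~20.25 TB
print(f"Memory bandwidth: {total_bw_tbps:.1f} TB/s")       # ~705.6 TB/s
```

The headline numbers suggest why AWS positions these servers for trillion-parameter-scale workloads: a single chassis aggregates hundreds of petaflops and tens of terabytes of high-bandwidth memory.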
AWS designed the Trn3 UltraServers as vertically integrated systems, capable of housing up to 144 Trainium3 chips. These systems can be scaled further into EC2 UltraClusters 3.0, connecting thousands of UltraServers to support up to one million Trainium chips, a tenfold increase over the previous generation. This infrastructure, supported by the new NeuronSwitch-v1 and Neuron Fabric, is engineered to eliminate communication bottlenecks in distributed AI computing, enabling seamless data flow for complex AI workloads.
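Assuming the same 144 chips per UltraServer, the quoted one-million-chip ceiling implies the approximate number of UltraServers an UltraClusters 3.0 deployment would federate, and the "tenfold increase" implies the previous generation's ceiling. A quick sketch (derived arithmetic, not AWS figures):

```python
# Rough scale of an EC2 UltraClusters 3.0 deployment at the quoted
# one-million-chip ceiling, assuming 144 Trainium3 chips per UltraServer.
# Derived arithmetic for illustration; not AWS-published figures.

CHIPS_PER_ULTRASERVER = 144
MAX_CHIPS_PER_CLUSTER = 1_000_000

# Ceiling division: the last UltraServer may be partially populated.
ultraservers_needed = -(-MAX_CHIPS_PER_CLUSTER // CHIPS_PER_ULTRASERVER)

# A "tenfold increase" over the prior generation implies its ceiling.
prev_gen_ceiling = MAX_CHIPS_PER_CLUSTER // 10

print(f"UltraServers at the 1M-chip ceiling: ~{ultraservers_needed:,}")   # ~6,945
print(f"Implied previous-generation ceiling: ~{prev_gen_ceiling:,} chips")  # ~100,000
```

At that scale, cross-server interconnect dominates system design, which is why the article highlights NeuronSwitch-v1 and Neuron Fabric as the pieces meant to keep communication from becoming the bottleneck.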
Early adopters are already reporting substantial benefits from Trainium3. Companies such as Anthropic, Karakuri, Metagenomi, NetoAI, Ricoh, and Splash Music have seen training and inference costs reduced by up to 50%. Amazon Bedrock, AWS's managed service for foundation models, is actively running production workloads on Trainium3. Decart, an AI lab specializing in generative video, has achieved four times faster frame generation at half the cost of GPUs, demonstrating the chip's capability for demanding real-time applications.
The launch of Trainium3 underscores AWS's strategic commitment to in-house silicon development, intensifying its competition with established leaders like Nvidia. AWS also previewed Trainium4, its next-generation chip, which is expected to deliver at least six times the FP4 processing performance and four times the memory bandwidth of Trainium3. Notably, Trainium4 is designed to support Nvidia NVLink Fusion interconnects, allowing it to operate alongside Nvidia GPUs and AWS Graviton processors in MGX racks.