
Redis has introduced LangCache, a new fully-managed semantic caching service designed to significantly reduce Large Language Model (LLM) token costs and improve application latency for AI developers. The announcement was highlighted by Redis CEO Rowan Trollope, who stated via social media, "Reduce your LLM Token costs and latency using the new @Redisinc LangCache." This service aims to optimize the performance and cost-efficiency of generative AI applications.
LangCache functions by storing and reusing previous LLM responses for semantically similar queries, thereby minimizing repeated, costly API calls to LLMs. According to Redis, this semantic caching can cut LLM API costs by up to 90% and return cached responses up to 15 times faster than live inference. The service is exposed through a REST API, so it can be integrated into existing applications regardless of language or framework.
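The general pattern is check-then-store: look up the cache for a semantically similar prompt, and only call the LLM (and write the result back) on a miss. The sketch below illustrates that flow against a REST endpoint; the base URL, endpoint paths (`/entries/search`, `/entries`), field names, and auth header are placeholders, not the documented LangCache contract, so consult the official API reference for the exact shapes.

```python
import requests

# Illustrative values; the real host, cache ID, and API key come from your
# LangCache deployment and the endpoint paths may differ from the docs.
LANGCACHE_URL = "https://<your-langcache-host>"
CACHE_ID = "my-cache"
HEADERS = {"Authorization": "Bearer <api-key>", "Content-Type": "application/json"}


def cached_completion(prompt: str, call_llm) -> str:
    """Check the semantic cache first; fall back to the LLM and store the result."""
    # 1. Search the cache for an entry semantically similar to the prompt.
    search = requests.post(
        f"{LANGCACHE_URL}/v1/caches/{CACHE_ID}/entries/search",
        headers=HEADERS,
        json={"prompt": prompt},
    )
    hits = search.json().get("data", [])
    if hits:
        # Cache hit: reuse the stored LLM response, skipping the LLM call.
        return hits[0]["response"]

    # 2. Cache miss: call the LLM, then store the prompt/response pair for next time.
    response = call_llm(prompt)
    requests.post(
        f"{LANGCACHE_URL}/v1/caches/{CACHE_ID}/entries",
        headers=HEADERS,
        json={"prompt": prompt, "response": response},
    )
    return response
```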
The new offering is part of Redis's broader strategy to enhance its platform for AI development, announced during events like Redis Released 2025. Alongside LangCache, Redis also unveiled vector sets, a new native data type for efficient vector similarity search, and introduced various integrations with AI agent frameworks like LangGraph. These developments collectively aim to provide a comprehensive real-time data architecture for building high-performing GenAI apps and agents.
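To give a rough sense of what the vector set data type provides, the sketch below adds a few toy vectors and queries for the most similar elements using the VADD and VSIM commands from Redis's vector set documentation, issued through redis-py's generic `execute_command`. The key name, element IDs, and three-dimensional vectors are illustrative only, and exact option names should be checked against the current Redis documentation.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Add two 3-dimensional vectors to a vector set (toy dimensions; real
# embeddings typically have hundreds of dimensions).
r.execute_command("VADD", "docs", "VALUES", 3, 0.1, 0.2, 0.7, "doc:1")
r.execute_command("VADD", "docs", "VALUES", 3, 0.9, 0.1, 0.0, "doc:2")

# Query for the elements most similar to a given vector.
similar = r.execute_command("VSIM", "docs", "VALUES", 3, 0.1, 0.25, 0.65, "COUNT", 2)
print(similar)  # e.g. [b'doc:1', b'doc:2']
```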
Rowan Trollope emphasized that LangCache and vector sets simplify the complex data needs of agent-based AI applications. He noted that just as traditional apps require a cache for frequently accessed data, AI agents need fast access to data to make decisions efficiently. LangCache is available in public preview, letting developers start using it to manage LLM interactions more efficiently and economically.