Developer Highlights Massive Cached Token Reads in Claude Code, Raising Efficiency Questions

Jeffrey Emanuel, CEO of Pastel Network, recently drew attention to a significant efficiency challenge in Anthropic's Claude Code, an agentic coding tool. Emanuel reported an astonishing 8 billion cached token reads against roughly 10 million tokens of generated output, accompanied by "Crazy RAM use." The observation, shared on social media, underscores the substantial memory demands and potential resource inefficiencies developers encounter when using large language models (LLMs) for complex coding tasks.
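To put those figures in perspective, a back-of-the-envelope calculation is useful. The sketch below uses the reported token counts; the per-token prices are illustrative assumptions in the ballpark of typical published cache-read and output rates, not figures from Emanuel's report or any specific price sheet.

```python
# Back-of-the-envelope arithmetic for the reported figures.
# Prices are illustrative assumptions (USD per million tokens),
# not taken from the report or a specific Anthropic price sheet.
cached_reads = 8_000_000_000   # 8 billion cached tokens read
output_tokens = 10_000_000     # ~10 million tokens generated

PRICE_CACHE_READ = 0.30   # assumed $/M tokens for cache reads
PRICE_OUTPUT = 15.00      # assumed $/M tokens for output

ratio = cached_reads / output_tokens
cache_cost = cached_reads / 1e6 * PRICE_CACHE_READ
output_cost = output_tokens / 1e6 * PRICE_OUTPUT

print(f"Cached reads per output token: {ratio:,.0f}:1")  # 800:1
print(f"Cache-read cost:  ${cache_cost:,.2f}")           # $2,400.00
print(f"Output cost:      ${output_cost:,.2f}")          # $150.00
```

Even at heavily discounted cache-read rates, an 800:1 ratio means the cached reads, not the generated code, can dominate the bill.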

Claude Code, developed by Anthropic, is designed to boost developer productivity by assisting with coding, navigating codebases, and managing workflows directly from the terminal. The tool is built on Claude LLMs, which process text as "tokens"; larger context windows let the model take in extensive code and documentation at once. The reported disparity between output and cached reads, however, suggests the model is repeatedly re-reading large volumes of previously seen data. This pattern is characteristic of agentic loops: each turn resends the entire accumulated conversation, so a long session re-reads its full history, tool outputs included, on every API call, and cached input grows far faster than output.
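Anthropic's Messages API exposes these cache dynamics directly in its usage metadata. The following sketch, assuming the Anthropic Python SDK and a current model ID, shows how a cacheable system block is marked and how the resulting cache reads and writes can be inspected; verify field and model names against current documentation before relying on them.

```python
# Sketch: inspecting prompt-cache usage with the Anthropic Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

# A large, stable context (e.g. a codebase summary) is marked cacheable so
# later turns re-read it from cache instead of reprocessing it from scratch.
big_context = open("codebase_summary.txt").read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID for illustration
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": big_context,
            "cache_control": {"type": "ephemeral"},  # opt this block into caching
        }
    ],
    messages=[{"role": "user", "content": "Summarize the open TODOs."}],
)

u = response.usage
print("fresh input tokens:   ", u.input_tokens)
print("cache writes (tokens):", u.cache_creation_input_tokens)
print("cache reads (tokens): ", u.cache_read_input_tokens)
print("output tokens:        ", u.output_tokens)
```

Run across many turns of an agentic session, the `cache_read_input_tokens` counter is precisely the number that ballooned to 8 billion in Emanuel's case.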

High cached token usage can significantly affect operational costs and system performance for LLM users. Every token processed, whether fresh input, output, or a cached read, adds computational load and expense, although cache reads are typically billed at a steep discount relative to uncached input. Developers and organizations therefore work to optimize token consumption through strategies such as keeping code structures lean, giving the model explicit instructions, and breaking large files into smaller pieces, as sketched below, to minimize redundant processing and manage costs.
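The file-splitting strategy is straightforward to implement. Below is a minimal sketch of one approach: chunking a large source file to a rough token budget so each request carries only the relevant slice of context. The 4-characters-per-token ratio is a crude heuristic, not an exact tokenizer, and the file names are hypothetical.

```python
# Sketch: split a large source file into chunks under a rough token budget,
# so each request sends only a focused slice instead of the whole file.
def chunk_file(path: str, max_tokens: int = 2000, chars_per_token: int = 4):
    """Yield successive chunks of the file, each under ~max_tokens."""
    max_chars = max_tokens * chars_per_token
    chunk, size = [], 0
    with open(path) as f:
        for line in f:
            if size + len(line) > max_chars and chunk:
                yield "".join(chunk)
                chunk, size = [], 0
            chunk.append(line)
            size += len(line)
    if chunk:
        yield "".join(chunk)

# Usage: process each chunk in a separate, focused request rather than
# resending the entire file on every turn of a session.
for i, part in enumerate(chunk_file("big_module.py")):
    print(f"chunk {i}: ~{len(part) // 4} tokens")
```

Splitting on line boundaries keeps chunks syntactically readable; a production version would split on function or class boundaries instead, so no chunk cuts a definition in half.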

Emanuel's insights carry weight given his extensive professional use of frontier LLMs; he says he uses them "all day, every day, in about as intense a way as possible." His experience highlights a known industry challenge: when working with large codebases, LLMs can suffer from cluttered context windows and excessive token usage driven by redundant output and repeated context. The issue continues to push developers toward more efficient ways of working with AI coding assistants, balancing powerful capabilities against practical resource management.