DeepInfra has announced the immediate availability of Anthropic-compatible APIs, enabling developers to integrate Claude Code with a wide array of large language models (LLMs) hosted on its platform. This strategic move aims to give developers greater flexibility and cost savings, allowing them to use models such as DeepSeek V3.1, GLM-4.5, and Qwen3-Coder through a familiar API interface. The company highlighted "the cheapest token pricing + prompt-cache discounts" as a key benefit of the new offering.
The integration means that developers can keep using Anthropic's Claude Code tooling and API conventions while routing requests to models hosted on DeepInfra's infrastructure, which supports a diverse ecosystem of AI models. The compatibility is designed to streamline development workflows: DeepInfra's API endpoint also supports OpenAI's Chat Completions standard, making it easier to switch between model providers. This approach addresses a common pain point for developers seeking to avoid vendor lock-in and optimize performance across various AI models.
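To make the dual compatibility concrete, the sketch below builds the same request in both wire formats. The endpoint paths, model ID, and header values are illustrative assumptions, not confirmed DeepInfra specifics; consult DeepInfra's documentation for the actual URLs and authentication details.

```python
import json

# Hypothetical endpoint paths -- check DeepInfra's docs for the real ones.
ANTHROPIC_STYLE_URL = "https://api.deepinfra.com/v1/messages"
OPENAI_STYLE_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

MODEL = "deepseek-ai/DeepSeek-V3.1"  # example DeepInfra-hosted model ID
PROMPT = "Write a function that reverses a linked list."

# Anthropic Messages-style body: max_tokens is required and the
# system prompt is a top-level field.
anthropic_body = {
    "model": MODEL,
    "max_tokens": 1024,
    "system": "You are a concise coding assistant.",
    "messages": [{"role": "user", "content": PROMPT}],
}

# OpenAI Chat Completions-style body: the system prompt is just
# another message in the list.
openai_body = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": PROMPT},
    ],
}

# The two conventions also authenticate differently.
anthropic_headers = {
    "x-api-key": "<DEEPINFRA_API_KEY>",
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
openai_headers = {
    "Authorization": "Bearer <DEEPINFRA_API_KEY>",
    "Content-Type": "application/json",
}

print(json.dumps(anthropic_body, indent=2))
```

Because both bodies target the same model ID, switching a codebase between the two standards is mostly a matter of reshaping the payload and headers, not changing providers.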
DeepInfra's platform is known for providing access to a broad spectrum of open-source and proprietary LLMs, often at competitive rates. The new Anthropic API compatibility extends this offering, letting developers reuse workflows built around Anthropic's frontier models, such as Claude Opus 4 and Sonnet, which were designed for complex reasoning and advanced coding tasks. Models in this class are lauded for their ability to understand context, follow intricate instructions, and maintain coherent conversations over extended interactions.
The announcement underscores a growing industry trend toward interoperability and cost-efficiency in AI model deployment. By offering prompt caching and batch processing, DeepInfra aims to significantly reduce operational costs, with cached prompt tokens discounted by up to 90%. The initiative positions DeepInfra as a versatile platform for developers looking to build sophisticated AI applications without being constrained by specific API standards or high token costs.
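In Anthropic's Messages API, prompt caching is requested by attaching a `cache_control` block to a stable prefix, typically a large system prompt, so repeated requests can reuse it at the discounted rate. The sketch below follows that Anthropic convention; whether DeepInfra's compatible endpoint honors the same field, and the exact model ID, are assumptions to verify against its documentation.

```python
# A large, stable prefix (placeholder content) that many requests share,
# e.g. project source files, a style guide, or API schemas.
large_context = "...project source files, style guide, API schemas...\n" * 100

body = {
    "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",  # example model ID
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": large_context,
            # Mark the stable prefix as cacheable (Anthropic convention);
            # only the short user turn below changes between requests.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Refactor module X for clarity."}
    ],
}
```

The savings come from the asymmetry: the cached prefix dominates the token count, while the uncached portion of each request stays small.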