DeepAgent Achieves 64% Success Rate on ToolBench, Advancing AI Agent Tool Use

A new artificial intelligence model named DeepAgent demonstrates significant advances in how AI agents interact with external tools and manage complex, long-horizon tasks. Introduced on October 24, 2025, by a team of researchers including Xiaoxi Li, DeepAgent aims to overcome the limitations of traditional agent frameworks by performing autonomous tool discovery and execution within a single, coherent reasoning process. The model achieved a 64.0% success rate on the challenging ToolBench benchmark, ten percentage points above the best previous baseline score of 54.0%.

DeepAgent's core innovation is its "autonomous memory folding" mechanism, designed to combat the context length explosion and error accumulation common in long-horizon tasks. The mechanism compresses the interaction history into three distinct memory types: episodic memory for key milestones, working memory for current subgoals, and tool memory for successful tool applications. As stated in the tweet announcing the paper, "DeepAgent avoids this [context flooding] with memory folding that compresses history." The folded memory lets the agent retain crucial information while using fewer tokens, and it helps the agent recover from errors.
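The three memory types can be pictured as a small data structure. What follows is a minimal sketch only: the article does not describe how the compression is performed, so a simple rule-based pass stands in for it here, and the names `FoldedMemory`, `fold`, and the `kind`, `text`, `tool`, and `ok` fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FoldedMemory:
    """Hypothetical container for the three memory types named in the article."""
    episodic: list[str] = field(default_factory=list)    # key milestones
    working: list[str] = field(default_factory=list)     # current subgoals
    tool: dict[str, str] = field(default_factory=dict)   # successful tool uses


def fold(history: list[dict]) -> FoldedMemory:
    """Compress a raw interaction history into a FoldedMemory.

    Assumption: each step is a dict with a "kind" and a "text" field.
    The rule-based routing below is a stand-in for the real compression.
    """
    memory = FoldedMemory()
    for step in history:
        kind = step["kind"]
        if kind == "milestone":
            memory.episodic.append(step["text"])
        elif kind == "subgoal":
            # Keep only the most recent subgoal as the current one.
            memory.working = [step["text"]]
        elif kind == "tool_call" and step.get("ok"):
            # Record only tool applications that succeeded.
            memory.tool[step["tool"]] = step["text"]
    return memory


history = [
    {"kind": "subgoal", "text": "find a flight"},
    {"kind": "tool_call", "tool": "flight_search", "text": "query by city pair", "ok": True},
    {"kind": "tool_call", "tool": "hotel_search", "text": "query by zip code", "ok": False},
    {"kind": "milestone", "text": "flight booked"},
    {"kind": "subgoal", "text": "find a hotel"},
]
folded = fold(history)
```

In this toy run the five raw steps collapse into one milestone, one current subgoal, and one remembered tool use, and the failed hotel search is dropped rather than carried forward in the context.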

The agent's proficiency is further enhanced by ToolPO, an end-to-end reinforcement learning strategy tailored to tool use. The method employs an LLM-based tool simulator for stable and cost-effective training, and it assigns credit specifically to the tokens that form tool names and arguments, which sharpens the tool calls the agent produces. "Training uses ToolPO, a reinforcement method tailored to tool use," the announcement said, emphasizing its role in refining the agent's ability to handle complex tool interactions.
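The idea of assigning credit only to tool-call tokens can be illustrated with a per-token mask. This is a hedged sketch, not ToolPO itself: the function name `tool_call_credit`, the span representation, and the single scalar `advantage` are assumptions made for illustration, and the article does not specify how the method computes or combines its training signals.

```python
def tool_call_credit(num_tokens: int,
                     tool_spans: list[tuple[int, int]],
                     advantage: float) -> list[float]:
    """Return a per-token credit vector for one generated sequence.

    Assumption: tool_spans holds half-open (start, end) token index ranges
    covering tool names and arguments. Tokens inside a span receive the
    advantage; all other tokens receive zero from this signal.
    """
    credit = [0.0] * num_tokens
    for start, end in tool_spans:
        for i in range(start, end):
            credit[i] = advantage
    return credit


# A 6-token sequence in which tokens 2 and 3 form the tool call.
credit = tool_call_credit(6, [(2, 4)], 1.5)
```

Here only positions 2 and 3 carry a nonzero value, so a policy-gradient update weighted by this vector would adjust the tool name and arguments and leave the surrounding reasoning tokens untouched by this particular signal.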

DeepAgent outperforms existing workflow agents across eight benchmarks, including general tool-use tasks such as API-Bank and ToolHop and downstream applications such as ALFWorld and WebShop. Its ability to dynamically search for and use tools from large indexes, rather than being confined to fixed workflows or limited tool menus, marks a major step toward more general and capable AI agents for real-world applications. The research paper, titled "DeepAgent: A General Reasoning Agent with Scalable Toolsets," is available on arXiv.
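Searching a tool index at run time, as opposed to choosing from a fixed menu, can be sketched with a toy retriever. The keyword-overlap scoring, the function name `search_tools`, and the example tool names and descriptions below are all hypothetical; the article does not say how DeepAgent's retrieval is implemented.

```python
def search_tools(query: str, index: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank tools by how many query words appear in their description.

    Assumption: index maps a tool name to a plain-text description.
    Word overlap is a stand-in for whatever retrieval the real system uses.
    """
    query_words = set(query.lower().split())
    scored = []
    for name, description in index.items():
        overlap = len(query_words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically by tool name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


index = {
    "weather_api": "get current weather forecast for a city",
    "flight_search": "search flights between two cities",
    "currency_convert": "convert an amount between currencies",
}
matches = search_tools("weather forecast for Paris", index)
```

The point of the sketch is that the agent's candidate tools are produced by a query against the index at each step, so the index can grow without changing the agent's prompt or workflow.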