ASearcher Boosts AI Search Performance by up to 46.7% with Long-Horizon Capabilities

Researchers led by Jiaxuan Gao have unveiled ASearcher, an open-source project designed to significantly advance the "Search Intelligence" of large language model (LLM)-based agents. The project, detailed in their new paper "Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL," addresses critical limitations in how AI agents currently perform complex web searches. The paper drew wider attention after being shared on social media by the user "AK."

Existing open-source LLM agents are typically trained with small turn limits, often around 10 tool calls, which restrict their ability to learn and execute complex, multi-step search strategies. This cap hinders their performance on challenging, knowledge-intensive tasks that require extensive exploration and nuanced query resolution. Compounding the problem, a scarcity of large-scale, high-quality question-answer (QA) datasets has impeded effective reinforcement learning (RL) training for these agents.

ASearcher introduces a scalable, fully asynchronous RL training system that enables agents to perform "long-horizon search." This approach allows agents to make tool calls across more than 40 turns and produce over 150,000 output tokens during training, a significant leap from previous constraints. By decoupling trajectory execution from model updates, the system maintains high training efficiency and near-full GPU utilization, avoiding the bottlenecks common in traditional batch-generation RL systems.
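
The paper itself is the authoritative reference for the system design, but the decoupling it describes can be illustrated with a minimal Python sketch: rollout workers keep generating search trajectories into a buffer while a separate learner consumes them for updates, so long or slow episodes never stall training. All names here (rollout_worker, learner, run_agent_episode, update_policy) are illustrative placeholders, not ASearcher's actual code.

```python
# Minimal sketch (not the authors' implementation) of decoupling
# trajectory collection from model updates in an asynchronous RL loop.
import queue
import threading

trajectory_queue = queue.Queue(maxsize=64)  # buffer between actors and the learner

def run_agent_episode(worker_id: int) -> dict:
    """Placeholder: roll out one long-horizon search episode
    (tool calls + LLM generations) and return its trajectory."""
    return {"worker": worker_id, "turns": [], "reward": 0.0}

def update_policy(batch: list) -> None:
    """Placeholder: gradient step on the collected trajectories."""
    pass

def rollout_worker(worker_id: int, stop: threading.Event) -> None:
    # Actors generate trajectories independently of the learner, so a
    # single very long episode never blocks a synchronous batch.
    while not stop.is_set():
        trajectory_queue.put(run_agent_episode(worker_id))

def learner(num_updates: int) -> None:
    # The learner consumes whatever trajectories are ready and updates
    # the policy, keeping training GPUs close to fully utilized.
    for _ in range(num_updates):
        batch = [trajectory_queue.get() for _ in range(8)]
        update_policy(batch)

stop_flag = threading.Event()
workers = [threading.Thread(target=rollout_worker, args=(i, stop_flag), daemon=True)
           for i in range(4)]
for w in workers:
    w.start()
learner(num_updates=10)
stop_flag.set()
```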

A key component of ASearcher is its prompt-based LLM agent, which autonomously synthesizes high-quality and challenging QA pairs. This data synthesis agent iteratively modifies seed questions through "fuzzing" to increase uncertainty and "injection" to enrich context with external facts. This rigorous process generates a large-scale, diverse dataset, with 25,624 entries requiring external tools for resolution, addressing the critical need for robust training data.
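
The released code and data are the definitive source, but the synthesis loop described above can be sketched roughly as follows. Here llm(), retrieve_fact(), and synthesize_qa() are hypothetical helpers standing in for the prompt-based agent and its retrieval step, not ASearcher's actual API.

```python
# Minimal sketch of the iterative QA-synthesis loop: a seed question is
# repeatedly rewritten by "fuzzing" (obscuring details to raise
# uncertainty) and "injection" (adding retrieved external facts).
import random

def llm(prompt: str) -> str:
    # Stub: in practice this would call the prompt-based synthesis agent
    # with a rewrite instruction and return the rewritten question.
    return prompt.splitlines()[-1]

def retrieve_fact(question: str) -> str:
    # Stub: in practice this would fetch an external fact related to the question.
    return "a related external fact"

def synthesize_qa(seed_question: str, seed_answer: str, num_iters: int = 4) -> dict:
    """Iteratively harden a seed QA pair via fuzzing and injection."""
    question = seed_question
    for _ in range(num_iters):
        if random.random() < 0.5:
            # Fuzzing: obscure a concrete detail so the answer can no
            # longer be recalled directly and must be searched for.
            question = llm(
                "Rewrite the question so one key detail becomes indirect, "
                "keeping the same answer:\n" + question
            )
        else:
            # Injection: weave in an external fact so resolving the question
            # requires connecting information from multiple sources.
            fact = retrieve_fact(question)
            question = llm(
                "Rewrite the question so it also depends on this fact, "
                "keeping the same answer:\nFact: " + fact + "\n" + question
            )
    return {"question": question, "answer": seed_answer}
```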

The ASearcher-Web-QwQ agent, a 32B model, demonstrated substantial performance improvements through RL training, achieving 46.7% and 20.8% Avg@4 gains on the xBench and GAIA benchmarks, respectively. It attained impressive Avg@4 scores of 42.1 on xBench and 52.8 on GAIA, outperforming other open-source 32B agents. The project’s commitment to open-source principles includes releasing models, training data, and code, fostering further innovation in the field of agentic AI.
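
For readers unfamiliar with the metric, Avg@k is commonly computed as accuracy averaged over k independent attempts per question, then averaged across the benchmark. The short sketch below illustrates that convention under this assumption, with run_agent and is_correct as hypothetical stand-ins; the paper's exact evaluation protocol may differ.

```python
# Hedged illustration of an Avg@k score (here k = 4), assuming the
# common definition: mean correctness over k rollouts per question,
# averaged over the benchmark and reported as a percentage.
from statistics import mean

def avg_at_k(questions, run_agent, is_correct, k: int = 4) -> float:
    per_question = []
    for q in questions:
        scores = [1.0 if is_correct(q, run_agent(q)) else 0.0 for _ in range(k)]
        per_question.append(mean(scores))  # average over k attempts
    return 100.0 * mean(per_question)      # benchmark-level score
```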