LG AI | Feb 4, 2025

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Zongyu Lin, Yao Tang, Xingcheng Yao, Da Yin, Ziniu Hu, Yizhou Sun, Kai-Wei Chang

arXiv:2502.02584v121.314 citations | h-index: 58 | Has Code | ICML

Originality: Incremental advance

AI Analysis

This paper addresses a bottleneck in language agent inference on complex interactive tasks, offering an incremental improvement over existing methods.

The paper argues that language agents optimized only with outcome rewards can end up with sub-optimal policies. It proposes QLASS, which uses Q-guided stepwise search to provide intermediate guidance at each step. The authors report significant performance improvements on complex interactive tasks, and strong performance retained with almost half the annotated data.

Language agents have become a promising solution to complex interactive tasks. One of the key ingredients in the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, due to the lack of annotations of intermediate interactions, most existing works use an outcome reward model to optimize policies across entire trajectories. This may lead to sub-optimal policies and hinder the overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search) to automatically generate annotations by estimating Q-values in a stepwise manner for open language agents. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value, resulting in significant performance improvement during model inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency in handling limited supervision. We also empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis. We will release our code and data.
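The abstract's two core ideas, stepwise Q-value estimation over a reasoning tree and Q-guided action selection, can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the action names, the discount `gamma`, and the max-over-children backup rule are all assumptions made for illustration, and a simple lookup table stands in for whatever learned Q-value model the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One step in a hypothetical reasoning tree (an action taken from the parent)."""
    action: str
    reward: float = 0.0  # outcome reward; nonzero only at leaves in this sketch
    children: List["Node"] = field(default_factory=list)
    q: Optional[float] = None  # filled in by estimate_q


def estimate_q(node: Node, gamma: float = 1.0) -> float:
    """Back up leaf outcome rewards into per-step Q-values.

    Assumption: a Bellman-style backup that takes the max over child values.
    """
    if not node.children:
        node.q = node.reward
    else:
        node.q = node.reward + gamma * max(estimate_q(c, gamma) for c in node.children)
    return node.q


def q_guided_step(candidates: List[str], q_fn: Callable[[str], float]) -> str:
    """Pick the candidate action with the highest estimated Q-value."""
    return max(candidates, key=q_fn)


# Toy tree: two first-step actions, only one of which can reach a successful leaf.
root = Node("start", children=[
    Node("search", children=[
        Node("click_wrong", reward=0.0),
        Node("click_right", reward=1.0),
    ]),
    Node("give_up", reward=0.0),
])
estimate_q(root, gamma=0.9)

# Stepwise annotations for the first step, then Q-guided selection among them.
q_table = {child.action: child.q for child in root.children}
best = q_guided_step(list(q_table), q_table.__getitem__)
```

In this toy tree the only signal is the outcome reward at the leaves, yet the backup assigns the intermediate action `search` a higher value than `give_up`, which is the kind of stepwise guidance the abstract describes.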
