LG | AI | CL | Feb 20, 2025

STeCa: Step-level Trajectory Calibration for LLM Agent Learning

arXiv:2502.14276v2 | 27 citations | h-index: 10 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of robust task completion for LLM agents in complex environments. It offers a novel method for a known bottleneck rather than a foundational advance.

The paper tackles the problem of suboptimal action accumulation in long-horizon tasks for LLM-based agents by proposing STeCa, a framework that identifies and corrects suboptimal steps through step-level reward comparison and LLM-driven reflection, resulting in significant performance improvements over existing methods.

Large language model (LLM)-based agents have shown promise in tackling complex tasks by interacting dynamically with the environment. Existing work primarily focuses on behavior cloning from expert demonstrations or preference learning through exploratory trajectory sampling. However, these methods often struggle to address long-horizon tasks, where suboptimal actions accumulate step by step, causing agents to deviate from correct task trajectories. To address this, we highlight the importance of timely calibration and the need to automatically construct calibration trajectories for training agents. We propose Step-Level Trajectory Calibration (STeCa), a novel framework for LLM agent learning. Specifically, STeCa identifies suboptimal actions through a step-level reward comparison during exploration. It constructs calibrated trajectories using LLM-driven reflection, enabling agents to learn from improved decision-making processes. We finally leverage these calibrated trajectories with successful trajectories for reinforced training. Extensive experiments demonstrate that STeCa significantly outperforms existing methods. Further analysis highlights that timely calibration enables agents to complete tasks with greater robustness. Our code and data are available at https://github.com/WangHanLinHenry/STeCa.
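
To make the step-level comparison in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how step rewards are estimated or compared, so the `Step` type, the `margin` parameter, the "first step whose reward falls below the expert's" rule, and the way the calibrated trajectory is stitched together are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    action: str
    reward: float  # hypothetical step-level reward estimate (assumption)


def first_suboptimal_step(
    explored: List[Step], expert: List[Step], margin: float = 0.0
) -> Optional[int]:
    """Return the index of the first explored step whose reward is lower than
    the expert step's reward by more than `margin`, or None if there is none.

    This comparison rule is an assumption made for illustration only.
    """
    for t, (e, x) in enumerate(zip(explored, expert)):
        if e.reward < x.reward - margin:
            return t
    return None


def build_calibrated_actions(
    explored: List[Step], expert: List[Step], reflection: str
) -> Optional[List[str]]:
    """Sketch of a calibration trajectory: keep the explored prefix up to and
    including the deviating step, insert a reflection, then continue with the
    expert actions from that step onward.

    `reflection` stands in for text that an LLM would generate. A real system
    would also have to account for the environment state changing after the
    deviating action, which this sketch ignores.
    """
    t = first_suboptimal_step(explored, expert)
    if t is None:
        return None  # nothing to calibrate
    prefix = [s.action for s in explored[: t + 1]]
    suffix = [s.action for s in expert[t:]]
    return prefix + [reflection] + suffix


if __name__ == "__main__":
    expert = [Step("open drawer", 0.8), Step("take knife", 0.7), Step("slice apple", 0.9)]
    explored = [Step("open drawer", 0.8), Step("go to sink", 0.3), Step("take cup", 0.2)]
    print(first_suboptimal_step(explored, expert))
    print(build_calibrated_actions(explored, expert, "[reflect: wrong object]"))
```

In this toy example the deviation is flagged at the second step, where the explored reward drops below the expert's. Raising `margin` makes the detector more tolerant of small reward gaps.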

Code Implementations: 1 repo
