CL · Apr 21

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, Chenghua Lin

arXiv:2604.19572 · 44.7

Predicted impact: top 11% in CL · last 90 days

Originality: Incremental advance

AI Analysis

For researchers and practitioners building long-horizon terminal agents, TACO offers a generalizable method to reduce token costs without sacrificing performance, addressing a key bottleneck in deploying such agents.

TACO is a plug-and-play framework that automatically learns compression rules from interaction trajectories to reduce token costs in terminal-centric agent tasks. On TerminalBench it improves performance by 1%-4% across strong agentic models, and with MiniMax-2.5 it reduces token overhead by around 10% while improving performance on most benchmarks.

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
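The quadratic growth mentioned in the abstract can be illustrated with a toy cost model. The sketch below is not TACO's method: it assumes every step re-sends all observations collected so far, uses hypothetical observation sizes (500 tokens each), and stands in for compression with a hypothetical fixed keep ratio rather than learned rules. The function names `cumulative_prompt_tokens` and `shrink` are invented for this illustration.

```python
def cumulative_prompt_tokens(observation_tokens):
    """Total tokens sent over a run when each step re-sends every
    observation collected so far (a simplifying assumption)."""
    total = 0
    history = 0
    for obs in observation_tokens:
        history += obs      # the new observation joins the history
        total += history    # the whole history is sent at this step
    return total


def shrink(observation_tokens, keep_percent):
    """Hypothetical stand-in for compression: keep a fixed percentage
    of each observation's tokens (integer arithmetic, rounds down)."""
    return [t * keep_percent // 100 for t in observation_tokens]


# Hypothetical workload: every observation is 500 tokens long.
cost_20_steps = cumulative_prompt_tokens([500] * 20)   # 105000
cost_40_steps = cumulative_prompt_tokens([500] * 40)   # 410000
cost_20_steps_compressed = cumulative_prompt_tokens(shrink([500] * 20, 90))  # 94500

print(cost_20_steps, cost_40_steps, cost_20_steps_compressed)
```

Under these assumptions, doubling the number of steps from 20 to 40 roughly quadruples the cumulative cost, because the total is proportional to n(n+1)/2 for n equal-sized observations. Shrinking each observation reduces the total by the same proportion, which is why per-observation compression matters more as runs get longer.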
