LG · Feb 11

TVCACHE: A Stateful Tool-Value Cache for Post-Training LLM Agents

Abhishek Vijaya Kumar, Bhaskar Kataria, Byungsoo Oh, Emaad Manzoor, Rachee Singh

arXiv:2602.10986v1 · 1.4 · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses inefficiencies in RL post-training of LLM agents. It is a domain-specific optimization and an incremental advance: it adapts existing caching concepts to tool calls whose outputs depend on environment state.

The paper tackles slow external tool calls during RL post-training of LLM agents, which leave GPUs idle and increase cost. It introduces TVCACHE, a stateful tool-value cache that achieves cache hit rates of up to 70% and reduces median tool call execution time by up to 6.9X without degrading reward accumulation.

In RL post-training of LLM agents, calls to external tools take several seconds or even minutes, leaving allocated GPUs idle and inflating post-training time and cost. While many tool invocations repeat across parallel rollouts and could in principle be cached, naively caching their outputs for reuse is incorrect since tool outputs depend on the environment state induced by prior agent interactions. We present TVCACHE, a stateful tool-value cache for LLM agent post-training. TVCACHE maintains a tree of observed tool-call sequences and performs longest-prefix matching for cache lookups: a hit occurs only when the agent's full tool history matches a previously executed sequence, guaranteeing identical environment state. On three diverse workloads (terminal-based tasks, SQL generation, and video understanding), TVCACHE achieves cache hit rates of up to 70% and reduces median tool call execution time by up to 6.9X, with no degradation in post-training reward accumulation.
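The lookup rule in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a minimal Python illustration of a tree keyed by tool-call sequences, in which a hit requires the full history to match. The names `ToolValueCache`, `Rollout`, and `make_env`, the `(tool, args)` encoding of a call, and the toy key-value environment are all hypothetical. The sketch also assumes tools are deterministic given the environment state, and that on a miss the real environment can be brought up to date by replaying the calls that were previously served from the cache. The abstract does not say how TVCACHE handles that step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

ToolCall = Tuple[str, str]  # (tool name, serialized arguments): hypothetical encoding


@dataclass
class Node:
    """One node per observed tool-call sequence; holds the output of its last call."""
    output: Optional[str] = None
    children: Dict[ToolCall, "Node"] = field(default_factory=dict)


class ToolValueCache:
    def __init__(self) -> None:
        self.root = Node()
        self.hits = 0
        self.misses = 0

    def match(self, history: List[ToolCall]) -> Tuple[Node, int]:
        """Longest-prefix match: follow history from the root as far as it goes."""
        node, depth = self.root, 0
        for call in history:
            child = node.children.get(call)
            if child is None:
                break
            node, depth = child, depth + 1
        return node, depth

    def lookup(self, history: List[ToolCall]) -> Tuple[bool, Optional[str]]:
        """Hit only if the whole history matches a previously executed sequence."""
        node, depth = self.match(history)
        if depth == len(history) and node.output is not None:
            self.hits += 1
            return True, node.output
        self.misses += 1
        return False, None

    def insert(self, history: List[ToolCall], output: str) -> None:
        node = self.root
        for call in history:
            node = node.children.setdefault(call, Node())
        node.output = output


class Rollout:
    """One agent rollout; a tool runs for real only on a cache miss."""

    def __init__(self, cache: ToolValueCache,
                 env_factory: Callable[[], Callable[[ToolCall], str]]) -> None:
        self.cache = cache
        self.env_factory = env_factory
        self.env: Optional[Callable[[ToolCall], str]] = None
        self.applied = 0  # how many calls in history were really executed in env
        self.history: List[ToolCall] = []

    def call(self, tool: str, args: str) -> str:
        self.history.append((tool, args))
        hit, output = self.cache.lookup(self.history)
        if hit:
            return output
        # Assumption of this sketch: replay calls that were served from the
        # cache so the real environment reaches the state the history implies.
        if self.env is None:
            self.env = self.env_factory()
        for past in self.history[self.applied:-1]:
            self.env(past)
        output = self.env(self.history[-1])
        self.applied = len(self.history)
        self.cache.insert(self.history, output)
        return output


def make_env() -> Callable[[ToolCall], str]:
    """Toy stateful environment: 'write' mutates a store, 'read' queries it."""
    state: Dict[str, str] = {}

    def run(call: ToolCall) -> str:
        tool, args = call
        if tool == "write":
            key, value = args.split("=")
            state[key] = value
            return "ok"
        return state.get(args, "<missing>")

    return run


cache = ToolValueCache()
a, b, c = (Rollout(cache, make_env) for _ in range(3))
a.call("write", "x=1"); a.call("read", "x")   # two misses, both executed
b.call("write", "x=1"); b.call("read", "x")   # two hits, same full history
c.call("write", "x=2")                        # miss: different history
print(c.call("read", "x"))                    # prints 2, not the cached "1"
print(cache.hits, cache.misses)               # prints 2 4
```

Rollout `c` shows why keying on the call alone would be wrong: `read x` was already cached with the value `1`, but under `c`'s history the correct answer is `2`, so the lookup misses and the tool runs for real.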
