LG, AI | Jan 29

Recoverability Has a Law: The ERR Measure for Tool-Augmented Agents

arXiv:2601.22352v1
h-index: 1
Originality: Highly original
AI Analysis

The paper provides a theoretical foundation for execution-level robustness in language agents, filling a gap in how researchers and practitioners understand recoverability.

The paper addresses the lack of a formal explanation for why language model agents can recover on their own after tool call failures. It proposes a predictive theory in which recoverability follows a measurable law, quantified through Expected Recovery Regret (ERR), and validates that law empirically across five tool-use benchmarks, with predicted regret matching observed regret within δ ≤ 0.05.

Language model agents often appear capable of self-recovery after failing tool call executions, yet this behavior lacks a formal explanation. We present a predictive theory that resolves this gap by showing that recoverability follows a measurable law. To elaborate, we formalize recoverability through Expected Recovery Regret (ERR), which quantifies the deviation of a recovery policy from the optimal one under stochastic execution noise, and derive a first-order relationship between ERR and an empirical observable quantity, the Efficiency Score (ES). This yields a falsifiable first-order quantitative law of recovery dynamics in tool-using agents. We empirically validate the law across five tool-use benchmarks spanning controlled perturbations, diagnostic reasoning, and real-world APIs. Across model scales, perturbation regimes, and recovery horizons, predicted regret under the ERR-ES law closely matched observed post-failure regret measured from Monte Carlo rollouts, within δ ≤ 0.05. Our results reveal that recoverability is not an artifact of model scale or architecture, but a governed property of interaction dynamics, providing a theoretical foundation for execution-level robustness in language agents.
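
The validation step described in the abstract, comparing a predicted regret against regret measured from Monte Carlo rollouts, might be sketched as follows. This is only an illustration under stated assumptions: the exact definitions of ERR and ES and the form of the ERR-ES law are not given in this summary, so regret here is taken as the generic gap in mean return between a reference policy and a recovery policy, and `predicted` is a placeholder value rather than the law's output. The names `estimate_regret`, `within_tolerance`, and the toy rollout functions are hypothetical.

```python
import random
from statistics import mean
from typing import Callable

# A rollout takes a random generator and returns the episode's return.
Rollout = Callable[[random.Random], float]


def estimate_regret(reference_rollout: Rollout,
                    recovery_rollout: Rollout,
                    n_rollouts: int = 20000,
                    seed: int = 0) -> float:
    """Monte Carlo estimate of regret: mean reference return minus mean recovery return.

    Assumption: this generic definition stands in for the paper's ERR,
    whose exact form is not stated in this summary.
    """
    rng = random.Random(seed)
    reference_mean = mean(reference_rollout(rng) for _ in range(n_rollouts))
    recovery_mean = mean(recovery_rollout(rng) for _ in range(n_rollouts))
    return reference_mean - recovery_mean


def within_tolerance(predicted: float, observed: float, delta: float = 0.05) -> bool:
    """Check the agreement criterion |predicted - observed| <= delta."""
    return abs(predicted - observed) <= delta


# Toy stand-ins for post-failure episodes (hypothetical success rates):
# the reference policy succeeds 90% of the time, the recovery policy 70%.
def reference_rollout(rng: random.Random) -> float:
    return 1.0 if rng.random() < 0.9 else 0.0


def recovery_rollout(rng: random.Random) -> float:
    return 1.0 if rng.random() < 0.7 else 0.0


observed = estimate_regret(reference_rollout, recovery_rollout)
predicted = 0.2  # placeholder; the ERR-ES law's prediction would go here
agrees = within_tolerance(predicted, observed)
```

In this toy setup the true regret is 0.2 by construction, so the check passes whenever the Monte Carlo estimate lands within 0.05 of it. With real agents, the rollouts would be post-failure episodes on a benchmark, and the prediction would come from the law applied to the measured ES.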

