AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

Boxuan Zhang, Jianing Zhu, Zeru Shi, Dongfang Liu, Ruixiang Tang

arXiv:2605.0871593.7

Predicted impact top 30% in CL · last 90 daysOriginality Highly original

AI Analysis

For developers deploying LLM-based multi-agent systems on long-horizon tasks, this work enables proactive failure detection during execution rather than post-hoc analysis, addressing a critical bottleneck in reliable deployment.

AgentForesight introduces an online auditing framework for LLM-based multi-agent systems that detects decisive errors at the earliest step during trajectory execution, enabling real-time intervention. On the AFTraj-2K benchmark and external Who&When benchmark, AgentForesight-7B outperforms GPT-4.1 and DeepSeek-V4-Pro by up to +19.9% with 3× lower step localization error.

LLM-based multi-agent systems are increasingly deployed on long-horizon tasks, but a single decisive error is often accepted by downstream agents and cascades into trajectory-level failure. Existing work frames this as \emph{post-hoc failure attribution}, diagnosing the responsible agent and step after the trajectory has ended. However, this paradigm forfeits any opportunity to intervene while trajectory is still unfolding. In this work, we introduce AgentForesight, a framework that reframes this problem as online auditing: at each step of an unfolding trajectory, an auditor observes only the current prefix and must either continue the run or alarm at the earliest decisive error, without access to future steps. To this end, we curate AFTraj-2K, a corpus of agentic trajectories across Coding, Math, and Agentic domains, in which safe trajectories are retained under a strict curation pipeline and unsafe trajectories are annotated at the step of their decisive error via consensus among multiple LLM judges. Built on that, we develop AgentForesight-7B, a compact online auditor trained with a coarse-to-fine reinforcement learning recipe that first equips it with a risk-anticipation prior at the failure boundary on adjacent safe/unsafe prefix pairs, then sharpens this prior into precise step-level localization under a three-axis reward jointly targeting the what, where, and who of an audit verdict. Across AFTraj-2K and an external Who\&When benchmark, AgentForesight-7B outperforms leading proprietary models, including GPT-4.1 and DeepSeek-V4-Pro, achieving up to +19.9% performance gain and 3$\times$ lower step localization error, opening the loop from post-hoc failures detection to enabling deployment-time intervention. Project page: https://zbox1005.github.io/agent-foresight/

View on arXiv PDF

Similar