AI · Nov 6, 2025

Detecting Silent Failures in Multi-Agentic AI Trajectories

arXiv:2511.04032v1 · 3 citations · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of detecting subtle failures in non-deterministic multi-agent AI systems and provides datasets and benchmarks for future research. The contribution is incremental, however, because it applies existing anomaly detection methods to a new domain.

The paper tackles the problem of silent failures like drift and cycles in multi-agent AI systems by introducing anomaly detection for agentic trajectories, achieving up to 98% accuracy with supervised methods on curated datasets of 4,275 and 894 trajectories.

Multi-Agentic AI systems, powered by large language models (LLMs), are inherently non-deterministic and prone to silent failures such as drift, cycles, and missing details in outputs, which are difficult to detect. We introduce the task of anomaly detection in agentic trajectories to identify these failures and present a dataset curation pipeline that captures user behavior, agent non-determinism, and LLM variation. Using this pipeline, we curate and label two benchmark datasets comprising 4,275 and 894 trajectories from Multi-Agentic AI systems. Benchmarking anomaly detection methods on these datasets, we show that supervised (XGBoost) and semi-supervised (SVDD) approaches perform comparably, achieving accuracies up to 98% and 96%, respectively. This work provides the first systematic study of anomaly detection in Multi-Agentic AI systems, offering datasets, benchmarks, and insights to guide future research.
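To make the semi-supervised setting concrete, here is a minimal sketch, not the paper's pipeline. It assumes a trajectory can be reduced to a list of step names. The features (`trajectory_features`) and the detector (`CentroidBallDetector`) are hypothetical illustrations. The detector is a deliberately simplified stand-in for SVDD: it fits a ball around normal trajectories using a fixed centroid and a radius with some slack, whereas SVDD optimizes the enclosing boundary.

```python
from collections import Counter
from math import sqrt


def trajectory_features(steps):
    """Map a trajectory (list of step names) to a small numeric vector.

    Hypothetical features: length, highest count of any single step,
    and the share of transitions that repeat an earlier transition
    (a rough signal for cycles).
    """
    counts = Counter(steps)
    max_repeat = max(counts.values()) if counts else 0
    transitions = list(zip(steps, steps[1:]))
    repeated = len(transitions) - len(set(transitions))
    repeat_ratio = repeated / len(transitions) if transitions else 0.0
    return [float(len(steps)), float(max_repeat), repeat_ratio]


class CentroidBallDetector:
    """Simplified stand-in for SVDD: a ball around normal data only."""

    def fit(self, normal_vectors, slack=1.5):
        dims = len(normal_vectors[0])
        n = len(normal_vectors)
        # Center of the ball: per-dimension mean of the normal vectors.
        self.center = [sum(v[d] for v in normal_vectors) / n for d in range(dims)]
        # Radius: farthest normal point from the center, widened by `slack`.
        self.radius = max(self._distance(v) for v in normal_vectors) * slack
        return self

    def _distance(self, vector):
        return sqrt(sum((x - c) ** 2 for x, c in zip(vector, self.center)))

    def is_anomalous(self, vector):
        # Anything outside the ball is flagged as an anomaly.
        return self._distance(vector) > self.radius


# Train on normal trajectories only (the semi-supervised assumption).
normal_runs = [
    ["plan", "search", "summarize", "answer"],
    ["plan", "search", "search", "summarize", "answer"],
    ["plan", "lookup", "summarize", "answer"],
]
detector = CentroidBallDetector().fit([trajectory_features(r) for r in normal_runs])

# A run stuck in a search/review loop before answering.
cyclic_run = ["plan"] + ["search", "review"] * 4 + ["answer"]
cyclic_flag = detector.is_anomalous(trajectory_features(cyclic_run))
normal_flag = detector.is_anomalous(trajectory_features(normal_runs[0]))
```

In this toy setup the looping run falls far outside the ball while the normal run stays inside. A supervised alternative, such as the XGBoost classifier named in the abstract, would instead need labeled examples of both normal and anomalous trajectories.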

