SE, May 7

Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes

Mehil B Shah, Mohammad Mehdi Morovati, Mohammad Masudur Rahman, Foutse Khomh

arXiv:2603.06847, 64.62 citations, h-index: 49

AI Analysis

For developers and researchers building reliable agentic AI systems, this taxonomy offers a structured empirical foundation for fault diagnosis, though it is domain-specific and incremental.

The study analyzed 385 faults from 40 agentic AI repositories to derive a taxonomy of 34 fault types across four architectural dimensions, identifying common symptoms and root causes such as data schema mismatches and dependency drift. The taxonomy was validated by 145 practitioners and provides an empirical basis for diagnosing faults in agentic AI systems.

Agentic AI systems combine LLM-based reasoning, orchestration, tool invocation, and interaction with external environments. These systems introduce faults that are difficult to characterize using existing taxonomies. To address this gap, we present an empirical study of faults in agentic AI systems. We collected 13,602 issues and pull requests from 40 repositories and, using stratified sampling, selected 385 faults for analysis. Through grounded theory, we derived taxonomies of fault types, symptoms, and root causes. We then used Apriori-based association rule mining to identify relationships among faults, symptoms, and root causes, and validated the taxonomy through a developer study with 145 practitioners. Our analysis produced a taxonomy of 34 fault types, organized into four architectural dimensions. These faults manifested as failures in structured-output interpretation, tool calls, runtime execution, and exception handling, with root causes including data schema mismatches, dependency drift, state management complexity, and model interface instability. Furthermore, association rules showed recurring cross-component propagation, linking structured data, dependency, and state management faults to their symptoms and root causes. Practitioners considered the taxonomy representative of agentic AI failures and suggested refinements related to multi-agent coordination and observability. These findings provide an empirical basis for diagnosing faults and improving reliability in agentic AI systems.
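The abstract names Apriori-based association rule mining as the step that links fault types, symptoms, and root causes. The following is a minimal sketch of that general technique in plain Python, not the authors' pipeline. The records, the label names, and the support and confidence thresholds are all hypothetical and chosen only for illustration; the labels loosely echo categories mentioned in the abstract.

```python
from itertools import combinations

# Hypothetical labeled faults: each record is the set of labels assigned to one fault.
# These records are invented for illustration and are not data from the study.
RECORDS = [
    {"fault:structured_data", "symptom:output_parse_failure", "cause:schema_mismatch"},
    {"fault:structured_data", "symptom:output_parse_failure", "cause:schema_mismatch"},
    {"fault:dependency", "symptom:runtime_error", "cause:dependency_drift"},
    {"fault:dependency", "symptom:runtime_error", "cause:dependency_drift"},
    {"fault:state_management", "symptom:runtime_error", "cause:state_complexity"},
    {"fault:structured_data", "symptom:tool_call_failure", "cause:schema_mismatch"},
]


def support(itemset, records):
    """Fraction of records that contain every label in itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise Apriori search: grow itemsets one label at a time."""
    items = sorted({label for r in records for label in r})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), records) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, records)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = [c for c in candidates if support(c, records) >= min_support]
    return frequent


def rules(frequent, min_confidence):
    """Derive rules antecedent -> consequent from the frequent itemsets."""
    out = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # confidence = support(whole itemset) / support(antecedent)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, sup, confidence))
    return out


if __name__ == "__main__":
    # Thresholds are arbitrary illustrative values.
    freq = frequent_itemsets(RECORDS, min_support=0.3)
    for antecedent, consequent, sup, conf in rules(freq, min_confidence=0.9):
        print(sorted(antecedent), "->", sorted(consequent),
              f"support={sup:.2f} confidence={conf:.2f}")
```

On real data, each record would hold the labels assigned to one analyzed fault, and the thresholds would need tuning so that rare but meaningful fault-to-cause links are not pruned away.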
