AINov 14, 2025
Looking Forward: Challenges and Opportunities in Agentic AI ReliabilityLiudong Xing, Janet, Lin
This chapter presents perspectives for challenges and future development in building reliable AI systems, particularly, agentic AI systems. Several open research problems related to mitigating the risks of cascading failures are discussed. The chapter also sheds lights on research challenges and opportunities in aspects including dynamic environments, inconsistent task execution, unpredictable emergent behaviors, as well as resource-intensive reliability mechanisms. In addition, several research directions along the line of testing and evaluating reliability of agentic AI systems are also discussed.
SYOct 24, 2025
From Failure Modes to Reliability Awareness in Generative and Agentic AI SystemJanet, Lin, Liangwei Zhang
This chapter bridges technical analysis and organizational preparedness by tracing the path from layered failure modes to reliability awareness in generative and agentic AI systems. We first introduce an 11-layer failure stack, a structured framework for identifying vulnerabilities ranging from hardware and power foundations to adaptive learning and agentic reasoning. Building on this, the chapter demonstrates how failures rarely occur in isolation but propagate across layers, creating cascading effects with systemic consequences. To complement this diagnostic lens, we develop the concept of awareness mapping: a maturity-oriented framework that quantifies how well individuals and organizations recognize reliability risks across the AI stack. Awareness is treated not only as a diagnostic score but also as a strategic input for AI governance, guiding improvement and resilience planning. By linking layered failures to awareness levels and further integrating this into Dependability-Centred Asset Management (DCAM), the chapter positions awareness mapping as both a measurement tool and a roadmap for trustworthy and sustainable AI deployment across mission-critical domains.