Incremental Risk Assessment for Cascading Failures in Large-Scale Multi-Agent Systems
This work addresses risk assessment for cascading failures in large-scale multi-agent systems, contributing to scalability through a single-step conditional risk update and to feasibility analysis through lower bounds on achievable risk.
The paper quantifies the risk of cascading failures in time-delay consensus networks, such as teams of agents performing temporal rendezvous. Its framework uses the Average Value-at-Risk to derive closed-form dependencies of this risk on the Laplacian spectrum, communication delay, and noise statistics, together with fundamental lower bounds on achievable performance; the authors report significant computational savings and tight theoretical limits.
We develop a framework for studying and quantifying the risk of cascading failures in time-delay consensus networks, motivated by a team of agents attempting temporal rendezvous under stochastic disturbances and communication delays. To assess how failures at one or multiple agents amplify the risk of deviation across the network, we employ the Average Value-at-Risk as a systemic measure of cascading uncertainty. Closed-form expressions reveal explicit dependencies of the risk of cascading failure on the Laplacian spectrum, communication delay, and noise statistics. We further establish fundamental lower bounds that characterize the best-achievable network performance under time-delay constraints. These bounds serve as feasibility certificates for assessing whether a desired safety or performance goal can be achieved without exhaustive search across all possible topologies. In addition, we develop an efficient single-step update law that enables scalable propagation of conditional risk as new failures are detected. Analytical and numerical studies demonstrate significant computational savings and confirm the tightness of the theoretical limits across diverse network configurations.
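As a rough illustration of the risk measure named above, the following sketch computes the Average Value-at-Risk of a scalar random variable in two ways: empirically from samples, and from the standard closed form for a Gaussian. It assumes the common tail-expectation convention (the mean of the worst 1 - alpha fraction of outcomes, with larger values being worse); the paper's own definition, its network observables, and its closed-form expressions are not reproduced here. The function names `empirical_avar` and `gaussian_avar` are hypothetical.

```python
import random
from statistics import NormalDist


def empirical_avar(samples, alpha):
    """Mean of the worst (1 - alpha) fraction of samples (larger = worse).

    Assumed convention: tail expectation above the alpha-quantile.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    k = int(alpha * len(ordered))  # index of the first tail sample
    tail = ordered[k:]
    return sum(tail) / len(tail)


def gaussian_avar(mu, sigma, alpha):
    """Closed-form tail expectation for a Normal(mu, sigma^2) variable."""
    std = NormalDist()             # standard normal
    z = std.inv_cdf(alpha)         # alpha-quantile (Value-at-Risk level)
    return mu + sigma * std.pdf(z) / (1.0 - alpha)


# Compare the sample estimate with the closed form for a standard normal.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
estimate = empirical_avar(samples, 0.95)
exact = gaussian_avar(0.0, 1.0, 0.95)
print(f"empirical: {estimate:.3f}, closed form: {exact:.3f}")
```

In the Gaussian case the measure depends only on the mean, the standard deviation, and the confidence level, which gives some intuition for why closed-form expressions in terms of noise statistics can exist when the deviations of interest are Gaussian.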