CR · AI · May 18

Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

arXiv:2605.18988 · 13.4
Predicted impact: top 40% in CR · last 90 days
Originality: Highly original
AI Analysis

For developers of autonomous agentic AI systems, this work offers a predictive defense against non-stationary, cross-modal attacks that current static defenses miss.

The paper addresses the vulnerability of Multimodal Large Language Models (MLLMs) to novel multi-turn multimodal attacks that evade turn-specific guardrails. It proposes the TRIAD framework, which models conversational flow as a continuous trajectory and uses survival prediction to detect malicious drift, providing a mathematically bounded expected time-to-failure under adversarial perturbations.
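
For orientation, the survival-analysis quantities behind "survival prediction" and "expected time-to-failure" can be written in standard textbook form. This is a generic proportional-hazards sketch, not the paper's own formulation: the symbols $h_0$ (baseline hazard), $\beta$ (coefficient vector), $x(t)$ (per-turn feature vector), and $T$ (turn at which the conversation fails) are assumptions introduced here for illustration.

```latex
\begin{align}
h(t \mid x(t)) &= h_0(t)\,\exp\!\big(\beta^{\top} x(t)\big) \\
S(t) &= \exp\!\Big(-\int_0^{t} h(u \mid x(u))\,du\Big) \\
\mathbb{E}[T] &= \int_0^{\infty} S(t)\,dt
\end{align}
```

Under this reading, features that grow along a malicious trajectory raise the hazard $h$, which lowers the survival function $S$ and shortens the expected time-to-failure $\mathbb{E}[T]$.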

The expansion of Multimodal Large Language Models (MLLMs) and their integration into autonomous agentic workflows has introduced a non-stationary attack surface. Empirical observations indicate that adversaries employ progressive, cross-modal perturbations that evade turn-specific guardrails by distributing malicious intent across long conversational trajectories. Static defense mechanisms, constrained by the Markov property, evaluate inputs in isolation and fail to detect cumulative structural poisoning. To address this limitation, this paper formulates safety verification as a dynamic survival prediction and trajectory dynamics problem. The Triple-tier Anomaly Defense (TRIAD) framework is proposed as a predictive model that represents multimodal, multi-turn conversational flow as a continuous trajectory. The framework integrates three components: structural anomaly detection, a Ledoit-Wolf regularized Mahalanobis distance to monitor covariance shifts in high-dimensional spaces, and topological trajectory acceleration to differentiate benign creative exploration from continuous malicious drift. These kinematic and geometric features are integrated into a time-varying Cox Proportional Hazards model via a Bayesian Hidden Markov Model (HMM) feedback loop. Theoretical analysis demonstrates that the TRIAD framework provides a mathematically bounded expected time-to-failure under adversarial perturbations, ensuring that malicious acceleration diverges positively. The framework provides a computationally efficient, interpretable, and predictive safeguard for real-time agentic AI systems, establishing a rigorous foundation for continuous safety alignment without relying on empirical retraining.
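
The idea of feeding a trajectory-acceleration feature into a hazard model can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not the paper's method: the functions `second_differences` and `survival_curve`, the finite-difference definition of acceleration, the coefficient `beta`, the constant baseline hazard, and the toy 2-D "embeddings" are all hypothetical. The sketch also omits the Ledoit-Wolf Mahalanobis term, the structural anomaly detector, and the HMM feedback loop.

```python
import math

def second_differences(trajectory):
    """Discrete acceleration a_t = x_{t+1} - 2*x_t + x_{t-1} at each interior turn."""
    accels = []
    for prev, cur, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        accels.append([n - 2 * c + p for p, c, n in zip(prev, cur, nxt)])
    return accels

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def survival_curve(covariates, beta, baseline_hazard=0.01):
    """Discrete-time survival under a Cox-style hazard h_t = h0 * exp(beta . x_t).

    baseline_hazard and beta are illustrative values, not fitted parameters.
    """
    survival, cumulative = [], 0.0
    for x in covariates:
        hazard = baseline_hazard * math.exp(sum(b * xi for b, xi in zip(beta, x)))
        cumulative += hazard          # accumulate hazard turn by turn
        survival.append(math.exp(-cumulative))
    return survival

# Toy per-turn embeddings (hypothetical): constant steps vs. growing steps.
benign = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [2.0, 0.0]]
drift = [[0.0, 0.0], [0.5, 0.0], [1.5, 0.0], [3.0, 0.0], [5.0, 0.0]]

# Use the acceleration magnitude as the single time-varying covariate.
benign_features = [[norm(a)] for a in second_differences(benign)]
drift_features = [[norm(a)] for a in second_differences(drift)]

beta = [2.0]  # assumed coefficient: larger acceleration means higher hazard
benign_survival = survival_curve(benign_features, beta)
drift_survival = survival_curve(drift_features, beta)

print("benign survival:", benign_survival)
print("drift survival: ", drift_survival)
```

In this toy setup the constant-velocity conversation has zero acceleration and keeps the baseline hazard, while the accelerating one accumulates hazard faster, so its survival curve falls below the benign curve at every turn.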

