CV, Feb 1

Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts

arXiv:2602.01369v1
3 citations
Originality: Incremental advance
AI Analysis

This work addresses a security gap in video MoE models, which are widely used in applications such as surveillance and autonomous systems. It exposes and mitigates both the independent and the collaborative weaknesses of MoE components. The contribution is incremental, since it builds on existing adversarial robustness research.

The paper studies the adversarial robustness of Mixture-of-Experts (MoE) models for video understanding. It proposes Temporal Lipschitz-Guided Attacks (TLGA) to expose component-level vulnerabilities, and Joint Temporal Lipschitz Adversarial Training (J-TLAT) to defend against them. The authors report that the framework reduces inference cost by more than 60% compared with dense models and improves robustness across datasets and architectures.

Mixture-of-Experts (MoE) has demonstrated strong performance in video understanding tasks, yet its adversarial robustness remains underexplored. Existing attack methods often treat MoE as a unified architecture, overlooking the independent and collaborative weaknesses of key components such as routers and expert modules. To fill this gap, we propose Temporal Lipschitz-Guided Attacks (TLGA) to thoroughly investigate component-level vulnerabilities in video MoE models. We first design attacks on the router, revealing its independent weaknesses. Building on this, we introduce Joint Temporal Lipschitz-Guided Attacks (J-TLGA), which collaboratively perturb both routers and experts. This joint attack significantly amplifies adversarial effects and exposes the Achilles' Heel (collaborative weaknesses) of the MoE architecture. Based on these insights, we further propose Joint Temporal Lipschitz Adversarial Training (J-TLAT). J-TLAT performs joint training to further defend against collaborative weaknesses, enhancing component-wise robustness. Our framework is plug-and-play and reduces inference cost by more than 60% compared with dense models. It consistently enhances adversarial robustness across diverse datasets and architectures, effectively mitigating both the independent and collaborative weaknesses of MoE.
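To make the router's independent weakness concrete, here is a minimal sketch in plain Python. It is not the paper's TLGA, whose formulation is not given on this page. It assumes a hypothetical linear top-1 router (gate matrix `W`, token feature `x`) and computes the smallest L2 perturbation that changes which expert is selected. The names `router_logits`, `top1_expert`, and `minimal_routing_flip` are illustrative only.

```python
import math


def router_logits(W, x):
    """Logits of a linear router: one dot product per expert (row of W)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def top1_expert(W, x):
    """Index of the expert with the highest logit (first index wins ties)."""
    logits = router_logits(W, x)
    return max(range(len(logits)), key=lambda e: logits[e])


def minimal_routing_flip(W, x, margin=1e-3):
    """Smallest L2 perturbation of x that dethrones the current top-1 expert.

    For a linear router, moving x along (W[e] - W[current]) closes the logit
    gap to expert e fastest, so the minimal perturbation has a closed form.
    Returns (delta, l2_norm, overtaking_expert), or None if no rival exists.
    Assumption: toy linear gate, no softmax noise, no load balancing.
    """
    logits = router_logits(W, x)
    current = top1_expert(W, x)
    best = None
    for e in range(len(W)):
        if e == current:
            continue
        diff = [a - b for a, b in zip(W[e], W[current])]
        sq_norm = sum(d * d for d in diff)
        if sq_norm == 0.0:
            continue  # identical gate rows: this expert can never overtake
        gap = logits[current] - logits[e]
        scale = (gap + margin) / sq_norm
        delta = [scale * d for d in diff]
        norm = math.sqrt(sum(d * d for d in delta))
        if best is None or norm < best[1]:
            best = (delta, norm, e)
    return best


if __name__ == "__main__":
    # Hypothetical 3-expert router over 2-dimensional features.
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    x = [0.6, 0.5]
    delta, norm, rival = minimal_routing_flip(W, x)
    x_adv = [a + b for a, b in zip(x, delta)]
    print("expert before:", top1_expert(W, x))
    print("expert after: ", top1_expert(W, x_adv))
    print("perturbation L2 norm:", norm)
```

In this toy setting, a perturbation much smaller than the input itself is enough to reroute the token, because the two leading logits are close. A joint attack in the spirit of J-TLGA would additionally perturb the input against the selected experts, which this sketch does not attempt.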

