Decoupled Sensitivity-Consistency Learning for Weakly Supervised Video Anomaly Detection
This work addresses weakly supervised video anomaly detection for applications such as surveillance and proposes a method that improves accuracy over existing approaches.
The paper tackles the sensitivity-stability trade-off in weakly supervised video anomaly detection by proposing DeSC, a decoupled framework with specialized streams for temporal sensitivity and semantic consistency. It reports state-of-the-art results of 89.37% AUC on UCF-Crime and 87.18% AP on XD-Violence.
Recent weakly supervised video anomaly detection methods have achieved significant advances by employing unified frameworks for joint optimization. However, this paradigm is limited by a fundamental sensitivity-stability trade-off, as the conflicting objectives for detecting transient and sustained anomalies lead to either fragmented predictions or over-smoothed responses. To address this limitation, we propose DeSC, a novel Decoupled Sensitivity-Consistency framework that trains two specialized streams using distinct optimization strategies. The temporal sensitivity stream adopts an aggressive optimization strategy to capture high-frequency abrupt changes, whereas the semantic consistency stream applies robust constraints to maintain long-term coherence and reduce noise. Their complementary strengths are fused through a collaborative inference mechanism that reduces individual biases and produces balanced predictions. Extensive experiments demonstrate that DeSC establishes new state-of-the-art performance by achieving 89.37% AUC on UCF-Crime (+1.29%) and 87.18% AP on XD-Violence (+2.22%). Code is available at https://github.com/imzht/DeSC.
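The abstract does not specify how the collaborative inference mechanism combines the two streams. As a minimal sketch only, assuming each stream outputs one anomaly score per video snippet and that fusion is a simple convex combination, the idea can be illustrated as follows. The function name `fuse_scores` and the mixing weight `alpha` are hypothetical and are not taken from the DeSC code.

```python
from typing import List, Sequence


def fuse_scores(sensitivity: Sequence[float],
                consistency: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Blend two per-snippet anomaly score sequences.

    ASSUMPTION: a convex combination with weight `alpha` stands in for the
    paper's collaborative inference mechanism, which the abstract does not
    describe in detail.
    """
    if len(sensitivity) != len(consistency):
        raise ValueError("score sequences must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # alpha weights the sensitivity stream; (1 - alpha) weights the
    # consistency stream.
    return [alpha * s + (1.0 - alpha) * c
            for s, c in zip(sensitivity, consistency)]


# Illustrative (made-up) scores: a spiky sensitivity stream and a smoother
# consistency stream over five snippets.
spiky = [0.0, 1.0, 0.0, 1.0, 0.0]
smooth = [0.5, 0.5, 0.5, 0.5, 0.5]
fused = fuse_scores(spiky, smooth, alpha=0.5)
```

Under this assumption, a larger `alpha` favors abrupt, transient responses, while a smaller `alpha` favors temporally coherent ones. The fusion rule in the released code may differ.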