CVNov 4, 2025

M2S2L: Mamba-based Multi-Scale Spatial-temporal Learning for Video Anomaly Detection

arXiv:2511.05564v11 citationsh-index: 20VCIP
Originality Incremental advance
AI Analysis

This addresses the need for efficient and accurate anomaly detection in video surveillance, though it appears incremental as it builds on existing methods with specific improvements.

The paper tackles video anomaly detection by proposing a Mamba-based multi-scale spatial-temporal learning framework, achieving 98.5%, 92.1%, and 77.9% frame-level AUCs on benchmark datasets while maintaining 20.1G FLOPs and 45 FPS inference speed.

Video anomaly detection (VAD) is an essential task in the image processing community with prospects in video surveillance, which faces fundamental challenges in balancing detection accuracy with computational efficiency. As video content becomes increasingly complex with diverse behavioral patterns and contextual scenarios, traditional VAD approaches struggle to provide robust assessment for modern surveillance systems. Existing methods either lack comprehensive spatial-temporal modeling or require excessive computational resources for real-time applications. In this regard, we present a Mamba-based multi-scale spatial-temporal learning (M2S2L) framework in this paper. The proposed method employs hierarchical spatial encoders operating at multiple granularities and multi-temporal encoders capturing motion dynamics across different time scales. We also introduce a feature decomposition mechanism to enable task-specific optimization for appearance and motion reconstruction, facilitating more nuanced behavioral modeling and quality-aware anomaly assessment. Experiments on three benchmark datasets demonstrate that M2S2L framework achieves 98.5%, 92.1%, and 77.9% frame-level AUCs on UCSD Ped2, CUHK Avenue, and ShanghaiTech respectively, while maintaining efficiency with 20.1G FLOPs and 45 FPS inference speed, making it suitable for practical surveillance deployment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes