CV | Mar 17, 2025

Language-guided Open-world Video Anomaly Detection under Weak Supervision

Zihao Liu, Xiaoyu Wu, Jianqin Wu, Xuxu Wang, Linlin Yang

arXiv:2503.13160v2 | 5 citations | h-index: 2 | Has Code

Originality: Highly original

AI Analysis

This work addresses the need for flexible anomaly detection in dynamic real-world applications, such as public health monitoring, by letting the definition of an anomaly vary instead of assuming it is fixed. That makes it a change of paradigm rather than an incremental improvement.

The paper tackles video anomaly detection in open-world scenarios, where the definition of an anomaly can change. It proposes a language-guided paradigm that adapts to user-provided natural-language definitions at inference time, and it reports state-of-the-art zero-shot performance on seven datasets.

Video anomaly detection (VAD) aims to detect anomalies that deviate from what is expected. In open-world scenarios, the expected events may change as requirements change. For example, not wearing a mask may be considered abnormal during a flu outbreak but normal otherwise. However, existing methods assume that the definition of anomalies is invariable, and thus are not applicable to the open world. To address this, we propose a novel open-world VAD paradigm with variable definitions, allowing guided detection through user-provided natural language at inference time. This paradigm necessitates establishing a robust mapping from video and textual definition to anomaly scores. Therefore, we propose LaGoVAD (Language-guided Open-world Video Anomaly Detector), a model that dynamically adapts anomaly definitions under weak supervision with two regularization strategies: diversifying the relative durations of anomalies via dynamic video synthesis, and enhancing feature robustness through contrastive learning with negative mining. Training such adaptable models requires diverse anomaly definitions, but existing datasets typically provide labels without semantic descriptions. To bridge this gap, we collect PreVAD (Pre-training Video Anomaly Dataset), the largest and most diverse video anomaly dataset to date, featuring 35,279 annotated videos with multi-level category labels and descriptions that explicitly define anomalies. Zero-shot experiments on seven datasets demonstrate LaGoVAD's SOTA performance. Our dataset and code will be released at https://github.com/Kamino666/LaGoVAD-PreVAD.
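The idea of mapping a video plus a textual definition to anomaly scores can be illustrated with a toy sketch. This is not LaGoVAD's architecture: it assumes frames and definitions have already been embedded into a shared space, scores each frame by cosine similarity to the closest definition, and pools frame scores with a top-k mean, a pooling commonly used in weakly supervised VAD. The function names, the 2-D embeddings, and the value of `k` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def frame_scores(frame_feats, definition_embs):
    """Score each frame by its best match to any anomaly definition, clipped to [0, 1].

    With no definitions supplied, nothing counts as anomalous and every score is 0.0.
    """
    scores = []
    for feat in frame_feats:
        if not definition_embs:
            scores.append(0.0)
            continue
        best = max(cosine(feat, d) for d in definition_embs)
        scores.append(min(1.0, max(0.0, best)))
    return scores


def video_score(scores, k=2):
    """Video-level score: mean of the k highest frame scores (top-k pooling)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


# Toy 2-D embeddings (hypothetical; a real system would use learned encoders).
NO_MASK = [0.0, 1.0]
FIGHTING = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

# Same video, two different user-provided definitions of "abnormal".
flu_outbreak = frame_scores(frames, [NO_MASK])
otherwise = frame_scores(frames, [FIGHTING])
print([round(s, 2) for s in flu_outbreak], round(video_score(flu_outbreak), 2))
print([round(s, 2) for s in otherwise], round(video_score(otherwise), 2))
```

The two calls score the same frames differently only because the definition changed, which mirrors the mask example in the abstract: no retraining is involved, only a different text input at inference time.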

