CV · Mar 6, 2025

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

arXiv:2503.04504v30.139 citations · h-index: 3 · Has Code
AI Analysis 85

The paper addresses a practical usability problem in video surveillance: it lets users customize anomaly detection without machine learning expertise or extensive data collection.

The study tackles video anomaly detection (VAD) with a zero-shot, customizable approach that uses user-defined text to detect abnormal events without retraining, achieving state-of-the-art performance on the UBnormal and UCF-Crime benchmarks.

Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users must retrain models or develop separate AI models for new environments, which requires expertise in machine learning, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes the customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD treats user-defined text as an abnormal event and detects the frames in a video that contain the specified event. We implemented AnyAnomaly using context-aware visual question answering, without fine-tuning the large vision language model. To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive results on VAD benchmarks, achieving state-of-the-art performance on UBnormal and UCF-Crime and surpassing other methods in generalization across all datasets. Our code is available online at github.com/SkiddieAhn/Paper-AnyAnomaly.
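
The C-VAD idea described in the abstract, scoring each frame against a user-defined text and flagging the frames that match, can be illustrated with a minimal sketch. This is not the AnyAnomaly implementation: `detect_frames`, `score_fn`, `toy_score_fn`, and the `threshold` default are hypothetical names and values chosen for illustration. In practice `score_fn` would wrap a visual question answering call to a large vision language model; here a toy stand-in is used so the example runs on its own.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def detect_frames(
    frames: Sequence[Frame],
    event_text: str,
    score_fn: Callable[[Frame, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[int]:
    """Return indices of frames whose score for `event_text` reaches `threshold`."""
    flagged = []
    for idx, frame in enumerate(frames):
        # score_fn is a placeholder for an LVLM-based VQA query such as
        # "does this frame contain <event_text>?", returning a score in [0, 1].
        score = score_fn(frame, event_text)
        if score >= threshold:
            flagged.append(idx)
    return flagged


def toy_score_fn(frame: dict, event_text: str) -> float:
    """Toy stand-in for a model: a frame is a dict holding a set of labels."""
    return 1.0 if event_text in frame["labels"] else 0.0


frames = [
    {"labels": {"walking"}},
    {"labels": {"bicycle"}},
    {"labels": {"bicycle", "walking"}},
]
flagged = detect_frames(frames, "bicycle", toy_score_fn)
```

Because the event is passed as text at call time, changing what counts as abnormal only requires a different `event_text`, which is the customization property the abstract emphasizes.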

Code Implementations: 1 repo
