CV · Jun 18, 2024

Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

arXiv:2406.12235v2 · 69 citations
Originality: Highly original
AI Analysis

This work addresses biased and opaque anomaly detection in videos for security and surveillance applications and proposes a novel interpretable technique.

The paper tackles biased and non-interpretable video anomaly detection by proposing Holmes-VAD, a framework that uses a new multimodal instruction-tuning dataset (VAD-Instruct50k) and a fine-tuned multimodal LLM to achieve accurate anomaly localization and generate explanations, with experimental results validating its generality and interpretability.

Towards open-ended Video Anomaly Detection (VAD), existing methods often exhibit biased detection when faced with challenging or unseen events and lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations. First, towards an unbiased and explainable VAD system, we construct the first large-scale multimodal VAD instruction-tuning benchmark, i.e., VAD-Instruct50k. This dataset is created using a carefully designed semi-automatic labeling paradigm. Efficient single-frame annotations are applied to the collected untrimmed videos, which are then synthesized into high-quality analyses of both abnormal and normal video clips using a robust off-the-shelf video captioner and a large language model (LLM). Building upon the VAD-Instruct50k dataset, we develop a customized solution for interpretable video anomaly detection. We train a lightweight temporal sampler to select frames with high anomaly response and fine-tune a multimodal LLM to generate explanatory content. Extensive experimental results validate the generality and interpretability of the proposed Holmes-VAD, establishing it as a novel interpretable technique for real-world video anomaly analysis. To support the community, our benchmark and model will be publicly available at https://holmesvad.github.io.
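The idea of a temporal sampler that forwards only high-anomaly frames to the multimodal LLM can be illustrated with a minimal sketch. It assumes per-frame anomaly scores in [0, 1] are already available from some scorer; the function name `sample_anomalous_frames`, the 0.5 threshold, the `max_frames` budget, and the uniform fallback when no frame is anomalous are all hypothetical choices for illustration, not the paper's actual sampler.

```python
from typing import List, Sequence


def sample_anomalous_frames(
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off on the anomaly score
    max_frames: int = 8,     # hypothetical frame budget for the LLM input
) -> List[int]:
    """Hypothetical sampler: return frame indices to pass to the LLM.

    Frames whose score exceeds `threshold` are kept; if there are more than
    `max_frames`, only the highest-scoring ones are kept. If no frame exceeds
    the threshold, fall back to evenly spaced frames so the LLM still sees
    the video. Indices are always returned in temporal order.
    """
    n = len(scores)
    if n == 0:
        return []

    # Candidate frames with a high anomaly response.
    candidates = [i for i, s in enumerate(scores) if s > threshold]

    if not candidates:
        # Assumed fallback: uniform sampling over the whole video.
        k = min(max_frames, n)
        return [i * n // k for i in range(k)]

    if len(candidates) > max_frames:
        # Keep the top-scoring frames (stable sort keeps earlier frames on ties).
        candidates = sorted(candidates, key=lambda i: scores[i], reverse=True)
        candidates = candidates[:max_frames]

    return sorted(candidates)


if __name__ == "__main__":
    frame_scores = [0.1, 0.2, 0.9, 0.95, 0.3, 0.8, 0.1, 0.05]
    print(sample_anomalous_frames(frame_scores, max_frames=2))  # [2, 3]
    print(sample_anomalous_frames([0.1] * 10, max_frames=4))    # [0, 2, 5, 7]
```

Returning indices in temporal order rather than score order keeps the selected frames in their original sequence, which matters if the downstream model reasons about how an event unfolds over time.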

Code Implementations: 1 repo