CV · May 21, 2025

Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

arXiv:2505.15205v2 · 10 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses real-time, domain-independent video anomaly detection for surveillance, with incremental improvements in performance.

The paper tackles video anomaly detection by proposing Flashback, a zero-shot and real-time method that uses an LLM to build a memory of captions and matches video segments via similarity search, achieving 87.3 AUC on UCF-Crime and 75.1 AP on XD-Violence.

Video Anomaly Detection (VAD) automatically identifies anomalous events from video, mitigating the need for human operators in large-scale surveillance deployments. However, two fundamental obstacles hinder real-world adoption: domain dependency and real-time constraints, the latter requiring near-instantaneous processing of incoming video. To this end, we propose Flashback, a zero-shot and real-time video anomaly detection paradigm. Inspired by the human cognitive mechanism of instantly judging anomalies and reasoning in current scenes based on past experience, Flashback operates in two stages: Recall and Respond. In the offline recall stage, an off-the-shelf LLM builds a pseudo-scene memory of both normal and anomalous captions without any reliance on real anomaly data. In the online respond stage, incoming video segments are embedded and matched against this memory via similarity search. By eliminating all LLM calls at inference time, Flashback delivers real-time VAD even on a consumer-grade GPU. On two large datasets from real-world surveillance scenarios, UCF-Crime and XD-Violence, we achieve 87.3 AUC (+7.0 pp) and 75.1 AP (+13.1 pp), respectively, outperforming prior zero-shot VAD methods by large margins.
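
The two-stage idea in the abstract (embed captions once offline, then score each incoming segment by similarity search against that memory) can be illustrated with a minimal sketch. This is not the authors' implementation. The scoring rule (a softmax-weighted vote among the top-k nearest captions), the names `build_memory` and `anomaly_score`, the parameters `k` and `temperature`, and the hand-written 3-dimensional toy vectors are all assumptions made for illustration. A real system would presumably use a cross-modal encoder that maps captions and video segments into a shared embedding space.

```python
import math
from dataclasses import dataclass


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


@dataclass
class MemoryEntry:
    caption: str
    embedding: list      # unit-norm embedding of the caption
    is_anomalous: bool   # label assigned when the caption was generated


def build_memory(labeled_captions, embed):
    """Offline stage (sketch): embed every caption once and store it.

    `embed` is a hypothetical text-embedding callable supplied by the caller.
    """
    return [
        MemoryEntry(caption, normalize(embed(caption)), is_anomalous)
        for caption, is_anomalous in labeled_captions
    ]


def anomaly_score(segment_embedding, memory, k=3, temperature=0.1):
    """Online stage (sketch): score a segment with no LLM call.

    Assumed rule: take the k most similar captions and return the
    softmax-weighted share of them that are labeled anomalous (0.0 to 1.0).
    """
    query = normalize(segment_embedding)
    scored = [
        (sum(a * b for a, b in zip(query, entry.embedding)), entry)
        for entry in memory
    ]
    top_k = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weights = [math.exp(similarity / temperature) for similarity, _ in top_k]
    anomalous_weight = sum(
        weight for weight, (_, entry) in zip(weights, top_k) if entry.is_anomalous
    )
    return anomalous_weight / sum(weights)


# Toy stand-in for a real encoder: hand-written vectors, purely illustrative.
toy_vectors = {
    "people walking on a sidewalk": [1.0, 0.1, 0.0],
    "cars waiting at a traffic light": [0.9, 0.2, 0.1],
    "a person smashing a shop window": [0.1, 1.0, 0.2],
    "two people fighting in a street": [0.0, 0.9, 0.3],
}
memory = build_memory(
    [
        ("people walking on a sidewalk", False),
        ("cars waiting at a traffic light", False),
        ("a person smashing a shop window", True),
        ("two people fighting in a street", True),
    ],
    embed=toy_vectors.__getitem__,
)

normal_score = anomaly_score([0.95, 0.15, 0.05], memory)
anomalous_score = anomaly_score([0.05, 0.95, 0.25], memory)
print(f"normal-looking segment:    {normal_score:.3f}")
print(f"anomalous-looking segment: {anomalous_score:.3f}")
```

Because the memory is built ahead of time, the per-segment cost in this sketch is one embedding plus a nearest-neighbor lookup, which is the property the abstract credits for real-time operation. With a large memory, the linear scan would typically be replaced by an approximate nearest-neighbor index.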

