CV · Feb 1

SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning

arXiv:2602.01004v1
Originality: Highly original
AI Analysis

The paper addresses the need for deeper reasoning in video anomaly understanding for applications such as surveillance and safety, though it is incremental in that it builds on existing MLLM approaches.

The paper tackles the problem of shallow reasoning in video anomaly understanding by proposing SRVAU-R1, a reflection-aware learning framework that incorporates self-reflection and self-correction into multi-modal large language models. It reports significant improvements in temporal anomaly localization accuracy and reasoning quality across multiple benchmarks.

Multi-modal large language models (MLLMs) have demonstrated significant progress in reasoning capabilities and shown promising effectiveness in video anomaly understanding (VAU) tasks. However, existing MLLM-based approaches remain largely focused on surface-level descriptions of anomalies and lack deep reasoning over abnormal behaviors, such as explicit self-reflection and self-correction. To address this, we propose Self-Reflection-Enhanced Reasoning for Video Anomaly Understanding (SRVAU-R1), a reflection-aware learning framework that incorporates reflection into MLLM reasoning. Specifically, SRVAU-R1 introduces the first reflection-oriented Chain-of-Thought dataset tailored for VAU, providing structured supervision with initial reasoning, self-reflection, and revised reasoning. Building on this dataset, it includes a novel reflection-aware learning paradigm that combines supervised fine-tuning and reinforcement fine-tuning to enhance multi-modal reasoning for VAU. Extensive experiments on multiple video anomaly benchmarks demonstrate that SRVAU-R1 consistently outperforms existing methods, achieving significant improvements in both temporal anomaly localization accuracy and reasoning quality.
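
The abstract describes each supervision example as having three stages: initial reasoning, self-reflection, and revised reasoning. As a rough illustration only, one such record might be represented as below. The class name, field names, the `question` field, and the tag format are all assumptions made for this sketch; the paper's actual dataset schema is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ReflectionCoTSample:
    """Hypothetical record for one reflection-oriented CoT example."""
    question: str            # assumed prompt about the video (not from the paper)
    initial_reasoning: str   # first-pass reasoning
    self_reflection: str     # critique of the first pass
    revised_reasoning: str   # corrected reasoning after reflection


def to_training_target(sample: ReflectionCoTSample) -> str:
    """Join the three reasoning stages into one tagged target string.

    The tag names are illustrative, not taken from the paper.
    """
    stages = [
        ("initial", sample.initial_reasoning),
        ("reflection", sample.self_reflection),
        ("revised", sample.revised_reasoning),
    ]
    return "\n".join(f"<{tag}>{text.strip()}</{tag}>" for tag, text in stages)
```

Keeping the stages as separate fields, rather than one free-form string, would let a training pipeline supervise or reward each stage independently, which is one plausible reason for the structured format the abstract mentions.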
