DUAL-VAD: Dual Benchmarks and Anomaly-Focused Sampling for Video Anomaly Detection
This work addresses the need for more comprehensive benchmarks in surveillance and public safety, though it is incremental in nature.
The paper tackles the problem of limited benchmarks in Video Anomaly Detection in two ways: it introduces a softmax-based frame allocation strategy for anomaly-focused sampling, and it constructs dual benchmarks for frame-level and video-level tasks. It reports improvements on UCF-Crime at both levels, and its ablation studies favor anomaly-focused sampling over uniform and random baselines.
Video Anomaly Detection (VAD) is critical for surveillance and public safety. However, existing benchmarks are limited to either frame-level or video-level tasks, restricting a holistic view of model generalization. This work first introduces a softmax-based frame allocation strategy that prioritizes anomaly-dense segments while maintaining full-video coverage, enabling balanced sampling across temporal scales. Building on this process, we construct two complementary benchmarks. The image-based benchmark evaluates frame-level reasoning with representative frames, while the video-based benchmark extends to temporally localized segments and incorporates an abnormality scoring task. Experiments on UCF-Crime demonstrate improvements at both the frame and video levels, and ablation studies confirm clear advantages of anomaly-focused sampling over uniform and random baselines.
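The abstract does not give the allocation formula, so the following is only a minimal sketch of how a softmax-based frame allocation might work, not the authors' implementation. It assumes each video is already split into segments with a scalar anomaly score per segment. The function name `allocate_frames`, the `temperature` parameter, the one-frame-per-segment floor (used here to stand in for "full-video coverage"), and the largest-remainder rounding are all illustrative assumptions.

```python
import math


def allocate_frames(scores, budget, temperature=1.0, min_per_segment=1):
    """Split `budget` frames across segments using softmax(scores / temperature).

    Hypothetical helper: every segment gets `min_per_segment` frames so the
    whole video stays covered, and the remaining frames follow the softmax
    weights so that higher-scoring (anomaly-dense) segments receive more.
    """
    n = len(scores)
    if n == 0:
        return []
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if budget < n * min_per_segment:
        raise ValueError("budget too small to cover every segment")

    # Numerically stable softmax: subtract the max score before exponentiating.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Reserve the coverage floor, then share the spare frames by weight.
    spare = budget - n * min_per_segment
    raw = [w * spare for w in weights]
    counts = [min_per_segment + int(r) for r in raw]

    # Largest-remainder rounding: hand out the frames lost to truncation,
    # starting with the segments that have the largest fractional part.
    leftover = budget - sum(counts)
    by_fraction = sorted(range(n), key=lambda i: raw[i] - int(raw[i]), reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Three segments; the middle one has a much higher anomaly score.
    print(allocate_frames([0.0, 5.0, 0.0], budget=16))
```

In this sketch, equal scores reduce to uniform sampling, and a lower `temperature` concentrates more of the budget on the highest-scoring segments. How the scores are obtained and how frames are picked inside each segment are left open here, since the abstract does not specify them.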