DVD: A Comprehensive Dataset for Advancing Violence Detection in Real-World Scenarios
This dataset helps researchers and practitioners build violence detection models that generalize better, though the contribution is incremental: it provides new data rather than a new method.
The authors address the shortage of large, well-annotated violence detection datasets by introducing DVD, a large-scale dataset of 500 videos and 2.7 million frames with frame-level annotations and diverse real-world conditions.
Violence Detection (VD) has become an increasingly vital area of research. Automated VD efforts are hindered by the limited availability of diverse, well-annotated databases. Existing databases suffer from coarse video-level annotations, limited scale and diversity, and a lack of metadata, all of which restrict the generalization of models. To address these challenges, we introduce DVD, a large-scale (500 videos, 2.7M frames), frame-level annotated VD database with diverse environments, varying lighting conditions, multiple camera sources, complex social interactions, and rich metadata. DVD is designed to capture the complexities of real-world violent events.