Streamlining Video Analysis for Efficient Violence Detection
This work addresses violence detection for unmanned security systems and online content filtering, but it appears incremental because it builds on existing 3D CNN methods.
The paper tackles automated violence detection in surveillance videos by classifying scenes as 'fight' or 'non-fight' with X3D, a 3D CNN-based model, combined with pre-processing and clustering techniques. It reports effective distinction between violent and non-violent events.
This paper addresses the challenge of automated violence detection in video frames captured by surveillance cameras, specifically focusing on classifying scenes as "fight" or "non-fight." This task is critical for enhancing unmanned security systems, online content filtering, and related applications. We propose an approach using a 3D Convolutional Neural Network (3D CNN)-based model named X3D to tackle this problem. Our approach incorporates pre-processing steps such as tube extraction, volume cropping, and frame aggregation, combined with clustering techniques, to accurately localize and classify fight scenes. Extensive experimentation demonstrates the effectiveness of our method in distinguishing violent from non-violent events, providing valuable insights for advancing practical violence detection systems.
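The abstract does not specify how the pre-processing steps are implemented, so the following is only a minimal sketch of one plausible reading of volume cropping and frame aggregation. It assumes a "tube" is a list of per-frame bounding boxes `(x1, y1, x2, y2)`, that cropping uses the union of those boxes, and that aggregation means uniformly sampling a fixed number of frames. The function names `union_box`, `crop_volume`, and `sample_frames` are hypothetical and do not come from the paper.

```python
import numpy as np

def union_box(boxes):
    """Smallest box (x1, y1, x2, y2) enclosing every per-frame box in a tube."""
    boxes = np.asarray(boxes)
    return (int(boxes[:, 0].min()), int(boxes[:, 1].min()),
            int(boxes[:, 2].max()), int(boxes[:, 3].max()))

def crop_volume(video, boxes):
    """Crop a (T, H, W, C) video to the union box of a tube (assumed definition)."""
    x1, y1, x2, y2 = union_box(boxes)
    h, w = video.shape[1:3]
    # Clamp the box to the frame so slicing never leaves the image.
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w), min(y2, h)
    return video[:, y1:y2, x1:x2, :]

def sample_frames(volume, num_frames):
    """Select num_frames frames spread uniformly over the clip (assumed aggregation)."""
    idx = np.linspace(0, volume.shape[0] - 1, num_frames).round().astype(int)
    return volume[idx]

# Synthetic example: 32 frames of 160x120 RGB video with a box drifting to the right.
video = np.zeros((32, 120, 160, 3), dtype=np.uint8)
boxes = [(40 + t, 30, 100 + t, 90) for t in range(32)]
clip = sample_frames(crop_volume(video, boxes), num_frames=16)
print(clip.shape)
```

A clip produced this way would still need resizing and normalization before being passed to a 3D CNN such as X3D; those steps, and the clustering stage, are omitted here because the abstract gives no details about them.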