CV | Apr 15, 2022

Detecting Violence in Video Based on Deep Features Fusion Technique

arXiv:2204.07443v1 | 18 citations | h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate violence detection in public surveillance systems and reduces reliance on manual supervision, though it is incremental because it builds on existing CNN and LSTM methods.

The paper tackles the problem of automatically detecting physical violence in surveillance videos by proposing a fusion of two CNNs (AlexNet and SqueezeNet), each followed by ConvLSTM layers, and reports accuracies of 97%, 100%, and 96% on three benchmark datasets.

With the rapid growth of surveillance cameras in many public places to monitor human activities, such as in malls, streets, schools, and prisons, there is a strong demand for such systems to detect violent events automatically. Automatic analysis of video to detect violence is significant for law enforcement. Moreover, it helps to avoid social, economic, and environmental damage. Most systems today require manual human supervisors to detect violent scenes in the video, which is inefficient and inaccurate. In this work, we are interested in physical violence that involves two or more persons. This work proposes a novel method to detect violence using a fusion technique of two significantly different convolutional neural networks (CNNs), AlexNet and SqueezeNet. Each network is followed by a separate Convolutional Long Short-Term Memory (ConvLSTM) to extract robust and richer features from a video in the final hidden state. These two obtained states are then fused and fed to a max-pooling layer. Finally, features are classified using a series of fully connected layers and a softmax classifier. The performance of the proposed method is evaluated in terms of detection accuracy on three standard benchmark datasets: the Hockey Fight dataset, the Movie dataset, and the Violent Flow dataset. The results show accuracies of 97%, 100%, and 96%, respectively. A comparison with state-of-the-art techniques reveals the promising capability of the proposed method in recognizing violent videos.
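
The fusion and classification stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how the two final hidden states are fused, so channel-wise concatenation is an assumption here, as are the tensor shapes, the single hidden fully connected layer, the random weights, and the hypothetical helper names `max_pool_2x2`, `softmax`, and `fusion_head`. Random arrays stand in for the ConvLSTM outputs of the AlexNet and SqueezeNet branches.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def fusion_head(h_a, h_b, weights):
    """Fuse two final hidden states and classify them.

    h_a, h_b: (C, H, W) final hidden states of the two branches.
    weights:  list of (W, b) pairs for the fully connected layers.
    """
    # Assumption: fusion is concatenation along the channel axis.
    fused = np.concatenate([h_a, h_b], axis=0)
    pooled = max_pool_2x2(fused)
    x = pooled.reshape(-1)
    # Hidden fully connected layers with ReLU activations.
    for w, b in weights[:-1]:
        x = np.maximum(w @ x + b, 0.0)
    # Final layer produces class scores; softmax turns them into probabilities.
    w, b = weights[-1]
    return softmax(w @ x + b)


rng = np.random.default_rng(0)
# Stand-ins for the ConvLSTM final hidden states (hypothetical shapes).
h_alex = rng.standard_normal((8, 6, 6))
h_squeeze = rng.standard_normal((8, 6, 6))

in_dim = (8 + 8) * 3 * 3  # channels after fusion times pooled height and width
weights = [
    (rng.standard_normal((32, in_dim)) * 0.05, np.zeros(32)),
    (rng.standard_normal((2, 32)) * 0.05, np.zeros(2)),  # two classes: violent, non-violent
]

probs = fusion_head(h_alex, h_squeeze, weights)
print(probs)
```

With untrained random weights the printed probabilities carry no meaning; the sketch only shows how the data flows from the two hidden states through fusion, pooling, and the classifier.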
