Handgun detection using combined human pose and weapon appearance
This work addresses security threats by enhancing early detection of handguns in surveillance systems, representing an incremental advance over existing appearance-only methods.
The paper tackles handgun detection in CCTV footage by combining human pose and weapon appearance in a single deep learning architecture, which improves on the previous state-of-the-art method by 4.23 to 18.9 AP points.
Closed-circuit television (CCTV) systems are now essential for preventing security threats and dangerous situations, where early detection is crucial. Novel deep learning-based methods have made it possible to develop automatic weapon detectors with promising results. However, these approaches rely mainly on the visual appearance of the weapon. For handguns, body pose may be a useful cue, especially when the gun is barely visible. In this work, a novel method is proposed that combines weapon appearance and human pose information in a single architecture. First, pose keypoints are estimated to extract hand regions and generate binary pose images, which are the model inputs. Then, each input is processed by a separate subnetwork, and the outputs are combined to produce the handgun bounding box. The results show that the combined model improves on the state of the art in handgun detection, scoring 4.23 to 18.9 AP points higher than the best previous approach.
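The input preparation step described in the abstract (hand regions and binary pose images derived from pose keypoints) might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the three-keypoint arm layout, the `extend` factor used to estimate the hand centre from the elbow and wrist, the square crop size, and all function names are hypothetical, and the keypoints are assumed to come from some external pose estimator.

```python
import numpy as np

# Hypothetical arm skeleton: pairs of keypoint indices joined by a limb.
# The indexing (0 = shoulder, 1 = elbow, 2 = wrist) is an assumption.
LIMBS = [(0, 1), (1, 2)]


def hand_box(elbow, wrist, size, extend=0.3):
    """Square box around a hand centre estimated by extending elbow -> wrist.

    The extension factor is an illustrative guess, not a value from the paper.
    """
    elbow = np.asarray(elbow, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    centre = wrist + extend * (wrist - elbow)
    x0, y0 = (int(v) for v in np.round(centre - size / 2.0))
    return x0, y0, x0 + size, y0 + size


def crop(image, box):
    """Crop the box from the image, clipping it to the image bounds."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, x1 = max(x0, 0), min(x1, w)
    y0, y1 = max(y0, 0), min(y1, h)
    return image[y0:y1, x0:x1]


def binary_pose_image(keypoints, shape, limbs=LIMBS):
    """Rasterise each limb as a one-pixel-wide line on a zero canvas."""
    canvas = np.zeros(shape, dtype=np.uint8)
    h, w = shape
    for a, b in limbs:
        (xa, ya), (xb, yb) = keypoints[a], keypoints[b]
        # One sample per pixel along the longer axis of the segment.
        n = int(max(abs(xb - xa), abs(yb - ya))) + 1
        xs = np.round(np.linspace(xa, xb, n)).astype(int)
        ys = np.round(np.linspace(ya, yb, n)).astype(int)
        # Discard samples that fall outside the canvas.
        keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        canvas[ys[keep], xs[keep]] = 1
    return canvas


# Toy example: one arm on a blank 40x40 RGB frame.
keypoints = [(10, 10), (20, 20), (30, 30)]  # shoulder, elbow, wrist as (x, y)
frame = np.zeros((40, 40, 3), dtype=np.uint8)
box = hand_box(keypoints[1], keypoints[2], size=10)
hand_region = crop(frame, box)
pose_image = binary_pose_image(keypoints, frame.shape[:2])
```

In this sketch, `hand_region` and `pose_image` stand in for the two kinds of model input mentioned above; each would then be passed to its own subnetwork. How the actual method sizes the hand regions or renders the pose is not specified in the abstract.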