Convolutional Neural Networks for Aerial Multi-Label Pedestrian Detection
This work addresses the problem of identifying specific individuals and their actions in aerial images for surveillance or monitoring applications. It appears incremental, however, because it builds on existing methods such as SSD without introducing major innovations.
The paper tackles pedestrian detection and action recognition in low-resolution aerial images with a two-step framework: SSD generates object proposals, and a second deep network associates the imagery with action labels. The claimed performance improvements are not quantified.
The low resolution of objects of interest in aerial images makes pedestrian detection and action detection extremely challenging tasks. Furthermore, processing large images with deep convolutional neural networks is computationally demanding. To alleviate these challenges, we propose a two-step, yes/no question-answering framework to find specific individuals performing one or more specific actions in aerial images. First, a deep object detector, the Single Shot Multibox Detector (SSD), is used to generate object proposals from small aerial images. Second, another deep network is used to learn a latent common subspace that associates the high-resolution aerial imagery with the pedestrian action labels provided by human-based sources.
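The two-step flow described above can be illustrated with a minimal sketch in plain Python. The abstract does not specify the network architectures, the embedding method, or the decision rule, so everything concrete here is an assumption made for illustration: the `Proposal` type, the linear maps standing in for the learned networks, the cosine-similarity score, and the 0.5 threshold are all hypothetical. Step 1 (the SSD detector) is stubbed with hand-written proposals rather than a real detector.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Proposal:
    """One candidate pedestrian region, as a step-1 detector might emit it."""
    box: Box
    features: List[float]  # hypothetical appearance features of the crop


def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def answer_query(proposals, label_vector, image_map, label_map, threshold=0.5):
    """Step 2 (assumed form): embed each proposal and the multi-label query
    into a shared space, then answer yes/no per proposal by thresholding
    their similarity."""
    query = project(label_map, label_vector)
    answers = []
    for p in proposals:
        embedded = project(image_map, p.features)
        answers.append((p.box, cosine(embedded, query) >= threshold))
    return answers


# Toy stand-ins: identity maps replace the learned networks.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Stubbed step-1 output; a real system would obtain these from a detector.
proposals = [
    Proposal(box=(10, 10, 30, 50), features=[1.0, 0.0, 1.0]),
    Proposal(box=(60, 20, 80, 60), features=[0.0, 1.0, 0.0]),
]

# Multi-hot query over three hypothetical action labels: actions 0 and 2.
query_labels = [1.0, 0.0, 1.0]

answers = answer_query(proposals, query_labels, IDENTITY, IDENTITY)
for box, is_match in answers:
    print(box, "yes" if is_match else "no")
```

With these toy inputs the first proposal is answered "yes" and the second "no". In a trained system the two maps would be learned jointly so that matching image and label pairs land close together in the common subspace.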