CVLGIVJul 14, 2020

Towards Dense People Detection with Deep Learning and Depth images

arXiv:2007.07171v1
AI Analysis

This addresses the problem of dense people detection for applications like surveillance or robotics, but it is incremental as it builds on existing DNN-based approaches with specific architectural improvements.

The paper tackles the problem of detecting multiple people from a single depth image using a DNN-based system that outputs a likelihood map for head positions, enabling 3D localization. It outperforms existing state-of-the-art methods, including in scenes with significant occlusions, and runs in real-time on low-budget GPUs.

This paper proposes a DNN-based system that detects multiple people from a single depth image. Our neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at the person's head. The likelihood map encodes both the number of detected people and their 2D image positions, and can be used to recover the 3D position of each person using the depth image and the camera calibration parameters. Our architecture is compact, using separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use simulated data for initially training the network, followed by fine tuning with a relatively small amount of real data. We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training. We thoroughly compare our method against the existing state-of-the-art, including both classical and DNN-based solutions. Our method outperforms existing methods and can accurately detect people in scenes with significant occlusions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes