CV · Feb 23, 2020

Multi-Stream Networks and Ground-Truth Generation for Crowd Counting

Rodolfo Quispe, Darwin Ttito, Adín Ramírez Rivera, Helio Pedrini

arXiv:2002.09951v31.23 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses crowd counting for applications such as surveillance and urban planning. It is an incremental advance: it builds on existing methods and contributes improvements to the network architecture and to ground-truth generation.

The paper tackles crowd counting in single images with a Multi-Stream Convolutional Neural Network that produces a density map from which the number of people is estimated. It also proposes a hybrid ground-truth generation method based on tiny face detection and scale interpolation, and reports superior results on the UCF-CC-50 and ShanghaiTech datasets.

Crowd scene analysis has received a lot of attention recently due to its wide variety of applications, for instance, forensic science, urban planning, surveillance and security. In this context, a challenging task is crowd counting, whose main purpose is to estimate the number of people present in a single image. In this work, we develop and evaluate a Multi-Stream Convolutional Neural Network that receives an image as input and, in an end-to-end fashion, produces a density map representing the spatial distribution of people. To address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture uses receptive fields with different filter sizes for each stream. In addition, we investigate the influence of the two most common approaches to ground-truth generation and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that using our ground-truth generation methods achieves superior results.
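
To make the idea of a density-map ground truth concrete, here is a minimal sketch of one widely used convention: each annotated head position is replaced by a normalized Gaussian, so that the map sums to the person count. This is not the hybrid method proposed in the paper, which the abstract does not describe in detail. The function names, the fixed `sigma` value, and the border handling are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(points, height, width, sigma=4.0):
    """Build a density map from (x, y) head annotations.

    Assumption: a fixed sigma for every head. Kernels clipped by the image
    border are renormalized so each in-bounds head still contributes 1.
    """
    radius = int(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    dmap = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # skip annotations that fall outside the image
        top, bottom = max(row - radius, 0), min(row + radius + 1, height)
        left, right = max(col - radius, 0), min(col + radius + 1, width)
        # Matching window inside the kernel for the clipped image window.
        patch = kernel[top - row + radius:bottom - row + radius,
                       left - col + radius:right - col + radius]
        dmap[top:bottom, left:right] += patch / patch.sum()
    return dmap


if __name__ == "__main__":
    heads = [(12.0, 20.0), (40.5, 8.2), (0.0, 0.0)]  # hypothetical annotations
    dmap = density_map(heads, height=48, width=64)
    print("estimated count:", dmap.sum())
```

Integrating (summing) the map recovers the number of in-bounds annotated heads up to floating-point error, which is why a network trained to regress such maps can be used for counting. A fixed `sigma` ignores perspective; scale-aware ground truths vary the kernel size per head instead.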

