DENet: A Universal Network for Counting Crowd with Varying Densities and Scales
This work addresses the open problem of accurate crowd counting in diverse scenarios. The contribution is incremental, as it builds on existing detection and density estimation approaches.
The paper tackles the problem of counting people or objects with varying scales and densities by proposing DENet, a network that combines detection and density estimation, achieving lower Mean Absolute Error on datasets like ShanghaiTech Part A, UCF, and WorldExpo'10 compared to state-of-the-art methods.
Counting people or objects with significantly varying scales and densities has attracted much interest from the research community, yet it remains an open problem. In this paper, we propose a simple yet efficient and effective network, named DENet, which is composed of two components: a detection network (DNet) and an encoder-decoder estimation network (ENet). We first run DNet on an input image to detect and count individuals who can be segmented clearly. Then, ENet is used to estimate the density maps of the remaining areas, where individuals cannot be detected. We propose a modified Xception as an encoder for feature extraction and a combination of dilated convolution and transposed convolution as a decoder. On the ShanghaiTech Part A, UCF and WorldExpo'10 datasets, our DENet achieves lower Mean Absolute Error (MAE) than state-of-the-art methods.
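As a rough illustration of the two-stage counting idea, the sketch below adds the number of detected individuals to the density mass that lies outside every detection. It is not the authors' implementation: the box format, the rule of excluding pixels by bounding box, and the function names `combined_count` and `mean_absolute_error` are all assumptions made for this example. MAE is computed in its usual sense, as the mean of the absolute differences between predicted and ground-truth counts over a set of images.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with half-open pixel ranges.
Box = Tuple[int, int, int, int]


def combined_count(density: Sequence[Sequence[float]], boxes: List[Box]) -> float:
    """Number of detections plus the density mass outside all detection boxes.

    Assumption: the "remaining areas" are the pixels not covered by any box.
    """
    remaining = 0.0
    for y, row in enumerate(density):
        for x, value in enumerate(row):
            covered = any(
                x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in boxes
            )
            if not covered:
                remaining += value
    return len(boxes) + remaining


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of |predicted - actual| over a set of images."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Toy 2x4 density map: one person detected in the left 2x2 region,
    # the right half is left to the density estimate.
    density = [
        [0.5, 0.5, 0.25, 0.25],
        [0.5, 0.5, 0.25, 0.25],
    ]
    boxes = [(0, 0, 2, 2)]
    print(combined_count(density, boxes))
    print(mean_absolute_error([2.0, 5.0], [3.0, 5.0]))
```

Excluding the detected regions before summing the density map keeps a clearly detected individual from being counted twice, once by the detector and once by the estimator.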