Crowd Transformer Network
This paper addresses crowd density estimation for surveillance and public safety, but the contribution is incremental because it builds on existing feature-based methods.
The paper tackles crowd counting by combining local features from convolution layers with non-local features from self-attention, and reports significant improvements over previous approaches on three public datasets.
In this paper, we tackle the problem of crowd counting and present an approach that obtains the crowd count from an estimated crowd density map. Most existing crowd counting approaches rely on local features to estimate the density map. In this work, we investigate the usefulness of combining local and non-local features for crowd counting. We use convolution layers to extract local features and a type of self-attention mechanism to extract non-local features. We then combine the local and non-local features and use them to estimate the crowd density map. We conduct experiments on three publicly available crowd counting datasets and achieve significant improvement over previous approaches.
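The pipeline described above can be illustrated with a minimal pure-Python sketch. It is not the network from the paper: the fixed 3x3 box filter stands in for learned convolution layers, the attention uses identity query/key/value projections on scalar features, and the names `local_features`, `non_local_features`, `estimate_density`, `crowd_count`, `w_local` and `w_non_local` are all hypothetical. It only shows how a local branch and a non-local branch might be fused into a density map whose sum is read as the count.

```python
import math


def local_features(grid):
    """Local branch: 3x3 box filter with zero padding.

    Assumption: a fixed averaging kernel stands in for learned convolutions.
    """
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += grid[y][x]
            out[i][j] = total / 9.0
    return out


def non_local_features(values):
    """Non-local branch: every position attends to every other position.

    Assumption: scalar features with identity query/key/value projections.
    Each output is a softmax-weighted average of all input values.
    """
    out = []
    for q in values:
        scores = [q * k for k in values]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, values)) / norm)
    return out


def estimate_density(image, w_local=0.5, w_non_local=0.5):
    """Fuse both branches with a weighted sum (hypothetical fusion rule)."""
    h, w = len(image), len(image[0])
    local = local_features(image)
    flat = [v for row in local for v in row]
    non_local = non_local_features(flat)
    return [
        [max(0.0, w_local * local[i][j] + w_non_local * non_local[i * w + j])
         for j in range(w)]
        for i in range(h)
    ]


def crowd_count(density):
    """The count is the sum of the density map over all positions."""
    return sum(sum(row) for row in density)
```

In this sketch the attention branch lets a position draw on evidence from anywhere in the image, while the box filter only sees its 3x3 neighbourhood; a trained model would learn both the kernels and the projections rather than fixing them.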