Multi-Scale Attention Network for Crowd Counting
This work addresses crowd counting for surveillance and public safety, contributing new attention and loss components.
The paper tackles crowd counting under scale variation with a multi-branch, scale-aware attention network: a soft attention mechanism aggregates multi-scale density predictions, and a scale-aware loss guides each branch to specialize on one scale. The method achieves state-of-the-art results on four datasets, including a 25% error reduction on UCF-QNRF.
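The aggregation step can be illustrated with a small NumPy sketch. It assumes that each branch's density map has already been resized to a common resolution and that a hypothetical attention head produces one logit map per branch. A per-pixel softmax over the branch axis, which is an assumption rather than a detail confirmed by the summary, turns these logits into gating masks that sum to 1 at every pixel, so the fused map is a convex combination of the branch predictions.

```python
import numpy as np


def softmax(logits, axis=0):
    # Subtract the max along the axis for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_density_maps(density_maps, gate_logits):
    """Fuse S per-branch density maps of shape (S, H, W) into one (H, W) map.

    gate_logits (S, H, W) stand in for the output of a learned attention
    head (hypothetical here). The softmax over the branch axis yields gating
    masks that sum to 1 at each pixel.
    """
    if density_maps.shape != gate_logits.shape:
        raise ValueError("density maps and gate logits must have the same shape")
    masks = softmax(gate_logits, axis=0)
    return (masks * density_maps).sum(axis=0)


rng = np.random.default_rng(0)
maps = rng.random((3, 4, 4))          # three branches, 4x4 density maps
logits = rng.normal(size=(3, 4, 4))   # stand-in for learned gate logits
fused = fuse_density_maps(maps, logits)
# The crowd count estimate is the sum (integral) of the fused density map.
print(fused.shape, fused.sum())
```

Because the masks are normalized per pixel, each location can defer to whichever branch is most reliable at that scale, instead of applying one global weight per branch.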
In crowd counting datasets, people appear at different scales depending on their distance from the camera. To address this issue, we propose a novel multi-branch scale-aware attention network that exploits the hierarchical structure of convolutional neural networks and generates, in a single forward pass, multi-scale density predictions from different layers of the architecture. To aggregate these maps into our final prediction, we present a new soft attention mechanism that learns a set of gating masks. Furthermore, we introduce a scale-aware loss function to regularize the training of the different branches and guide each of them to specialize on a particular scale. As this new training requires annotations for the size of each head, we also propose a simple yet effective technique to estimate them automatically. Finally, we present an ablation study on each of these components and compare our approach against prior work on four crowd counting datasets: UCF-QNRF, ShanghaiTech A & B, and UCF_CC_50. Our approach achieves state-of-the-art results on all of them, with a remarkable improvement on UCF-QNRF (a 25% reduction in error).
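The summary does not specify how head sizes are estimated, so the sketch below swaps in a common stand-in: the geometry-adaptive heuristic that takes a head's size to be proportional to the mean distance to its k nearest annotated neighbours. It is not necessarily the authors' technique. The `beta` factor and the size thresholds used to route each head to a branch are hypothetical values chosen only for illustration; such a routing is one way per-branch, scale-aware supervision could be organized.

```python
import numpy as np


def estimate_head_sizes(points, k=3, beta=0.3):
    """Estimate a size for each annotated head from its k nearest neighbours.

    ASSUMPTION: size ~ beta * mean distance to the k nearest heads
    (a common geometry-adaptive heuristic), used here as a stand-in only.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n < 2:
        raise ValueError("need at least two heads to use neighbour distances")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between head centres, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each head's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return beta * nearest.mean(axis=1)


def assign_to_branches(sizes, thresholds):
    """Map each head size to a branch index using hypothetical size thresholds.

    With increasing thresholds [t1, t2], sizes < t1 go to branch 0,
    t1 <= size < t2 to branch 1, and size >= t2 to branch 2.
    """
    return np.digitize(sizes, thresholds)


heads = [(0, 0), (10, 0), (20, 0), (100, 0)]  # toy head centres in pixels
sizes = estimate_head_sizes(heads, k=1, beta=0.3)
branches = assign_to_branches(sizes, [5.0, 20.0])
print(sizes, branches)
```

The isolated head at x = 100 receives a much larger size estimate than the three clustered heads, which illustrates the heuristic's built-in assumption that sparse regions contain larger (closer) people. That assumption can break down in scenes with uneven crowd density.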