Coarse- and Fine-grained Attention Network with Background-aware Loss for Crowd Density Map Estimation
This work addresses crowd analysis for surveillance or public safety, reporting incremental improvements in counting accuracy and density map quality.
The paper tackles crowd density map estimation and people counting by proposing CFANet, which combines a coarse-to-fine attention mechanism with a background-aware loss. The authors report that it outperforms previous state-of-the-art methods in count accuracy, improves density map quality, and reduces false recognition.
In this paper, we present the Coarse- and Fine-grained Attention Network (CFANet), a novel method for generating high-quality crowd density maps and estimating people counts that uses attention maps to focus on the crowd area. Because accurate fine-grained attention maps are normally difficult to generate directly, we devise a progressive coarse-to-fine attention mechanism that integrates a Crowd Region Recognizer (CRR) branch and a Density Level Estimator (DLE) branch; it suppresses the influence of irrelevant background and assigns attention weights according to crowd density levels. We also employ a multi-level supervision mechanism to assist gradient backpropagation and reduce overfitting. In addition, we propose a Background-aware Structural Loss (BSL) that reduces the false recognition ratio while improving structural similarity to the ground truth. Extensive experiments on commonly used datasets show that our method not only outperforms previous state-of-the-art methods in count accuracy but also improves the image quality of the density maps and reduces the false recognition ratio.
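To make the coarse-to-fine idea concrete, the following is a minimal sketch, not the CFANet implementation. The abstract does not state how the CRR and DLE outputs are combined, so the element-wise product used here is an assumption, as are all function names, the single-channel 2D layout, and the toy values. The second function is a hypothetical proxy for a false recognition ratio (the share of predicted count that lands on background pixels); it is not the BSL itself, whose formula is not given above. Plain Python lists are used to keep the sketch dependency-free.

```python
# Illustrative sketch only: names, shapes, and the weighting rule are
# assumptions, not taken from the CFANet paper.

def coarse_to_fine_attention(features, crowd_mask, level_weights):
    """Gate a 2D feature map by a coarse crowd mask, then by fine weights.

    features      : 2D list of floats (hypothetical single-channel feature map)
    crowd_mask    : 2D list in [0, 1]; coarse attention (1 = crowd, 0 = background)
    level_weights : 2D list of non-negative floats; fine per-pixel attention
                    standing in for density-level weights
    """
    return [
        [f * m * w for f, m, w in zip(f_row, m_row, w_row)]
        for f_row, m_row, w_row in zip(features, crowd_mask, level_weights)
    ]


def background_mass_ratio(pred_density, gt_density, eps=1e-8):
    """Hypothetical proxy: fraction of the predicted count that falls on
    pixels where the ground-truth density is exactly zero."""
    total = sum(sum(row) for row in pred_density)
    background = sum(
        p
        for p_row, g_row in zip(pred_density, gt_density)
        for p, g in zip(p_row, g_row)
        if g == 0
    )
    return background / (total + eps)


features = [[1.0, 2.0], [3.0, 4.0]]
crowd_mask = [[0.0, 1.0], [1.0, 1.0]]     # top-left pixel is background
level_weights = [[1.0, 0.5], [1.0, 2.0]]  # assumed: denser regions weigh more

attended = coarse_to_fine_attention(features, crowd_mask, level_weights)
print(attended)  # [[0.0, 1.0], [3.0, 8.0]]

pred = [[0.5, 1.0], [1.0, 1.5]]
gt = [[0.0, 1.0], [1.0, 2.0]]
ratio = background_mass_ratio(pred, gt)   # 0.5 of 4.0 predicted mass is on background
```

In this toy case the coarse mask zeroes the background pixel before the fine weights are applied, which is one simple way to realize "suppress background first, then weight by density level". A learned network would produce both maps from features rather than take them as inputs.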