Local Information Matters: A Rethink of Crowd Counting
This addresses crowd counting for surveillance and safety applications, offering a novel design principle that improves accuracy in dense scenes.
The paper tackles the problem of crowd counting by emphasizing local modeling capability, as individuals occupy small portions of images, and achieves state-of-the-art performance with an 8.7% improvement in MAE on a high-density subset.
The motivation of this paper originates from rethinking an essential characteristic of crowd counting: individuals (heads of humans) in the crowd counting task typically occupy a very small portion of the image. This characteristic has never been the focus of existing works: they typically use the same backbone as other visual tasks and pursue a large receptive field. This drives us to propose a new model design principle of crowd counting: emphasizing local modeling capability of the model. We follow the principle and design a crowd counting model named Local Information Matters Model (LIMM). The main innovation lies in two strategies: a window partitioning design that applies grid windows to the model input, and a window-wise contrastive learning design to enhance the model's ability to distinguish between local density levels. Moreover, a global attention module is applied to the end of the model to handle the occasionally occurring large-sized individuals. Extensive experiments on multiple public datasets illustrate that the proposed model shows a significant improvement in local modeling capability (8.7\% in MAE on the JHU-Crowd++ high-density subset for example), without compromising its ability to count large-sized ones, which achieves state-of-the-art performance. Code is available at: https://github.com/tianhangpan/LIMM.