Cross-head Supervision for Crowd Counting with Noisy Annotations
This work addresses noisy annotations in crowd counting, a domain-specific issue, with an incremental improvement over existing methods.
The paper tackles noisy annotations in crowd counting datasets, which degrade model training. It proposes CHS-Net, in which a convolution head and a transformer head supervise each other (cross-head supervision), and reports performance superior to state-of-the-art methods on the ShanghaiTech and QNRF datasets.
Noisy annotations, such as missing annotations and location shifts, are common in crowd counting datasets because of multi-scale head sizes, heavy occlusion, and similar factors. These noisy annotations severely affect model training, especially for density map-based methods. To alleviate their negative impact, we propose a novel crowd counting model with one convolution head and one transformer head, in which the two heads supervise each other in noisy areas, a scheme we call Cross-Head Supervision. The resultant model, CHS-Net, can synergize different types of inductive biases for better counting. In addition, we develop a progressive cross-head supervision learning strategy to stabilize the training process and provide more reliable supervision. Extensive experimental results on the ShanghaiTech and QNRF datasets demonstrate superior performance over state-of-the-art methods. Code is available at https://github.com/RaccoonDML/CHSNet.
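The idea of two heads supervising each other in noisy areas can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how noisy areas are detected or how the supervision is mixed. The sketch assumes that cells with the largest prediction-to-annotation error are treated as noisy, that the target there is a blend of the annotation and the other head's prediction, and that the blend weight ramps up linearly over training. All function names and the parameters `ratio`, `alpha`, and `alpha_max` are hypothetical.

```python
# Illustrative sketch only; density maps are flattened to plain lists.

def noisy_mask(pred, gt, ratio):
    """Flag the `ratio` fraction of cells with the largest |pred - gt| (assumed criterion)."""
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    k = int(len(errors) * ratio)
    mask = [False] * len(errors)
    # Indices of the k cells with the largest error.
    for i in sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)[:k]:
        mask[i] = True
    return mask

def cross_head_target(gt, own_pred, other_pred, ratio, alpha):
    """Build the training target for one head.

    In cells flagged as noisy, blend the annotation with the other head's
    prediction (assumed blending rule); elsewhere keep the annotation.
    In a real framework the other head's prediction would be detached
    so that no gradient flows through it.
    """
    mask = noisy_mask(own_pred, gt, ratio)
    return [
        (1 - alpha) * g + alpha * o if m else g
        for g, o, m in zip(gt, other_pred, mask)
    ]

def progressive_alpha(epoch, total_epochs, alpha_max=0.5):
    """Linearly ramp the blend weight from 0 to alpha_max (assumed schedule)."""
    return alpha_max * min(1.0, epoch / total_epochs)

def l2_loss(pred, target):
    """Mean squared error between a prediction and its target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Toy example: cell 2 holds a person who was missed by the annotator.
gt = [0.0, 1.0, 0.0, 0.0]
conv_pred = [0.1, 0.9, 0.8, 0.0]
tran_pred = [0.0, 1.0, 0.9, 0.1]

alpha = progressive_alpha(epoch=50, total_epochs=100)
conv_target = cross_head_target(gt, conv_pred, tran_pred, ratio=0.25, alpha=alpha)
tran_target = cross_head_target(gt, tran_pred, conv_pred, ratio=0.25, alpha=alpha)
total_loss = l2_loss(conv_pred, conv_target) + l2_loss(tran_pred, tran_target)
```

In this toy example only the cell with the missing annotation is flagged for each head, so its target moves from the annotated value toward the other head's prediction while all other cells keep the annotation as the target. Starting with a small blend weight reflects the intuition behind a progressive strategy: early predictions are unreliable, so they should contribute little to the supervision until training has advanced.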