Boosting Detection in Crowd Analysis via Underutilized Output Features
This work targets accurate crowd analysis in dense scenes. Aimed at computer vision researchers and practitioners, it offers an incremental improvement that refines the outputs of existing detection models rather than replacing them.
The paper tackles the poor performance of detection-based methods in dense crowds by exploiting two underutilized output features: the area size and confidence score of proposals and bounding boxes. The proposed Crowd Hat module improves results on crowd counting, localization, and detection, demonstrating the potential of detection-based methods across these tasks.
Detection-based methods have been viewed unfavorably in crowd analysis due to their poor performance in dense crowds. However, we argue that the potential of these methods has been underestimated, as they offer crucial information for crowd analysis that is often ignored. Specifically, the area size and confidence score of output proposals and bounding boxes provide insight into the scale and density of the crowd. To leverage these underutilized features, we propose Crowd Hat, a plug-and-play module that can be easily integrated with existing detection models. This module uses a mixed 2D-1D compression technique to refine the output features and obtain the spatial and numerical distribution of crowd-specific information. Based on these features, we further propose region-adaptive NMS thresholds and a decouple-then-align paradigm that address the major limitations of detection-based methods. Our extensive evaluations on various crowd analysis tasks, including crowd counting, localization, and detection, demonstrate the effectiveness of utilizing output features and the potential of detection-based methods in crowd analysis.
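To make the idea of region-adaptive NMS thresholds concrete, the following is a minimal sketch in plain Python; it is not the Crowd Hat implementation. It assumes a hypothetical `threshold_for` callback that maps a box to an IoU threshold. In the abstract the thresholds are based on the refined output features, whereas here a hand-written rule stands in for them: boxes in the left ("dense") half of an imaginary image get a high threshold so that heavily overlapping people survive, and boxes in the right ("sparse") half get a low one. All names, coordinates, and threshold values are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def region_adaptive_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    threshold_for: Callable[[Box], float],
) -> List[int]:
    """Greedy NMS where the IoU threshold depends on the candidate's region.

    Returns indices of kept boxes in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Assumption: the threshold is looked up per candidate box.
        t = threshold_for(boxes[i])
        if all(iou(boxes[i], boxes[k]) <= t for k in keep):
            keep.append(i)
    return keep


def toy_threshold(box: Box) -> float:
    """Hypothetical rule: x-center < 50 is 'dense' (0.7), otherwise 'sparse' (0.3)."""
    x_center = (box[0] + box[2]) / 2
    return 0.7 if x_center < 50 else 0.3


# Two overlapping pairs, each with IoU = 70 / 130 (about 0.54).
boxes = [(0, 0, 10, 10), (3, 0, 13, 10), (100, 0, 110, 10), (103, 0, 113, 10)]
scores = [0.9, 0.8, 0.85, 0.6]

adaptive = region_adaptive_nms(boxes, scores, toy_threshold)
uniform = region_adaptive_nms(boxes, scores, lambda box: 0.3)
print(adaptive)  # [0, 2, 1]: both boxes in the dense pair survive
print(uniform)   # [0, 2]: a single low threshold drops one box from each pair
```

With a single fixed threshold, the dense pair loses a box that may correspond to a real person, which is the failure mode that motivates per-region thresholds. The design choice in this sketch, looking the threshold up from the candidate box rather than from the box already kept, is one of several reasonable options and is not taken from the paper.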