Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting
This work addresses the need for cost-effective, privacy-preserving people counting in applications such as surveillance, though its contribution is incremental because it builds on existing deep learning methods.
The paper tackles the problem of reducing annotation costs for infrared-based people counting by exploring weaker levels of supervision. It finds that a CNN image-level model achieves accuracy competitive with YOLO detectors and point-level models while offering a higher frame rate and a similar number of parameters.
Object detection models are commonly used for people counting (and localization) in many applications, but they require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models increasingly rely on infrared images, which makes the task even harder. In this paper, we explore how weaker levels of supervision affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a CNN Image-Level model achieves results competitive with YOLO detectors and point-level models, yet provides a higher frame rate and a similar number of model parameters.
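To make the three supervision levels concrete, the following minimal sketch shows how the weaker labels could be derived from a bounding box annotation: one center point per person for point-level supervision, and a single person count for image-level supervision. The box format `(x_min, y_min, x_max, y_max)` and the helper names `boxes_to_points` and `boxes_to_count` are assumptions made for illustration; they are not taken from the paper's annotation pipeline.

```python
from typing import List, Tuple

# Assumed box format for illustration: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def boxes_to_points(boxes: List[Box]) -> List[Point]:
    """Point-level labels: keep only the center of each bounding box."""
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]


def boxes_to_count(boxes: List[Box]) -> int:
    """Image-level label: keep only the number of people in the frame."""
    return len(boxes)


# Two hypothetical people annotated in one low-resolution infrared frame.
boxes = [(2.0, 4.0, 6.0, 12.0), (10.0, 1.0, 14.0, 9.0)]
points = boxes_to_points(boxes)
count = boxes_to_count(boxes)
```

Each step discards information: points drop the extent of each person, and the count drops their location entirely. This is why the weaker labels are cheaper to collect, and why it is worth measuring how much accuracy they cost.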