MTCNET: Multi-task Learning Paradigm for Crowd Count Estimation
This work addresses the problem of accurate crowd counting for surveillance and public safety applications, presenting an incremental improvement over existing methods.
The paper tackles crowd count estimation by proposing MTCNet, a multi-task learning architecture that uses an auxiliary count-group classification task to improve density estimation. Compared with state-of-the-art methods, it achieves 5.8% and 14.9% lower mean absolute error (MAE) on the ShanghaiTech dataset and 10.5% lower MAE on the UCF_CC_50 dataset.
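The improvements above are stated as relative reductions in MAE, the usual crowd-counting metric: the average absolute difference between predicted and ground-truth counts over the test images. The sketch below shows that arithmetic. It assumes "X% lower MAE" means a reduction relative to the baseline's MAE, and the counts in it are made-up numbers for illustration, not results from the paper.

```python
def mean_absolute_error(predicted, ground_truth):
    """MAE over a set of images: average of |predicted count - true count|."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


def relative_reduction(new_mae, baseline_mae):
    """Percentage by which new_mae is lower than baseline_mae (assumed convention)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical per-image counts, for illustration only (not the paper's data).
pred = [102.0, 48.0, 310.0]
true = [100.0, 50.0, 300.0]
mae = mean_absolute_error(pred, true)  # (2 + 2 + 10) / 3
print(round(mae, 3))  # 4.667
print(relative_reduction(9.0, 10.0))  # 10.0
```

Under this convention, a model with MAE 9.0 against a baseline with MAE 10.0 would be reported as "10% lower MAE".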
We propose a deep neural network architecture based on the Multi-Task Learning (MTL) paradigm, called MTCNet (Multi-Task Crowd Network), for crowd density and count estimation. Crowd count estimation is challenging because of non-uniform scale variations and the arbitrary perspective of each image. The proposed model has two related tasks: Crowd Density Estimation as the main task and Crowd-Count Group Classification as the auxiliary task. The auxiliary task helps capture relevant scale-related information, which improves performance on the main task. The main-task model comprises two blocks: a VGG-16 front-end for feature extraction and a dilated Convolutional Neural Network (CNN) for density map generation. The auxiliary-task model shares the same front-end as the main task, followed by a CNN classifier. Without any data augmentation, our network achieves 5.8% and 14.9% lower Mean Absolute Error (MAE) than state-of-the-art methods on the ShanghaiTech dataset. It also outperforms them on the UCF_CC_50 dataset, with 10.5% lower MAE.
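The two-task setup described above can be sketched as a joint training objective: a density-map loss for the main task plus a classification loss for the auxiliary task, where the class label is the count group of the image. This is a minimal illustration on plain nested lists rather than real network outputs, and several details are assumptions not given in the abstract: the group edges in `GROUP_EDGES`, the pixel-wise MSE density loss, softmax cross-entropy for the classifier, deriving the count as the sum of the density map, and combining the losses as a weighted sum with the hypothetical weight `aux_weight`.

```python
import bisect
import math

# Hypothetical group edges: the abstract does not say how many count groups
# MTCNet uses or where their boundaries lie.
# 4 groups: [0, 50), [50, 200), [200, 500), [500, inf)
GROUP_EDGES = [50, 200, 500]


def count_from_density(density):
    """Estimated count = sum over all pixels of a 2-D density map."""
    return sum(sum(row) for row in density)


def count_group_label(count, edges=GROUP_EDGES):
    """Map a ground-truth count to a class index for the auxiliary task."""
    return bisect.bisect_right(edges, count)


def density_mse(pred, target):
    """Pixel-wise mean squared error between two equally sized density maps."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multi_task_loss(pred_density, gt_density, group_logits, aux_weight=0.1):
    """Main density loss plus a weighted auxiliary classification loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    gt_group = count_group_label(count_from_density(gt_density))
    main = density_mse(pred_density, gt_density)
    aux = cross_entropy(group_logits, gt_group)
    return main + aux_weight * aux


# Toy 2x2 density map whose pixels sum to 2 people, so its group is 0.
gt = [[0.5, 0.5], [0.0, 1.0]]
print(count_group_label(count_from_density(gt)))  # 0
print(multi_task_loss(gt, gt, [0.0, 0.0, 0.0, 0.0]))  # 0.1 * ln(4)
```

Because both heads would share the VGG-16 front-end, gradients from the auxiliary term also update the shared features, which is one way the classification task can inject scale-related information into density estimation.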