Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation
This work addresses the need for efficient pose estimation for applications like activity monitoring and human-robot interaction, but it is incremental as it builds on existing hourglass and attention methods.
The paper tackles the computational cost of human pose estimation by proposing a lightweight attention-based network with about 10% of the parameters of the original eight-stack Hourglass network. Evaluated on the COCO and MPII datasets, it reports an average precision of 72.07 with only 2.3M parameters and 3.7G FLOPs.
Pose estimation is a critical task in computer vision, with applications ranging from activity monitoring to human-robot interaction. However, most existing methods are computationally expensive or have complex architectures. Here we propose a lightweight attention-based pose estimation network that uses depthwise separable convolution and the Convolutional Block Attention Module on an hourglass backbone. The network significantly reduces both the computational complexity (floating point operations) and the model size (number of parameters), using only about 10% of the parameters of the original eight-stack Hourglass network. Experiments were conducted on the COCO and MPII datasets using a two-stack hourglass backbone. The results show that our model performs well in comparison to six other lightweight pose estimation models, with an average precision of 72.07. The model achieves this performance with only 2.3M parameters and 3.7G FLOPs.
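To give a rough sense of why depthwise separable convolution shrinks the model, the sketch below compares the weight count of a standard convolution with that of its depthwise separable counterpart. The 3x3 kernel and the 256-channel width are illustrative assumptions, not values taken from the abstract, and the function names are hypothetical. The calculation covers a single layer without bias terms, so it does not reproduce the reported 2.3M total.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Assumed layer shape for illustration only: 3x3 kernel, 256 -> 256 channels.
KERNEL, C_IN, C_OUT = 3, 256, 256

standard = standard_conv_params(KERNEL, C_IN, C_OUT)          # 589824
separable = depthwise_separable_params(KERNEL, C_IN, C_OUT)   # 67840
ratio = separable / standard                                  # equals 1/C_OUT + 1/KERNEL**2

print(f"standard:  {standard}")
print(f"separable: {separable}")
print(f"ratio:     {ratio:.3f}")
```

Under these assumptions a single layer keeps roughly 11.5% of its weights. That is of the same order as the overall reduction described above, although the total also depends on the number of stacks and on the parameters added by the attention modules.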