Composite Localization for Human Pose Estimation
This work targets the efficiency and accuracy of human pose estimation in computer vision, and it is an incremental improvement over existing methods.
The paper tackles inaccurate long-distance regression and high computational cost in human pose estimation by proposing a composite localization framework that divides the learning objective into simpler tasks. CLNet-ResNet50 outperforms SimpleBaseline by 1.14% with about half the GFLOPs, and CLNet-Hourglass outperforms stacked-hourglass by 4.45% on COCO.
Existing human pose estimation methods suffer from inaccurate long-distance regression or high computational cost because of their complex learning objectives. This work proposes a novel deep learning framework for human pose estimation, called composite localization, that divides the complex learning objective into two simpler ones: a sparse heatmap to find the keypoint's approximate location and two short-distance offsetmaps to obtain its final precise coordinates. To realize the framework, we construct two types of composite localization networks: CLNet-ResNet and CLNet-Hourglass. We evaluate the networks on three benchmark datasets: the Leeds Sports Pose dataset, the MPII Human Pose dataset, and the COCO keypoints detection dataset. The experimental results show that our CLNet-ResNet50 outperforms SimpleBaseline by 1.14% with about half the GFLOPs. Our CLNet-Hourglass outperforms the original stacked-hourglass by 4.45% on COCO.
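The two-part objective described above can be illustrated with a minimal decoding sketch. It is not the paper's implementation. It assumes, hypothetically, that the sparse heatmap is a coarse grid in which each cell covers `stride` input pixels, and that the two offsetmaps store horizontal and vertical displacements in input pixels from the top-left corner of each cell. The function name `decode_keypoint`, the `stride` parameter, and the toy values are all illustrative assumptions.

```python
def decode_keypoint(heatmap, offset_x, offset_y, stride=8):
    """Coarse-to-fine decoding for one keypoint (hypothetical scheme).

    heatmap, offset_x, offset_y: 2-D lists of equal shape (rows x cols).
    stride: assumed number of input pixels covered by one heatmap cell.
    Returns (x, y, score) in input-image pixel coordinates.
    """
    # Coarse step: find the heatmap cell with the highest score.
    best_row, best_col, best_score = 0, 0, float("-inf")
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_row, best_col, best_score = r, c, score

    # Fine step: add the short-distance offsets stored at that cell.
    x = best_col * stride + offset_x[best_row][best_col]
    y = best_row * stride + offset_y[best_row][best_col]
    return x, y, best_score


# Toy 4x4 maps with a single confident cell at row 3, column 2.
heatmap = [[0.0] * 4 for _ in range(4)]
offset_x = [[0.0] * 4 for _ in range(4)]
offset_y = [[0.0] * 4 for _ in range(4)]
heatmap[3][2] = 0.9
offset_x[3][2] = 2.5
offset_y[3][2] = 5.0

x, y, score = decode_keypoint(heatmap, offset_x, offset_y, stride=8)
print(x, y, score)
```

Under these assumptions the heatmap only needs to identify the correct cell, and each offset stays within one cell width, which is the sense in which the regression is short-distance.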