Efficient Human Pose Estimation with Depthwise Separable Convolution and Person Centroid Guided Joint Grouping
This work addresses efficient and accurate 2D human pose estimation, targeting competitive accuracy at low computational cost. It is aimed at researchers and practitioners in computer vision.
This paper proposes a new ResBlock based on depthwise separable convolution, which can be further enhanced with mixed depthwise convolution, and applies it to human pose estimation. It also introduces a bottom-up multi-person pose estimation method that represents each pose as a rooted tree, with the person centroid as the root, and uses it to group joints. Both methods achieve competitive accuracy on the MPII and LSP datasets at low computational cost.
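To make the cost argument concrete, the following minimal sketch compares parameter and multiply-accumulate counts for a standard convolution and a depthwise separable one. The layer sizes (256 channels, 3x3 kernel, 64x64 feature map) and the function names are illustrative assumptions, not values taken from the paper, and biases and normalization layers are ignored.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates (MACs) of a k x k standard
    convolution with stride 1 and 'same' padding, bias ignored."""
    params = k * k * c_in * c_out
    macs = params * h * w  # the full kernel is applied at every output position
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Same quantities for a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    return params, params * h * w


# Assumed, illustrative layer sizes (not from the paper).
std_params, std_macs = standard_conv_cost(256, 256, 3, 64, 64)
sep_params, sep_macs = depthwise_separable_cost(256, 256, 3, 64, 64)
print(f"standard:  {std_params} params, {std_macs} MACs")
print(f"separable: {sep_params} params, {sep_macs} MACs")
print(f"reduction: {std_params / sep_params:.2f}x")
```

Under these assumptions the reduction factor approaches k² as the number of output channels grows, which is why the saving is largest for wide layers with 3x3 or larger kernels.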
In this paper, we propose efficient and effective methods for 2D human pose estimation. We propose a new ResBlock based on depthwise separable convolution and use it in place of the original ResBlock in the Hourglass network. The block can be further enhanced by replacing the vanilla depthwise convolution with a mixed depthwise convolution. Building on this network, we propose a bottom-up multi-person pose estimation method. A human pose is represented as a rooted tree in which the person centroid is the root and connects to every body joint, either directly or hierarchically through other joints. Two sub-network branches predict the centroids, the body joints, and the offset from each joint to its parent node. Joints are grouped by tracing along these offsets to the closest centroid. Experimental results on the MPII Human Pose dataset and the LSP dataset show that both our single-person and multi-person pose estimation methods achieve competitive accuracy at low computational cost.
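The grouping step can be illustrated with a small sketch. It assumes a hypothetical three-joint tree (wrist, elbow, shoulder, then centroid) and that each detected joint already carries a predicted offset to its parent; the names `PARENT`, `nearest`, and `group_joints` are invented for illustration, and the abstract does not specify how the tracing is implemented. Here each hop snaps to the nearest detected parent joint and continues from that joint's own offset.

```python
import math

# Hypothetical tree: each joint type maps to its parent; "centroid" is the root.
PARENT = {"wrist": "elbow", "elbow": "shoulder", "shoulder": "centroid"}


def nearest(point, candidates):
    """Index of the (x, y) candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def group_joints(centroids, detections):
    """Assign each detected joint to a person centroid by tracing offsets.

    centroids:  list of (x, y) centroid detections.
    detections: dict joint_type -> list of ((x, y), (dx, dy)), where (dx, dy)
                is the predicted offset from the joint to its parent node.
    Returns a dict person_index -> {joint_type: (x, y)}. If two joints of the
    same type reach one centroid, the later one overwrites the earlier one.
    """
    if not centroids:
        return {}
    people = {i: {} for i in range(len(centroids))}
    for joint_type, items in detections.items():
        for pos, offset in items:
            node_type, cur_pos, cur_off = joint_type, pos, offset
            while True:
                # Follow the offset to the estimated parent location.
                landed = (cur_pos[0] + cur_off[0], cur_pos[1] + cur_off[1])
                parent_type = PARENT[node_type]
                parents = detections.get(parent_type, [])
                if parent_type == "centroid" or not parents:
                    # Root reached (or no parent detections): pick the closest centroid.
                    person = nearest(landed, centroids)
                    break
                # Snap to the nearest detected parent joint and keep tracing.
                j = nearest(landed, [p for p, _ in parents])
                node_type = parent_type
                cur_pos, cur_off = parents[j]
            people[person][joint_type] = pos
    return people


# Toy example with two people (all coordinates are made up).
centroids = [(10, 10), (50, 10)]
detections = {
    "shoulder": [((10, 5), (0, 5)), ((50, 5), (0, 5))],
    "elbow": [((6, 8), (4, -3)), ((54, 8), (-4, -3))],
    "wrist": [((4, 12), (2, -4)), ((56, 12), (-2, -4))],
}
people = group_joints(centroids, detections)
print(people)
```

In a network that predicts dense offset maps, the next offset would more likely be read from the map at the landed location; snapping to the nearest detected joint is a simplification chosen to keep the sketch self-contained.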