DeepSkeleton: Skeleton Map for 3D Human Pose Regression
This work addresses 3D human pose estimation for computer vision applications, presenting an incremental improvement with a novel intermediate representation.
The paper tackles the ill-posed depth ambiguity in 3D human pose estimation by introducing a skeleton map as an intermediate feature representation, achieving performance comparable to state-of-the-art methods, with validation on the MPII and Human3.6M datasets.
Despite recent success in 2D human pose estimation, 3D human pose estimation remains an open problem. A key challenge is the inherently ill-posed depth ambiguity. This paper presents a novel intermediate feature representation for regression, named the skeleton map. It distills structural context from the RGB image while discarding irrelevant properties such as illumination and texture. It is simple and clean, and can be easily generated by a deconvolution network. For the first time, we show that a regression network trained on skeleton maps alone can match the performance of state-of-the-art 3D human pose estimation methods. We further exploit multiple 3D hypothesis generation to obtain reasonable 3D poses consistent with the 2D pose detection. The effectiveness of our approach is validated on the challenging in-the-wild dataset MPII and the indoor dataset Human3.6M.
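The abstract does not say how the multiple 3D hypotheses are reconciled with the 2D pose detection. One plausible reading is to score each candidate 3D pose by how well its projection matches the detected 2D joints and keep the best one. The sketch below assumes a weak-perspective camera (drop depth, then fit a scale and 2D translation by least squares) and uses mean per-joint 2D distance as the score. The function names (`fit_scale_translation`, `project_weak_perspective`, `select_hypothesis`), the camera model, and the error criterion are all hypothetical illustrations, not the authors' method.

```python
import numpy as np


def fit_scale_translation(xy, target):
    """Least-squares scale s and 2D translation t minimizing ||s * xy + t - target||^2.

    Assumption: a weak-perspective camera with a single isotropic scale.
    """
    xy_mean = xy.mean(axis=0)
    target_mean = target.mean(axis=0)
    xy_c = xy - xy_mean
    target_c = target - target_mean
    denom = (xy_c ** 2).sum()
    # Closed-form scale from centered coordinates; fall back to 1.0 for a degenerate pose.
    s = (xy_c * target_c).sum() / denom if denom > 0 else 1.0
    # Optimal translation given s aligns the centroids.
    t = target_mean - s * xy_mean
    return s, t


def project_weak_perspective(pose_3d, s, t):
    """Drop the depth coordinate, then scale and shift in the image plane."""
    return s * pose_3d[:, :2] + t


def select_hypothesis(hypotheses, joints_2d):
    """Pick the 3D hypothesis (J x 3) whose projection best matches detected 2D joints (J x 2).

    Hypothetical criterion: mean per-joint Euclidean reprojection error.
    Returns the index of the best hypothesis and the array of all errors.
    """
    errors = []
    for pose_3d in hypotheses:
        s, t = fit_scale_translation(pose_3d[:, :2], joints_2d)
        projected = project_weak_perspective(pose_3d, s, t)
        errors.append(np.linalg.norm(projected - joints_2d, axis=1).mean())
    errors = np.asarray(errors)
    return int(np.argmin(errors)), errors
```

For example, if `detected_2d = 2.0 * true_pose[:, :2] + np.array([10.0, 5.0])`, then `select_hypothesis([perturbed_pose, true_pose], detected_2d)` returns index 1 with a near-zero error for `true_pose`. Because the projection discards depth, a hypothesis whose depths are mirrored (`z -> -z`) receives exactly the same score as the original. This illustrates why 2D consistency alone cannot resolve the depth ambiguity, and why a learned prior such as the skeleton-map regressor is still needed to generate sensible candidates.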