Human Pose Estimation from RGB Input Using Synthetic Training Data
This work addresses human pose estimation for computer vision applications. It is incremental: it builds on existing methods such as the Kinect pose estimator and adds a new training objective.
The paper tackles human pose estimation from RGB images by training a random forest classifier on synthetic data and using weakly labeled real images to improve generalization, achieving significant performance gains over a baseline on a public dataset.
We address the problem of estimating human pose from RGB image input. More specifically, we use a random forest classifier to classify pixels into joint-based body part categories, much like the well-known Kinect pose estimator [11], [12]. However, we use pure RGB input, i.e., no depth. Since the random forest requires a large number of training examples, we use synthetic training data generated with computer graphics. In addition, we assume access to a large number of real images with bounding box labels, extracted for example by a pedestrian detector or a tracking system. We propose a new objective function for random forest training that uses this weakly labeled data from the target domain to encourage the learner to select features that generalize from the synthetic source domain to the real target domain. We demonstrate on a publicly available dataset [6] that the proposed objective function yields a classifier that significantly outperforms a baseline classifier trained using the standard entropy objective [10].
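To make the role of the split objective concrete, the following is a minimal sketch of how a candidate split in a random forest could be scored. The `information_gain` function is the standard entropy objective used by the baseline. The `domain_gap` penalty and the `weight` parameter are hypothetical stand-ins, introduced here only for illustration: the abstract does not spell out the proposed objective, so this should not be read as the authors' formulation. The sketch only assumes that a feature which generalizes should send similar fractions of synthetic (source) pixels and real (target) pixels to each child.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, goes_left):
    """Standard entropy objective: parent entropy minus the
    size-weighted entropy of the two children.

    labels    -- body part labels of the synthetic (source) pixels, non-empty
    goes_left -- one boolean per pixel, True if the split sends it left
    """
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def domain_gap(source_goes_left, target_goes_left):
    """HYPOTHETICAL penalty (not from the paper): absolute difference
    between the fraction of source pixels and the fraction of target
    pixels that the split sends left. Both inputs must be non-empty."""
    p_src = sum(source_goes_left) / len(source_goes_left)
    p_tgt = sum(target_goes_left) / len(target_goes_left)
    return abs(p_src - p_tgt)


def split_score(labels, source_goes_left, target_goes_left, weight=1.0):
    """Illustrative combined score: reward label purity on the labeled
    synthetic pixels, penalize splits that behave differently on
    unlabeled real pixels taken from inside the bounding boxes."""
    gain = information_gain(labels, source_goes_left)
    return gain - weight * domain_gap(source_goes_left, target_goes_left)
```

For example, with `labels = ["head", "head", "torso", "torso"]`, a split that sends the source pixels `[True, True, False, False]` has an information gain of 1 bit. If the same split sends the target pixels `[True, False, False, False]`, the hypothetical penalty is 0.25 and `split_score` with `weight=1.0` is 0.75. With `weight=0.0` the score reduces to the standard entropy objective of the baseline.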