Unite the People: Closing the Loop Between 3D and 2D Human Representations
This work addresses the scarcity of labeled data for training detailed 2D human pose estimators in computer vision. Easing that bottleneck enables more detailed and scalable models for applications such as animation or surveillance. The contribution is incremental in that it builds on the existing SMPLify method.
The paper tackles the challenge of acquiring labeled data for training 2D human pose estimators. It proposes a hybrid approach: an extended version of SMPLify produces 3D body model fits for multiple human pose datasets, human annotators sort good fits from bad ones, and the accepted fits form the UP-3D dataset with rich annotations. Discriminative models trained on this data predict 31 body segments and 91 landmark locations. Using the 91-landmark pose estimator, the authors report state-of-the-art 3D human pose and shape estimation with an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure.
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits "in-the-wild". However, depending on the level of detail, it can be hard or even impossible to acquire labeled data for training 2D estimators on a large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high-quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91-landmark pose estimator, we present state-of-the-art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable on a large scale. The data, code and models are available for research purposes.
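The loop described in the abstract (fit a 3D model, let annotators keep only the good fits, train a 2D estimator, then fit again with that estimator) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Fit`, `close_the_loop`, `fit_model`, `is_good_fit`, and `train_estimator` are hypothetical names, and the fitting, annotation, and training steps are passed in as plain callables. As a simplification, the sketch only adds newly accepted fits and does not replace earlier ones with improved versions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class Fit:
    """Hypothetical container for one 3D body model fit."""
    image_id: str
    params: Any  # stand-in for pose/shape parameters of the body model


def close_the_loop(
    image_ids: Sequence[str],
    fit_model: Callable[[str, Optional[Any]], Fit],
    is_good_fit: Callable[[Fit], bool],
    train_estimator: Callable[[List[Fit]], Any],
    rounds: int = 2,
) -> Tuple[List[Fit], Any]:
    """Alternate between 3D fitting, human filtering, and 2D training.

    fit_model(image_id, estimator): fits a 3D body model to one image; the
        estimator is None in the first round (assumed interface).
    is_good_fit(fit): stands in for the human annotator's accept/reject call.
    train_estimator(dataset): trains a 2D estimator on the accepted fits.
    """
    dataset: List[Fit] = []
    estimator: Optional[Any] = None
    for _ in range(rounds):
        accepted = {fit.image_id for fit in dataset}
        # Only retry images that do not yet have an accepted fit.
        candidates = [
            fit_model(image_id, estimator)
            for image_id in image_ids
            if image_id not in accepted
        ]
        # Annotators sort fits into good and bad; keep the good ones.
        dataset.extend(fit for fit in candidates if is_good_fit(fit))
        # Retrain on the grown dataset; the next round's fits can use it.
        estimator = train_estimator(dataset)
    return dataset, estimator
```

In this sketch, each round can only grow the dataset, which mirrors the abstract's claim that UP-3D grows in quantity as better estimators produce fits that annotators accept.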