DGPose: Deep Generative Models for Human Body Analysis
This work addresses the need for more flexible and interpretable generative models for human body analysis in computer vision, though it appears incremental, since it builds on existing disentanglement concepts.
The authors tackle the lack of interpretability in deep generative models for human body analysis by proposing models that disentangle pose and appearance, which enables applications such as pose-transfer without task-specific training. They demonstrate the merits of their Conditional-DGPose and Semi-DGPose models against the ClothNet-Body and Pose Guided Person Generation baselines on the Human3.6M, ChictopiaPlus and DeepFashion benchmarks.
Deep generative modelling for human body analysis is an emerging problem with many interesting applications. However, the latent space learned by such approaches is typically not interpretable, which limits their flexibility. In this work, we present deep generative models for human body analysis in which body pose and visual appearance are disentangled. This disentanglement allows pose and appearance to be manipulated independently, and hence enables applications such as pose-transfer without training specifically for that task. Our two proposed models, the Conditional-DGPose and the Semi-DGPose, have different characteristics. The first takes body pose labels from a fully-supervised training set as conditioners. The second uses a structured semi-supervised approach that lets the model perform pose estimation itself, relaxing the need for labelled data. The Semi-DGPose therefore aims at the joint understanding and generation of people in images: it can map images to interpretable latent representations and also map these representations back to image space. We compare our models with relevant baselines, the ClothNet-Body and the Pose Guided Person Generation networks, and demonstrate their merits on the Human3.6M, ChictopiaPlus and DeepFashion benchmarks.
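To make the disentanglement idea concrete, here is a toy sketch. It is not the authors' architecture: it assumes a hypothetical linear decoder x = W_POSE·y + W_APP·z in place of trained deep networks, with made-up dimensions, and every name in it (decode, encode_appearance, encode_joint, pose_transfer) is illustrative. Under that assumption, pose-transfer amounts to recovering the appearance code z of a source image and decoding it together with a target pose y. `encode_appearance` mirrors the Conditional-DGPose setting, where the pose label is supplied as a conditioner. `encode_joint` loosely mirrors the Semi-DGPose capability of inferring the pose from the image itself and mapping the latents back to image space.

```python
import numpy as np

# Toy dimensions (hypothetical): pose code y, appearance code z, flattened "image" x.
POSE_DIM, APP_DIM, IMG_DIM = 4, 3, 16

rng = np.random.default_rng(0)
# Hypothetical linear decoder weights standing in for a trained generator.
W_POSE = rng.normal(size=(IMG_DIM, POSE_DIM))
W_APP = rng.normal(size=(IMG_DIM, APP_DIM))


def decode(pose, appearance):
    """Map a disentangled latent pair (y, z) back to image space."""
    return W_POSE @ pose + W_APP @ appearance


def encode_appearance(image, pose):
    """Conditional setting: the pose label is given, only z is inferred."""
    residual = image - W_POSE @ pose  # remove the pose contribution
    appearance, *_ = np.linalg.lstsq(W_APP, residual, rcond=None)
    return appearance


def encode_joint(image):
    """Pose not given: infer both pose and appearance from the image."""
    w = np.hstack([W_POSE, W_APP])
    latent, *_ = np.linalg.lstsq(w, image, rcond=None)
    return latent[:POSE_DIM], latent[POSE_DIM:]


def pose_transfer(src_image, tgt_pose, src_pose=None):
    """Keep the source appearance z, swap in the target pose y, and decode."""
    if src_pose is None:
        _, appearance = encode_joint(src_image)
    else:
        appearance = encode_appearance(src_image, src_pose)
    return decode(tgt_pose, appearance)


if __name__ == "__main__":
    pose_a, app_a = rng.normal(size=POSE_DIM), rng.normal(size=APP_DIM)
    pose_b = rng.normal(size=POSE_DIM)
    img_a = decode(pose_a, app_a)
    out = pose_transfer(img_a, pose_b)  # source pose inferred, not given
    print(np.allclose(out, decode(pose_b, app_a)))  # True for this noise-free toy
```

In the real models the encoders and decoder are learned deep networks, so recovery of y and z is approximate rather than exact; the linear toy only shows why keeping pose and appearance in separate latent components lets one be swapped without touching the other.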