3D human pose estimation from depth maps using a deep combination of poses
This work addresses accurate 3D human pose estimation for applications such as human behavior understanding. Its contribution is incremental, since it builds on existing deep learning and prototype-based methods.
The paper tackles 3D human pose estimation from depth maps by proposing Deep Depth Pose (DDP), a ConvNet that linearly combines predefined 3D prototype poses to predict body joint positions, achieving state-of-the-art results on the ITOP and UBC3V datasets.
Many real-world applications require the estimation of human body joints for higher-level tasks such as human behaviour understanding. In recent years, depth sensors have become a popular way to obtain three-dimensional information. The depth maps generated by these sensors provide information that can be used to disambiguate poses observed in two-dimensional images. This work addresses the problem of 3D human pose estimation from depth maps using a Deep Learning approach. We propose a model, named Deep Depth Pose (DDP), which receives a depth map containing a person and a set of predefined 3D prototype poses, and returns the 3D positions of the person's body joints. In particular, DDP is defined as a ConvNet that computes the specific weights needed to linearly combine the prototypes for the given input. We have thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which depict realistic and synthetic samples respectively, setting a new state of the art on both.
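The linear combination of prototypes can be illustrated with a minimal sketch. It covers only the final combination step and assumes the per-prototype weights are already available; in DDP they would be produced by the ConvNet from the depth map, whereas here they are hard-coded toy values. The function name `combine_prototypes`, the two-joint prototypes, and the weight values are hypothetical and chosen for illustration, and no constraint on the weights (such as summing to one) is assumed.

```python
from typing import List, Sequence

# A pose is a list of joints; each joint is an [x, y, z] position.
Pose = List[List[float]]


def combine_prototypes(weights: Sequence[float], prototypes: Sequence[Pose]) -> Pose:
    """Return the weighted sum of the prototype poses, joint by joint."""
    if not prototypes:
        raise ValueError("at least one prototype pose is required")
    if len(weights) != len(prototypes):
        raise ValueError("exactly one weight per prototype is required")

    num_joints = len(prototypes[0])
    pose = [[0.0, 0.0, 0.0] for _ in range(num_joints)]
    for weight, prototype in zip(weights, prototypes):
        if len(prototype) != num_joints:
            raise ValueError("all prototypes must have the same number of joints")
        for j, joint in enumerate(prototype):
            for axis in range(3):
                pose[j][axis] += weight * joint[axis]
    return pose


# Toy data (hypothetical): two prototype poses with two joints each.
prototypes = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 2.0], [1.0, 1.0, 2.0]],
]
# Stand-in for the network output: one weight per prototype.
weights = [0.25, 0.75]

estimated_pose = combine_prototypes(weights, prototypes)
print(estimated_pose)
```

Because the output is a weighted sum of complete poses, the network in this formulation only has to predict one scalar per prototype rather than every joint coordinate directly.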