Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This work addresses the problem of accurately estimating human poses from images, a computer vision task with applications such as robotics, and represents an incremental improvement achieved by integrating two model types into one hybrid architecture.
The paper tackles articulated human pose estimation in monocular images by proposing a hybrid architecture that combines a deep Convolutional Network with a Markov Random Field; the authors report that it significantly outperforms existing state-of-the-art techniques.
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
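To make the idea of exploiting geometric relationships between joint locations concrete, the following is a minimal sketch, not the paper's implementation. It assumes the Convolutional Network has already produced one heatmap per joint, and it refines an ambiguous "wrist" heatmap using an "elbow" heatmap and a pairwise offset prior. The function names, the `bias` parameter, the grid size, and the hand-set prior are all hypothetical and chosen only for illustration; in a jointly trained system the prior would be learned together with the network rather than fixed by hand.

```python
def neighbor_message(neighbor, offset_prior):
    """Spread neighbor evidence through the offset prior.

    offset_prior maps (dy, dx) -> weight, read as: the target joint
    sits at the neighbor's location plus (dy, dx) with that weight.
    """
    h, w = len(neighbor), len(neighbor[0])
    msg = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for (dy, dx), weight in offset_prior.items():
                sy, sx = y - dy, x - dx  # neighbor position implied by target (y, x)
                if 0 <= sy < h and 0 <= sx < w:
                    msg[y][x] += weight * neighbor[sy][sx]
    return msg


def refine_heatmap(target, neighbor, offset_prior, bias=0.1):
    """Multiply the target heatmap by the neighbor message and renormalize.

    bias (an assumed constant) keeps a location from being zeroed out
    entirely when the neighbor offers no support for it.
    """
    msg = neighbor_message(neighbor, offset_prior)
    h, w = len(target), len(target[0])
    refined = [[target[y][x] * (msg[y][x] + bias) for x in range(w)]
               for y in range(h)]
    total = sum(sum(row) for row in refined)
    if total == 0.0:
        return refined
    return [[v / total for v in row] for row in refined]


def argmax2d(grid):
    """Return the (row, col) of the largest value in a 2D grid."""
    cells = ((y, x) for y in range(len(grid)) for x in range(len(grid[0])))
    return max(cells, key=lambda p: grid[p[0]][p[1]])


# Toy 5x5 heatmaps: the wrist detector is torn between two locations.
wrist = [[0.0] * 5 for _ in range(5)]
wrist[1][1] = 0.5
wrist[3][3] = 0.5

# The elbow detector is confident about a single location.
elbow = [[0.0] * 5 for _ in range(5)]
elbow[3][2] = 1.0

# Hand-set prior (assumption): the wrist lies one pixel to the right of the elbow.
prior = {(0, 1): 1.0}

refined_wrist = refine_heatmap(wrist, elbow, prior)
best_wrist = argmax2d(refined_wrist)
```

In this toy setup the two wrist candidates start out equally likely, and the elbow evidence shifts nearly all of the probability mass to the candidate consistent with the prior. The sketch covers only inference for one pair of joints; it does not show the joint training of the two components that the abstract describes.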