Human Pose Estimation using Deep Consensus Voting
This work addresses human pose estimation from a single still image, offering an approach that draws on information from the whole image rather than a sparse set of keypoint locations.
The paper tackles human pose estimation from single images by introducing a deep consensus voting method in which every image location casts dense, multi-target votes for the keypoints. These votes are used both to predict keypoints and to compute image-dependent joint keypoint probabilities, and the method achieves competitive performance on the MPII Human Pose and Leeds Sports Pose datasets.
In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach in which each location in the image votes for the position of each keypoint using a convolutional neural network. The voting scheme allows us to utilize information from the whole image, rather than rely on a sparse set of keypoint locations. Using dense, multi-target votes not only produces good keypoint predictions, but also enables us to compute image-dependent joint keypoint probabilities by looking at consensus voting. This differs from most previous methods, where joint probabilities are learned from relative keypoint locations and are independent of the image. Finally, we combine the keypoint votes and joint probabilities to identify the optimal pose configuration. We demonstrate competitive performance on the MPII Human Pose and Leeds Sports Pose datasets.
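As an illustration of the voting idea described above, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that a network has already produced, for every image location, a probability distribution over a fixed set of integer displacement classes; the names `aggregate_votes`, `pairwise_consensus`, `vote_probs`, and `offsets` are hypothetical. The first function accumulates the dense votes into a heatmap for one keypoint. The second shows one plausible reading of "consensus voting", in which a pair of candidate positions is scored by how many locations vote for both at once.

```python
import numpy as np


def aggregate_votes(vote_probs, offsets):
    """Accumulate dense votes into a heatmap for a single keypoint.

    vote_probs: (H, W, C) array; vote_probs[y, x, c] is the weight that
        location (y, x) gives to displacement class c (assumed network output).
    offsets: (C, 2) integer array of (dy, dx) displacements, one per class.
    """
    H, W, _ = vote_probs.shape
    heatmap = np.zeros((H, W))
    ys, xs = np.mgrid[0:H, 0:W]
    for c, (dy, dx) in enumerate(offsets):
        ty, tx = ys + dy, xs + dx  # position each location votes for
        valid = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)
        # np.add.at sums correctly even when several votes hit the same cell.
        np.add.at(heatmap, (ty[valid], tx[valid]), vote_probs[..., c][valid])
    return heatmap


def pairwise_consensus(votes_a, votes_b, offsets, pos_a, pos_b):
    """Score a pair of keypoint positions by agreement among voters.

    Assumption for illustration: the score is the sum, over all locations,
    of the product of that location's vote for pos_a (keypoint a) and its
    vote for pos_b (keypoint b). Because it is built from the votes, the
    score depends on the image rather than on a fixed spatial prior.
    """
    H, W, _ = votes_a.shape
    index = {(int(dy), int(dx)): c for c, (dy, dx) in enumerate(offsets)}
    score = 0.0
    for y in range(H):
        for x in range(W):
            ca = index.get((pos_a[0] - y, pos_a[1] - x))
            cb = index.get((pos_b[0] - y, pos_b[1] - x))
            if ca is not None and cb is not None:
                score += votes_a[y, x, ca] * votes_b[y, x, cb]
    return float(score)
```

In this sketch, the most likely position for a single keypoint is the location of the heatmap maximum, and the pairwise scores could then be combined with the per-keypoint heatmaps when searching for a full pose configuration.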