Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach
This work addresses the problem of efficient and accurate pose estimation for computer vision applications and represents a strong incremental improvement over existing bottom-up approaches.
The paper tackles multi-person 2D pose estimation by introducing JCRA, a one-stage end-to-end neural network that outputs human pose joints and associations without post-processing. It reports 69.2 mAP and inference that is 78% faster than previous state-of-the-art bottom-up methods.
We introduce Joint Coordinate Regression and Association (JCRA), a novel one-stage, end-to-end algorithm for multi-person 2D pose estimation that produces human pose joints and their associations without requiring any post-processing. The proposed algorithm is fast, accurate, effective, and simple. Its one-stage end-to-end network architecture significantly improves inference speed, and a symmetric network structure for the encoder and decoder ensures high accuracy in identifying keypoints. JCRA outputs part positions directly via a transformer network, which results in a significant improvement in performance. Extensive experiments on the MS COCO and CrowdPose benchmarks demonstrate that JCRA outperforms state-of-the-art approaches in both accuracy and efficiency. In particular, JCRA achieves 69.2 mAP and is 78% faster at inference than previous state-of-the-art bottom-up algorithms. The code for this algorithm will be made publicly available.
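To make concrete what producing joints and associations without post-processing can look like, here is a minimal sketch. It is not JCRA's actual output format, which the abstract does not specify. It assumes, hypothetically, that the network emits a fixed number of person slots, each holding one normalized (x, y) coordinate per keypoint and a confidence score. Under that assumption the association is implicit in the slot index, so no grouping of keypoints into people is needed. The names `decode_poses`, `pred_joints`, `pred_scores`, and `score_threshold` are illustrative only.

```python
from typing import List, Tuple

Joint = Tuple[float, float]  # (x, y) in normalized [0, 1] image coordinates
Pose = List[Joint]           # one (x, y) per keypoint, in a fixed order


def decode_poses(
    pred_joints: List[Pose],
    pred_scores: List[float],
    image_size: Tuple[int, int],
    score_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Pose]:
    """Convert hypothetical per-person predictions into pixel-space poses.

    Each entry of pred_joints is already a complete person, so keypoints
    never have to be grouped across people. The only work left is dropping
    low-confidence slots and rescaling coordinates to pixels.
    """
    if len(pred_joints) != len(pred_scores):
        raise ValueError("need exactly one score per predicted person slot")
    width, height = image_size
    poses: List[Pose] = []
    for joints, score in zip(pred_joints, pred_scores):
        if score < score_threshold:
            continue  # slot is treated as containing no person
        poses.append([(x * width, y * height) for x, y in joints])
    return poses


# Toy output: three person slots with two keypoints each.
pred_joints = [
    [(0.25, 0.5), (0.5, 0.5)],
    [(0.1, 0.1), (0.2, 0.2)],
    [(0.75, 0.25), (0.5, 0.75)],
]
pred_scores = [0.9, 0.2, 0.8]
poses = decode_poses(pred_joints, pred_scores, image_size=(640, 480))
```

In this toy run the second slot falls below the assumed threshold, so two poses remain. By contrast, bottom-up pipelines generally detect all keypoints first and then need a separate grouping step to assign them to individuals. The abstract describes JCRA as avoiding that step.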