CV · Apr 8, 2021

TokenPose: Learning Keypoint Tokens for Human Pose Estimation

arXiv:2104.03516v3 · 366 citations
AI Analysis

This paper addresses the problem of explicitly learning anatomical constraints between keypoints in human pose estimation, and offers a lightweight alternative to existing CNN-based methods.

The paper proposes TokenPose, which embeds each keypoint as a token so that the model learns both the constraint relationships between keypoints and appearance cues from the image. TokenPose-S and TokenPose-L achieve 72.5 AP and 75.8 AP on the COCO validation set, respectively, with substantially fewer parameters and GFLOPs than their CNN-based counterparts.

Human pose estimation relies heavily on visual cues and on anatomical constraints between body parts to locate keypoints. Most existing CNN-based methods do well at visual representation but lack the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation (TokenPose). In detail, each keypoint is explicitly embedded as a token to simultaneously learn constraint relationships and appearance cues from images. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. Specifically, our TokenPose-S and TokenPose-L achieve $72.5$ AP and $75.8$ AP on the COCO validation dataset, respectively, with significant reductions in parameters ($\downarrow 80.6\%$ and $\downarrow 56.8\%$) and GFLOPs ($\downarrow 75.3\%$ and $\downarrow 24.7\%$). Code is publicly available.
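To make the idea of "each keypoint as a token" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a single self-attention step with identity query/key/value projections, random vectors standing in for learned keypoint embeddings and image-patch features, and hypothetical sizes (`NUM_PATCHES`, `DIM`). The only point it illustrates is that, once keypoint tokens and visual tokens share one sequence, every keypoint token can attend both to the other keypoints (constraint relationships) and to the image patches (appearance cues).

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption for brevity: queries, keys and values are the tokens
    themselves (identity projections), with no learned weights.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


rng = random.Random(0)
NUM_KEYPOINTS = 17  # COCO annotates 17 body keypoints
NUM_PATCHES = 12    # hypothetical number of image patches
DIM = 8             # hypothetical embedding size

# Stand-ins: in a trained model these would be learned keypoint embeddings
# and features extracted from image patches.
keypoint_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_KEYPOINTS)]
visual_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]

# One joint sequence: keypoint tokens first, then visual tokens.
sequence = keypoint_tokens + visual_tokens
outputs, weights = self_attention(sequence)

# Split the attention of the first keypoint token into the mass that goes
# to other keypoint tokens and the mass that goes to image patches.
first = weights[0]
to_keypoints = sum(first[:NUM_KEYPOINTS])
to_patches = sum(first[NUM_KEYPOINTS:])
print(f"keypoint 0 attention: keypoints={to_keypoints:.3f}, patches={to_patches:.3f}")
```

In a full model, the updated keypoint tokens (the first `NUM_KEYPOINTS` rows of `outputs`) would be passed to a prediction head to localize each keypoint. That head is omitted here.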

Code Implementations: 1 repo