CV | Aug 7, 2022

Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation

arXiv:2208.03704v1 | 30 citations | h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses accurate 3D human pose estimation from monocular images, with applications in movement data analysis. It is an incremental advance, building on existing 2D-to-3D lifting approaches with transformer adaptations.

The paper tackles single-image 3D human pose estimation with a transformer-based lifting method that learns the relationships between joints via self-attention and adds intermediate supervision, residual connections, and error prediction. It reports outperforming recent state-of-the-art single-frame models by a large margin.
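To make the contrast with GCNs concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention over joint tokens, assuming each 2D keypoint is turned into a token by a linear projection. The weights are random placeholders and the name `joint_self_attention` is hypothetical; this is not Jointformer's actual architecture. Where a GCN mixes joint features through a fixed, hand-defined J x J skeleton adjacency, here the J x J mixing matrix `attention` is computed from the input itself.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    # Subtract the max for numerical stability; result is positive and sums to 1.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_self_attention(joints_2d, d_model=8, seed=0):
    """Single-head self-attention over a sequence of joint tokens (hypothetical sketch).

    joints_2d: list of J (x, y) keypoints, one token per joint.
    Returns (outputs, attention): outputs is J x d_model, attention is J x J.
    All weights are random placeholders, not trained parameters.
    """
    rng = random.Random(seed)
    w_embed = random_matrix(2, d_model, rng)  # lift each (x, y) into a d_model token
    w_q = random_matrix(d_model, d_model, rng)
    w_k = random_matrix(d_model, d_model, rng)
    w_v = random_matrix(d_model, d_model, rng)

    tokens = matmul([list(p) for p in joints_2d], w_embed)  # J x d_model
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(d_model)
    scores = matmul(q, transpose(k))  # J x J: every joint scores every other joint
    attention = [softmax([s * scale for s in row]) for row in scores]
    outputs = matmul(attention, v)  # each joint becomes a weighted mix of all joints
    return outputs, attention


# Usage with five made-up 2D joints.
joints = [(0.0, 0.0), (0.1, -0.5), (-0.1, -0.5), (0.0, 0.6), (0.2, 1.0)]
out, attn = joint_self_attention(joints)
print(len(out), len(out[0]))  # 5 8
```

Each row of `attn` sums to 1 and is dense, so any joint can draw on any other joint rather than only its skeletal neighbours; in a trained model these weights would be learned instead of specified by hand.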

Monocular 3D human pose estimation technologies have the potential to greatly increase the availability of human movement data. The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs) that typically require some manual input to define the relationships between different body joints. We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships within a sequence of tokens representing joints. We find that the use of intermediate supervision, as well as residual connections between the stacked encoders, benefits performance. We also suggest that using error prediction as part of a multi-task learning framework improves performance by allowing the network to compensate for its confidence level. We perform extensive ablation studies to show that each of our contributions increases performance. Furthermore, we show that our approach outperforms the recent state of the art for single-frame 3D human pose estimation by a large margin. Our code and trained models are made publicly available on GitHub.
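The multi-task objective described in the abstract can be sketched as a loss summed over the stacked encoder stages: each stage's 3D pose estimate is supervised directly (intermediate supervision), and an auxiliary head predicts each joint's error. This is a minimal sketch under stated assumptions, not the authors' loss: the equal stage weights, the L1 error-prediction term, the `error_weight` value, and all function names are hypothetical. In an autodiff framework the actual-error target would typically be detached, an assumption this plain-Python version does not need to express.

```python
import math


def joint_errors(pred, target):
    """Per-joint Euclidean distance between predicted and ground-truth 3D joints."""
    return [math.dist(p, t) for p, t in zip(pred, target)]


def mpjpe(pred, target):
    """Mean per-joint position error, in the same units as the inputs (e.g. mm)."""
    errs = joint_errors(pred, target)
    return sum(errs) / len(errs)


def multitask_loss(stage_outputs, target, error_weight=0.1):
    """Intermediate supervision plus error prediction, summed over stacked stages.

    stage_outputs: one (pose_3d, predicted_errors) pair per encoder stage.
    Assumptions (hypothetical): every stage is weighted equally, and the error
    head is scored with an L1 loss against the actual per-joint error.
    """
    total = 0.0
    for pose_3d, predicted_errors in stage_outputs:
        actual = joint_errors(pose_3d, target)  # target for the error head
        pose_loss = mpjpe(pose_3d, target)      # supervise this stage's pose
        err_loss = sum(abs(e - a) for e, a in zip(predicted_errors, actual)) / len(actual)
        total += pose_loss + error_weight * err_loss
    return total


# Usage: two joints, two stages; the second stage is already exact.
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
stages = [([(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)], [0.0, 0.0]), (target, [0.0, 0.0])]
print(multitask_loss(stages, target))  # ≈ 0.55
```

Supervising every stage gives earlier encoders a direct training signal, and the error term rewards the network for knowing which joints it is likely to get wrong.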

Code implementations: 1 repo