CV · Apr 23, 2021

Skeletor: Skeletal Transformers for Robust Body-Pose Estimation

arXiv:2104.11712v1 · 46 citations
Originality: Incremental advance
AI Analysis

This work addresses robust body-pose estimation for applications in human-computer interaction, but it is incremental as it builds on existing transformer-based methods.

The paper tackles the problem of robust 3D human pose estimation from monoscopic video by addressing challenges like occlusion and ambiguity, resulting in improved performance on pose estimation and benefits for downstream tasks like sign language translation.

Predicting 3D human pose from a single monoscopic video can be highly challenging due to factors such as low resolution, motion blur and occlusion, in addition to the fundamental ambiguity in estimating 3D from 2D. Approaches that directly regress the 3D pose from independent images can be particularly susceptible to these factors and result in jitter, noise and/or inconsistencies in skeletal estimation, much of which can be overcome if the temporal evolution of the scene and skeleton is taken into account. However, rather than tracking body parts and trying to temporally smooth them, we propose a novel transformer-based network that can learn a distribution over both pose and motion in an unsupervised fashion. We call our approach Skeletor. Skeletor overcomes inaccuracies in detection and corrects partial or entire skeleton corruption. It uses strong priors learned from 25 million frames to correct skeleton sequences smoothly and consistently. Skeletor can achieve this because it implicitly learns the spatio-temporal context of human motion via a transformer-based neural network. Extensive experiments show that Skeletor achieves improved performance on 3D human pose estimation and further provides benefits for downstream tasks such as sign language translation.

