CV · AI · Dec 15, 2023

SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation

arXiv:2312.10195v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

SoloPose addresses the inefficiency and error propagation of existing two-stage, many-to-one models, which is relevant to computer vision researchers and practitioners, though the contribution appears incremental and combines existing ideas in a hybrid design.

The paper tackles 3D human pose estimation from video by proposing SoloPose, a one-shot, many-to-many spatio-temporal transformer model, and reports better results than state-of-the-art methods on Human3.6M and on the augmented Humans7.1M dataset.

Recent two-stage, many-to-one deep learning models have demonstrated great success in 3D human pose estimation, but they detect 3D key points in sequential video less efficiently than one-shot, many-to-many models. Another key drawback of two-stage and many-to-one models is that errors in the first stage are passed on to the second stage. In this paper, we introduce SoloPose, a novel one-shot, many-to-many spatio-temporal transformer model for kinematic 3D human pose estimation of video. SoloPose is further fortified by HeatPose, a 3D heatmap based on Gaussian Mixture Model distributions that factors in target key points as well as kinematically adjacent key points. Finally, we address data diversity constraints with the 3D AugMotion Toolkit, a methodology to augment existing 3D human pose datasets, specifically by projecting four top public 3D human pose datasets (Human3.6M, MADS, AIST Dance++, MPI INF 3DHP) into a novel dataset (Humans7.1M) with a universal coordinate system. Extensive experiments are conducted on Human3.6M as well as the augmented Humans7.1M dataset, and SoloPose demonstrates superior results relative to state-of-the-art approaches.
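To make the HeatPose idea more concrete, the following is a minimal sketch of a 3D heatmap built as a Gaussian mixture with one component on the target key point and one on each kinematically adjacent key point. It is not the authors' implementation: the abstract does not give the formulation, so the isotropic Gaussians, the `sigma` value, the `neighbour_weight` parameter, the grid size, and the function names are all assumptions chosen for illustration.

```python
import math


def gaussian_3d(point, mean, sigma):
    """Unnormalised isotropic 3D Gaussian evaluated at `point` (assumed form)."""
    sq_dist = sum((p - m) ** 2 for p, m in zip(point, mean))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mixture_heatmap(target, neighbours, grid_size=8, sigma=1.0,
                    neighbour_weight=0.25):
    """Build a grid_size^3 heatmap as a weighted Gaussian mixture.

    One component sits on the target key point (weight 1.0) and one on each
    kinematically adjacent key point (hypothetical weight `neighbour_weight`).
    """
    components = [(1.0, target)] + [(neighbour_weight, n) for n in neighbours]
    total_weight = sum(w for w, _ in components)
    return [
        [
            [
                sum(w * gaussian_3d((x, y, z), mean, sigma)
                    for w, mean in components) / total_weight
                for z in range(grid_size)
            ]
            for y in range(grid_size)
        ]
        for x in range(grid_size)
    ]


def argmax_3d(heatmap):
    """Return the (x, y, z) voxel index holding the largest heatmap value."""
    best_value, best_index = float("-inf"), None
    for x, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                if value > best_value:
                    best_value, best_index = value, (x, y, z)
    return best_index


# Illustrative voxel coordinates: an elbow with its shoulder and wrist.
elbow = (4, 4, 4)
shoulder, wrist = (4, 1, 4), (4, 7, 4)
heatmap = mixture_heatmap(elbow, [shoulder, wrist])
peak = argmax_3d(heatmap)
```

In this sketch the peak stays on the target key point, while voxels near the adjacent joints receive more mass than equally distant voxels off the skeleton, which is one plausible reading of "factors in kinematically adjacent key points".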

Code Implementations: 1 repo
