CV
Mar 10, 2024

Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation

arXiv:2403.06164v2
3 citations
h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses uncertainty in 3D pose estimation for applications such as motion capture and robotics. It is an incremental advance, as it builds on existing multi-hypothesis and motion estimation methods.

The paper tackles multi-hypothesis 3D human motion estimation from single-camera input, which requires generating multiple temporally consistent pose sequences to account for ambiguities. It introduces Platypose, a diffusion-based framework that achieves state-of-the-art calibration and competitive joint error on static poses from Human3.6M, MPI-INF-3DHP, and 3DPW.

Single-camera 3D pose estimation is an ill-defined problem due to inherent ambiguities from depth, occlusion, or keypoint noise. Multi-hypothesis pose estimation accounts for this uncertainty by providing multiple 3D poses consistent with the 2D measurements. Current research has predominantly concentrated on generating multiple hypotheses for single-frame static pose estimation or on single-hypothesis motion estimation. In this study we focus on the new task of multi-hypothesis motion estimation. Multi-hypothesis motion estimation is not simply multi-hypothesis pose estimation applied to multiple frames, which would ignore temporal correlation across frames. Instead, it requires distributions capable of generating temporally consistent samples, which is significantly more challenging than multi-hypothesis pose estimation or single-hypothesis motion estimation. To this end, we introduce Platypose, a framework that uses a diffusion model pretrained on 3D human motion sequences for zero-shot 3D pose sequence estimation. Platypose outperforms baseline methods on multi-hypothesis motion estimation. Platypose also achieves state-of-the-art calibration and competitive joint error when tested on static poses from Human3.6M, MPI-INF-3DHP, and 3DPW. Finally, because it is zero-shot, our method generalizes flexibly to different settings such as multi-camera inference.
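The difference between scoring whole motion hypotheses and scoring frames independently can be made concrete with a small sketch. The code below is illustrative only and is not taken from the Platypose code base: the function names, the array layout (N hypotheses, T frames, J joints) and the synthetic data are all assumptions. It computes a best-of-N mean per-joint position error in two ways: picking the single best hypothesis for the whole sequence, and picking the best hypothesis separately at every frame, which ignores temporal consistency.

```python
import numpy as np

def mpjpe_per_frame(hyps, gt):
    """Mean per-joint position error for every hypothesis and frame.

    hyps: (N, T, J, 3) array of N hypothesised motions (assumed layout).
    gt:   (T, J, 3) ground-truth motion.
    Returns an (N, T) array of errors.
    """
    joint_dist = np.linalg.norm(hyps - gt[None], axis=-1)  # (N, T, J)
    return joint_dist.mean(axis=-1)                        # (N, T)

def best_of_n_sequence(hyps, gt):
    """Error of the single hypothesis that is best over the whole sequence."""
    return mpjpe_per_frame(hyps, gt).mean(axis=1).min()

def best_of_n_per_frame(hyps, gt):
    """Error when the best hypothesis is re-selected at every frame.

    This stitches frames from different hypotheses together, so it does
    not reward temporally consistent samples.
    """
    return mpjpe_per_frame(hyps, gt).min(axis=0).mean()

# Synthetic data purely for illustration: 5 hypotheses, 8 frames, 17 joints.
rng = np.random.default_rng(0)
gt = rng.normal(size=(8, 17, 3))
hyps = gt[None] + 0.05 * rng.normal(size=(5, 8, 17, 3))

seq_err = best_of_n_sequence(hyps, gt)
frame_err = best_of_n_per_frame(hyps, gt)
print(f"sequence-level best-of-N: {seq_err:.4f}")
print(f"per-frame best-of-N:      {frame_err:.4f}")
```

The per-frame score can never exceed the sequence-level score, because a mean of per-frame minima is at most the minimum of per-hypothesis means. A low per-frame score therefore says little about whether any single sampled motion is plausible from start to finish.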

Code Implementations: 1 repo
