CV · Mar 30, 2022

Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation

arXiv:2203.16202v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of recovering plausible arm and hand dynamics from in-the-wild videos, which is important for applications in human motion capture, but it appears incremental as it builds on existing 2D and 3D pose estimation models.

The authors tackled the problem of estimating accurate arm twists and hand gestures from monocular video by leveraging arm-hand correlations, resulting in a method that outperforms previous state-of-the-art approaches with demonstrated robustness in challenging scenarios.

We propose an approach to estimate arm and hand dynamics from monocular video by exploiting the relationship between arm and hand. Although monocular full-body human motion capture has made great progress in recent years, recovering accurate and plausible arm twists and hand gestures from in-the-wild videos remains a challenge. Our solution builds on the observation that arm poses and hand gestures are highly correlated in most real situations. To fully exploit arm-hand correlation as well as inter-frame information, we design a Spatial-Temporal Parallel Arm-Hand Motion Transformer (PAHMT) that predicts the arm and hand dynamics simultaneously. We also introduce new losses to encourage the estimations to be smooth and accurate. In addition, we collect a motion capture dataset of 200K frames of hand gestures and use it to train our model. By integrating a 2D hand pose estimation model and a 3D human pose estimation model, the proposed method produces plausible arm and hand dynamics from monocular video. Extensive evaluations demonstrate that the proposed method has advantages over previous state-of-the-art approaches and is robust under various challenging scenarios.
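The abstract mentions losses that encourage smooth and accurate estimates but does not define them here. As a rough illustration only, the sketch below combines a mean-squared accuracy term with a second-order (acceleration) smoothness penalty over a pose sequence. The function names, the `(T, J, D)` array layout, and the `smooth_weight` value are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def accuracy_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between predicted and reference poses."""
    return float(np.mean((pred - target) ** 2))


def smoothness_loss(poses: np.ndarray) -> float:
    """Mean squared second-order finite difference along the time axis.

    poses has shape (T, J, D): T frames, J joints, D values per joint.
    This layout is an assumption for illustration.
    """
    if poses.shape[0] < 3:
        return 0.0  # acceleration needs at least three frames
    accel = poses[2:] - 2.0 * poses[1:-1] + poses[:-2]
    return float(np.mean(accel ** 2))


def total_loss(pred: np.ndarray, target: np.ndarray,
               smooth_weight: float = 0.1) -> float:
    """Weighted sum of the two terms; the weight is a hypothetical value."""
    return accuracy_loss(pred, target) + smooth_weight * smoothness_loss(pred)


# Constant-velocity motion has zero acceleration, so it incurs no penalty.
linear = np.arange(5, dtype=float)[:, None, None] * np.ones((5, 2, 3))
# Frame-to-frame jitter is penalized.
jitter = np.array([0.0, 1.0, 0.0, 1.0, 0.0])[:, None, None] * np.ones((5, 2, 3))
```

A penalty on acceleration rather than velocity leaves steady motion unpenalized while still discouraging frame-to-frame jitter, which is one common way to express a smoothness prior on motion sequences.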

