CV · Nov 26, 2020

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

arXiv:2011.13341v2 · 45 citations
AI Analysis

This work addresses accurate 3D human body capture from egocentric videos and offers an incremental improvement relevant to computer vision researchers working on human motion capture.

This paper tackles the reconstruction of 3D human body meshes from egocentric videos, a setting made difficult by the unusual viewpoint and rapid camera motion. The proposed optimization-based approach combines 2D observations with human-scene interaction constraints. In the egocentric setting it yields more accurate body poses and shapes than the previous state-of-the-art method for monocular human motion capture, as well as more realistic human-scene interaction.

We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos. The unique viewpoint and rapid embodied camera motion of egocentric videos raise additional technical barriers for human body capture. To address those challenges, we propose a simple yet effective optimization-based approach that leverages 2D observations of the entire video sequence and human-scene interaction constraints to estimate second-person human poses, shapes, and global motion grounded in the 3D environment captured from the egocentric view. We conduct detailed ablation studies to validate our design choices. Moreover, we compare our method with the previous state-of-the-art method for human motion capture from monocular video and show that our method estimates more accurate human body poses and shapes under the challenging egocentric setting. In addition, we demonstrate that our approach produces more realistic human-scene interaction.
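As a rough illustration of what an optimization-based approach of this kind evaluates, the sketch scores a candidate 3D joint trajectory with three generic terms: a confidence-weighted 2D reprojection error against detected keypoints in every frame, a temporal smoothness penalty over the whole sequence, and a toy ground-plane contact penalty standing in for a human-scene interaction constraint. This is not the paper's formulation. The paper fits full body meshes (poses and shapes), whereas this example works on bare joint positions, and the names `FrameCamera`, `total_energy`, the per-frame world-to-camera pose, the plane-based contact term, and the weights `w_2d`, `w_smooth`, `w_contact` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class FrameCamera:
    """Hypothetical per-frame camera: pinhole intrinsics plus a world-to-camera pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3  # world-to-camera rotation, given as rows
    t: Vec3  # world-to-camera translation

    def project(self, p_world: Vec3) -> Vec2:
        # Move the world-space point into this frame's (moving) camera coordinates.
        xc, yc, zc = (
            sum(self.R[i][k] * p_world[k] for k in range(3)) + self.t[i]
            for i in range(3)
        )
        # Pinhole projection; assumes the point is in front of the camera (zc > 0).
        return (self.fx * xc / zc + self.cx, self.fy * yc / zc + self.cy)


def reprojection_energy(cams, joints, keypoints, conf) -> float:
    """Confidence-weighted squared 2D error, summed over all frames and joints."""
    total = 0.0
    for cam, frame_joints, frame_kps, frame_conf in zip(cams, joints, keypoints, conf):
        for X, x, c in zip(frame_joints, frame_kps, frame_conf):
            u, v = cam.project(X)
            total += c * ((u - x[0]) ** 2 + (v - x[1]) ** 2)
    return total


def smoothness_energy(joints) -> float:
    """Squared world-space displacement of each joint between consecutive frames."""
    total = 0.0
    for prev, curr in zip(joints, joints[1:]):
        for a, b in zip(prev, curr):
            total += sum((bi - ai) ** 2 for ai, bi in zip(a, b))
    return total


def contact_energy(joints, contact_idx: Sequence[int], normal: Vec3, offset: float) -> float:
    """Toy scene term: squared distance of chosen joints to the plane n.X + d = 0 (unit n)."""
    total = 0.0
    for frame_joints in joints:
        for j in contact_idx:
            X = frame_joints[j]
            total += (sum(n * x for n, x in zip(normal, X)) + offset) ** 2
    return total


def total_energy(cams, joints, keypoints, conf, contact_idx, normal, offset,
                 w_2d=1.0, w_smooth=10.0, w_contact=100.0) -> float:
    """Weighted sum of the three terms; the weights here are arbitrary placeholders."""
    return (w_2d * reprojection_energy(cams, joints, keypoints, conf)
            + w_smooth * smoothness_energy(joints)
            + w_contact * contact_energy(joints, contact_idx, normal, offset))
```

Only the energy is evaluated here; a real system would minimize such an objective over the whole sequence at once (for example with a gradient-based optimizer over body-model parameters), which is what lets evidence from all frames constrain each frame's estimate despite rapid egocentric camera motion.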
