CV · Aug 8, 2023

Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video

arXiv:2308.04074v35 citations · h-index: 68
Originality: Incremental advance
AI Analysis

This work addresses the problem of accurate 3D hand reconstruction for applications in human-computer interaction and virtual reality, representing an incremental improvement over previous methods.

The paper tackles the problem of reconstructing interacting hands from monocular RGB video, which is challenging due to occlusions and similar textures, by exploiting spatial-temporal context to achieve new state-of-the-art performance on public benchmarks.

Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g. self- and mutual occlusion and similar textures. Previous works only leverage information from a single RGB image without modeling their physically plausible relation, which leads to inferior reconstruction results. In this work, we are dedicated to explicitly exploiting spatial-temporal information to achieve better interacting hand reconstruction. On one hand, we leverage temporal context to complement insufficient information provided by the single frame, and design a novel temporal framework with a temporal constraint for interacting hand motion smoothness. On the other hand, we further propose an interpenetration detection module to produce kinetically plausible interacting hands without physical collisions. Extensive experiments are performed to validate the effectiveness of our proposed framework, which achieves new state-of-the-art performance on public benchmarks.
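The abstract names two ingredients without giving formulas: a temporal constraint for motion smoothness and an interpenetration check between the two hands. The sketch below illustrates one common way each idea can be expressed; it is not the paper's formulation. The names `smoothness_penalty` and `penetration_depth`, the second-order finite-difference term, the sphere proxies centred on joints, and the 1 cm default radius are all assumptions made for illustration.

```python
from math import dist

# Assumed data layout: a frame is a list of (x, y, z) joint positions in metres.


def smoothness_penalty(frames):
    """Mean squared joint acceleration over a sequence of frames.

    Acceleration is approximated by the second-order finite difference
    x[t-1] - 2*x[t] + x[t+1]; constant-velocity motion scores zero.
    """
    if len(frames) < 3:
        return 0.0  # acceleration is undefined for fewer than three frames
    total = 0.0
    count = 0
    for prev, cur, nxt in zip(frames, frames[1:], frames[2:]):
        for p, c, n in zip(prev, cur, nxt):
            total += sum((a - 2.0 * b + d) ** 2 for a, b, d in zip(p, c, n))
            count += 1
    return total / count


def penetration_depth(left, right, radius=0.01):
    """Summed overlap between sphere proxies placed on each hand's joints.

    Two spheres of equal radius overlap when their centres are closer
    than 2 * radius; the result is zero when no pair of spheres collides.
    """
    depth = 0.0
    for a in left:
        for b in right:
            depth += max(0.0, 2.0 * radius - dist(a, b))
    return depth
```

In a training setup, terms like these would typically be added to the reconstruction loss with small weights, so that jittery sequences and colliding hand poses are penalised. A real system would more likely test mesh vertices than joint-centred spheres.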
