CV
Jun 6, 2023

Learning Human Mesh Recovery in 3D Scenes

arXiv:2306.03847v1
22 citations
h-index: 71
Originality: Incremental advance
AI Analysis

This paper addresses human mesh recovery in 3D scenes for applications such as robotics and AR/VR. It offers an optimization-free solution suited to real-time use, with incremental improvements over existing methods.

The paper tackles the problem of recovering absolute human pose and shape from a single image in a pre-scanned 3D scene. It proposes a method that uses a sparse 3D CNN and cross-attention to reduce ambiguity, yielding more accurate and physically plausible meshes at higher speed.

We present a novel method for recovering the absolute pose and shape of a human in a pre-scanned scene given a single image. Unlike previous methods that perform scene-aware mesh optimization, we propose to first estimate absolute position and dense scene contacts with a sparse 3D CNN, and later enhance a pretrained human mesh recovery network by cross-attention with the derived 3D scene cues. Joint learning on images and scene geometry enables our method to reduce the ambiguity caused by depth and occlusion, resulting in more reasonable global postures and contacts. Encoding scene-aware cues in the network also allows the proposed method to be optimization-free, and opens up the opportunity for real-time applications. The experiments show that the proposed network is capable of recovering accurate and physically-plausible meshes by a single forward pass and outperforms state-of-the-art methods in terms of both accuracy and speed.
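To make the cross-attention step in the abstract concrete, here is a minimal sketch of single-head scaled dot-product cross-attention, in which image-derived tokens act as queries and scene-derived features act as keys and values. It is an illustration under stated assumptions, not the paper's architecture: the function names, the toy feature sizes, and the absence of learned projections are all hypothetical choices made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (no learned weights).

    queries: one feature vector per image/body token (hypothetical input)
    keys, values: one feature vector per scene point (hypothetical input)
    Returns (outputs, weights): one scene-aware vector per query, plus the
    attention distribution over scene points for each query.
    """
    dim = len(keys[0])
    value_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and every scene key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        # Weighted average of the scene values.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(value_dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy data: 2 image tokens, 3 scene points, one-hot scene values.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
scene_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
scene_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

fused, attn = cross_attention(image_tokens, scene_keys, scene_values)
```

In this toy setup each image token attends most strongly to the scene point whose key aligns with it, so `fused` carries scene information selected per token. A real network would add learned query, key, and value projections and multiple heads.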

