CV · AI · Feb 3

Hand3R: Online 4D Hand-Scene Reconstruction in the Wild

arXiv:2602.03200v1 · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the need to understand physical interactions in Embodied AI by enabling simultaneous hand and scene reconstruction, though it appears incremental because it builds on pre-trained models.

The paper tackles the problem of jointly reconstructing dynamic hands and dense 3D scenes from monocular video, presenting Hand3R as the first online framework for 4D hand-scene reconstruction, which achieves competitive performance in hand reconstruction and global positioning without offline optimization.

For Embodied AI, jointly reconstructing dynamic hands and the dense scene context is crucial for understanding physical interaction. However, most existing methods recover isolated hands in local coordinates, overlooking the surrounding 3D environment. To address this, we present Hand3R, the first online framework for joint 4D hand-scene reconstruction from monocular video. Hand3R synergizes a pre-trained hand expert with a 4D scene foundation model via a scene-aware visual prompting mechanism. By injecting high-fidelity hand priors into a persistent scene memory, our approach enables simultaneous reconstruction of accurate hand meshes and dense metric-scale scene geometry in a single forward pass. Experiments demonstrate that Hand3R bypasses the reliance on offline optimization and delivers competitive performance in both local hand reconstruction and global positioning.
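The abstract contrasts hands recovered "in local coordinates" with hands placed in a global scene. The sketch below shows only the generic geometry behind that distinction, not Hand3R's method: it assumes a per-frame camera-to-world pose (R, t) is available (for example from a scene reconstruction model), applies the rigid transform x_world = R·x_cam + t, and accumulates a wrist trajectory frame by frame in an online loop. The function names, the frame tuple layout, and the convention that joint 0 is the wrist are illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(points_cam: List[Vec3], R: Mat3, t: Vec3) -> List[Vec3]:
    """Map 3D points from a camera's local frame into a shared world frame.

    Applies x_world = R @ x_cam + t, where (R, t) is the camera-to-world
    pose for the current frame (assumed to be given, not estimated here).
    """
    out: List[Vec3] = []
    for x, y, z in points_cam:
        # Row i of R dotted with the point, plus translation component i.
        out.append(tuple(R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(3)))
    return out


def wrist_trajectory(frames: Iterable[Tuple[List[Vec3], Mat3, Vec3]]) -> List[Vec3]:
    """Online accumulation: each frame is handled once, in order, with no
    offline optimization over the whole sequence.

    Each frame is (hand_joints_cam, R, t); joint 0 is assumed to be the
    wrist (a hypothetical convention for this sketch).
    """
    trajectory: List[Vec3] = []
    for hand_joints_cam, R, t in frames:
        wrist_world = camera_to_world([hand_joints_cam[0]], R, t)[0]
        trajectory.append(wrist_world)
    return trajectory
```

For example, with an identity rotation and t = (1, 0, 0), a wrist seen at (0, 0, 2) in the camera frame lands at (1, 0, 2) in the world frame. When the camera moves between frames, the same local hand pose maps to different world positions, which is why per-frame local reconstruction alone cannot recover a global hand trajectory.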
