CV · Mar 26

HGGT: Robust and Flexible 3D Hand Mesh Reconstruction from Uncalibrated Images

arXiv:2603.23997 · 74.4 · h-index: 25
AI Analysis

This work addresses the need for accurate, flexible hand reconstruction in applications such as robotics and VR/AR. It offers a novel way to bridge the gap between single-view and multi-view approaches.

The paper tackles 3D hand mesh reconstruction from uncalibrated images with a method that jointly infers hand meshes and camera poses. It outperforms state-of-the-art methods on benchmarks and generalizes well to in-the-wild scenarios.

Recovering high-fidelity 3D hand geometry from images is a critical task in computer vision, with significant value for domains such as robotics, animation, and VR/AR. Crucially, scalable applications demand both accuracy and deployment flexibility. They must be able to leverage massive amounts of unstructured image data from the internet, or to run on consumer-grade RGB cameras without complex calibration. However, current methods face a dilemma. Single-view approaches are easy to deploy but suffer from depth ambiguity and occlusion. Multi-view systems resolve these uncertainties but typically require fixed, calibrated setups, which limits their real-world utility. To bridge this gap, we draw inspiration from 3D foundation models that learn explicit geometry directly from visual data. By reformulating hand reconstruction from arbitrary views as a visual-geometry grounded task, we propose a feed-forward architecture that, for the first time in the literature, jointly infers 3D hand meshes and camera poses from uncalibrated views. Extensive evaluations show that our approach outperforms state-of-the-art methods on benchmarks and generalizes strongly to uncalibrated, in-the-wild scenarios. Project page: https://lym29.github.io/HGGT/.
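The abstract argues that a single view is ambiguous in depth, and that multiple views remove the ambiguity only when the camera poses are known. Elementary pinhole geometry is enough to check this argument. The sketch below is a generic illustration, not the HGGT architecture. It assumes ideal pinhole cameras with focal length 1 and identical orientation (identity rotation), cameras that differ only by a translation, and a hypothetical fingertip point. All function and variable names are our own.

```python
"""Generic two-view geometry illustration (not the HGGT model).

Assumptions: ideal pinhole cameras with focal length f = 1, identical
orientation (identity rotation), and optical centres that differ only by a
translation. All names here are hypothetical.
"""
from typing import Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, k: float) -> Vec3:
    return (a[0] * k, a[1] * k, a[2] * k)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def project(point: Vec3, centre: Vec3, f: float = 1.0) -> Pixel:
    """Project a world point into a camera at `centre` (identity rotation)."""
    x, y, z = sub(point, centre)
    return (f * x / z, f * y / z)


def backproject(pixel: Pixel, f: float = 1.0) -> Vec3:
    """World-frame ray direction through a pixel (identity rotation)."""
    return (pixel[0], pixel[1], f)


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth is unobservable")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return scale(add(add(c1, scale(d1, s)), add(c2, scale(d2, t))), 0.5)


# A hypothetical fingertip 0.5 m in front of camera 1.
fingertip: Vec3 = (0.1, -0.05, 0.5)
cam1: Vec3 = (0.0, 0.0, 0.0)
cam2: Vec3 = (0.2, 0.0, 0.0)  # true optical centre of the second camera

# 1) Single view: the point and a copy twice as far away hit the same pixel.
px1 = project(fingertip, cam1)
px1_far = project(scale(fingertip, 2.0), cam1)

# 2) Two views with the correct camera pose recover the true 3D point.
px2 = project(fingertip, cam2)
recovered = triangulate_midpoint(cam1, backproject(px1), cam2, backproject(px2))

# 3) The same pixels with a wrong pose (baseline doubled) give the wrong depth:
#    the point lands at about 1.0 m instead of 0.5 m.
wrong = triangulate_midpoint(cam1, backproject(px1), (0.4, 0.0, 0.0), backproject(px2))

print(px1, px1_far)
print(recovered)
print(wrong)
```

The third case is the crux. Triangulation is only as good as the camera poses fed into it. This is why classic multi-view systems depend on calibration. It is also why inferring poses jointly with the mesh, as the abstract describes, is what lets multi-view cues be used without a calibrated rig.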

