MetricHMR: Metric Human Mesh Recovery from Monocular Images
MetricHMR addresses accurate 3D human reconstruction from single images, a known bottleneck for applications in computer vision and robotics, and proposes a new method for it rather than an incremental improvement.
The paper tackles metric human mesh recovery from monocular images, a problem that suffers from scale and depth ambiguity. It introduces MetricHMR, a method that produces geometrically reasonable body shape and global translation and achieves state-of-the-art performance in metric pose, shape, and translation estimation across indoor and in-the-wild scenarios.
We introduce MetricHMR (Metric Human Mesh Recovery), an approach for metric human mesh recovery with accurate global translation from monocular images. In contrast to existing HMR methods that suffer from severe scale and depth ambiguity, MetricHMR is able to produce geometrically reasonable body shape and global translation in the reconstruction results. To this end, we first systematically analyze the camera models used by previous HMR methods to emphasize the critical role of the standard perspective projection model in enabling metric-scale HMR. We then validate the acceptable ambiguity range of metric HMR under the standard perspective projection model. Finally, we contribute a novel approach that introduces a ray map based on the standard perspective projection to jointly encode bounding-box information, camera parameters, and geometric cues for end-to-end metric HMR without any additional metric-regularization modules. Extensive experiments demonstrate that our method achieves state-of-the-art performance, even compared with sequential HMR methods, in metric pose, shape, and global translation estimation across both indoor and in-the-wild scenarios.
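To make the ray-map idea concrete, the following is a minimal sketch of how per-pixel ray directions for a bounding-box crop could be computed under a standard pinhole (perspective) camera. It is an illustration only, not the paper's implementation: the function name `crop_ray_map`, the `out_size` parameter, and the choice of unit-normalized directions are assumptions, and the abstract does not specify how MetricHMR actually encodes or consumes its ray map. The sketch only shows why such a map carries both the bounding-box location and the camera intrinsics at once.

```python
import numpy as np

def crop_ray_map(fx, fy, cx, cy, bbox, out_size=4):
    """Unit ray directions for a bounding-box crop (hypothetical helper).

    fx, fy, cx, cy: pinhole intrinsics of the full image, in pixels.
    bbox: (x0, y0, x1, y1) in full-image pixel coordinates.
    Returns an array of shape (out_size, out_size, 3).
    """
    x0, y0, x1, y1 = bbox
    # Sample the crop on a regular grid of cell centers, expressed in
    # full-image pixel coordinates so the bounding-box position is preserved.
    us = x0 + (np.arange(out_size) + 0.5) * (x1 - x0) / out_size
    vs = y0 + (np.arange(out_size) + 0.5) * (y1 - y0) / out_size
    u, v = np.meshgrid(us, vs)  # each of shape (out_size, out_size)
    # Back-project each pixel through the intrinsics: d = K^{-1} [u, v, 1]^T.
    d = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    # Normalize to unit length (an assumption; unnormalized rays also work).
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

# A crop centered on the principal point versus one shifted to the right.
centered = crop_ray_map(1000.0, 1000.0, 500.0, 500.0, (400, 400, 600, 600))
shifted = crop_ray_map(1000.0, 1000.0, 500.0, 500.0, (700, 400, 900, 600))
```

Two crops with identical pixel content but different positions in the image, or taken with different focal lengths, yield different ray maps, which is the kind of geometric cue a cropped image alone does not provide.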