CV · AI · May 14

FactorizedHMR: A Hybrid Framework for Video Human Mesh Recovery

arXiv:2605.14854 · 31.0
AI Analysis

For researchers in 3D human pose and mesh estimation, this work offers a principled way to handle non-uniform ambiguity, though improvements over strong baselines are incremental.

FactorizedHMR addresses the ambiguity in human mesh recovery by using a two-stage framework: a deterministic module for stable torso-root prediction and a probabilistic flow-matching module for uncertain articulations. It achieves competitive performance, with notable gains in occlusion-heavy scenarios and world-space metrics.

Human Mesh Recovery (HMR) is fundamentally ambiguous: under occlusion or weak depth cues, multiple 3D bodies can explain the same image evidence. This ambiguity is not uniform across the body: torso pose and root structure are often relatively well constrained, whereas distal articulations such as the arms and legs are more uncertain. Building on this observation, we propose FactorizedHMR, a two-stage framework that treats these two regimes differently. A deterministic regression module first recovers a stable torso-root anchor, and a probabilistic flow-matching module then completes the remaining non-torso articulation. To make this completion reliable, we combine a composite target representation with geometry-aware supervision and feature-aware classifier-free guidance, preserving the torso-root anchor while improving single-reference recovery of ambiguity-prone articulation. We also introduce a synthetic data pipeline that provides paired image-camera-motion supervision under diverse viewpoints. Across camera-space and world-space benchmarks, FactorizedHMR remains competitive with strong baselines, with the clearest gains in occlusion-heavy recovery and drift-sensitive world-space metrics.
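The two-stage idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a generic Euler sampler for a flow-matching velocity field and the standard classifier-free guidance blend, and it stands in a hypothetical toy velocity field (`toy_velocity`) for the learned network. The names `cfg_velocity` and `complete_articulation`, the vector sizes, and the use of `None` as the "no conditioning" signal are all illustrative assumptions. The "feature-aware" variant of guidance named in the abstract is not modelled here.

```python
import numpy as np


def cfg_velocity(v_cond, v_uncond, scale):
    """Standard classifier-free guidance blend of two velocity estimates."""
    return v_uncond + scale * (v_cond - v_uncond)


def complete_articulation(anchor, velocity_fn, limb_dim, steps=10, scale=2.0, seed=0):
    """Stage 2 sketch: sample the non-torso part with Euler steps of a flow.

    `anchor` is the stage-1 torso-root estimate. It is only read as
    conditioning and is copied unchanged into the output, so the
    probabilistic stage cannot disturb it.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(limb_dim)  # start from Gaussian noise at t = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = velocity_fn(x, t, anchor)   # conditioned on the anchor
        v_uncond = velocity_fn(x, t, None)   # conditioning dropped
        x = x + dt * cfg_velocity(v_cond, v_uncond, scale)
    return np.concatenate([anchor, x])


def toy_velocity(x, t, anchor):
    """Hypothetical stand-in for a learned velocity network.

    Uses the straight-line (rectified-flow style) field toward a fixed
    target: the anchor mean when conditioned, zero when unconditioned.
    """
    if anchor is None:
        target = np.zeros_like(x)
    else:
        target = np.full_like(x, anchor.mean())
    return (target - x) / (1.0 - t)


# Stage 1 would be a deterministic regressor; here its output is hard-coded.
anchor = np.array([0.2, 0.4, 0.6])
pose = complete_articulation(anchor, toy_velocity, limb_dim=4, scale=1.0)
```

With `scale=1.0` the guidance blend reduces to the conditional field, so the toy sampler lands on the conditional target while the first three entries of `pose` stay equal to `anchor`. Larger scales push the sample further from the unconditional prediction, which is the usual trade-off that guidance controls.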

