CV · May 6

Anny-Fit: All-Age Human Mesh Recovery

arXiv:2605.04728 · 93.3 · h-index: 30 · Has Code
AI Analysis

For researchers in human-centric vision, this work enables zero-shot adaptation of adult-trained HMR models to all-age scenes without retraining, addressing a key limitation of existing methods.

Anny-Fit jointly optimizes all individuals in a scene for all-age 3D human mesh recovery. It uses multiple expert knowledge sources (metric depth maps, instance segmentation, 2D keypoints, and VLM-derived attributes) to constrain depth-scale ambiguity. Across diverse datasets, it improves 2D reprojection accuracy by 13 to 16, relative depth ordering by 6 to 7, and shape estimation by 25 to 82, and it reduces 3D estimation error by 9 to 29.
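The summary reports gains in relative depth ordering but does not define the metric. One common way to score it, assumed here rather than taken from the paper, is pairwise ordinal accuracy: for every pair of people, check whether the predicted front/back order agrees with ground truth. The function name, the tie handling, and the example depths below are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_depth_order_accuracy(pred_depths: Sequence[float],
                                  gt_depths: Sequence[float]) -> float:
    """Fraction of person pairs whose predicted front/back order matches ground truth.

    Assumed simplification (not the paper's definition): pairs with equal
    ground-truth depth are skipped, and a tie in the prediction counts as an error.
    """
    correct = total = 0
    for i, j in combinations(range(len(gt_depths)), 2):
        gt_diff = gt_depths[i] - gt_depths[j]
        if gt_diff == 0:
            continue
        total += 1
        pred_diff = pred_depths[i] - pred_depths[j]
        if gt_diff * pred_diff > 0:  # same sign means the same ordering
            correct += 1
    return correct / total if total else 1.0


# Illustrative scene (metres): a child at 2.0 m, a parent at 3.5 m, another adult at 6.0 m.
gt = [2.0, 3.5, 6.0]
# Hypothetical per-person fit with adult proportions: the child is pushed behind the parent.
pred = [4.0, 3.5, 6.0]
print(pairwise_depth_order_accuracy(pred, gt))  # 2 of 3 pairs correct
```

Counting pairs rather than absolute depth errors makes the score sensitive to the all-age failure mode: a child fitted with adult proportions must sit farther from the camera to match its image size, which can flip its order relative to nearby adults.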

Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals directly in the camera coordinate system, enforcing global spatial consistency. At the core of our approach is the use of multiple forms of expert knowledge (metric depth maps, instance segmentation, 2D keypoints, and VLM-derived semantic attributes such as age and gender), each obtained from a dedicated off-the-shelf network. These complementary signals jointly guide the optimization, constraining the depth-scale ambiguity characteristic of all-age scenes. Across diverse datasets, Anny-Fit consistently improves 2D reprojection accuracy (+13 to +16), relative depth ordering (+6 to +7), 3D estimation error (-9 to -29), and shape estimation (+25 to +82), producing more coherent scenes. Finally, we show that VLM-based semantic knowledge can be distilled into an HMR model via the pseudo-ground-truth annotations produced by Anny-Fit on training data, enabling the model to learn semantically meaningful shape parameters while improving HMR performance. Our approach bridges adult-only and all-age modeling by enabling zero-shot adaptation of adult-trained HMR pipelines to the full age spectrum without retraining. Code is publicly available at https://github.com/naver/anny-fit.
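The joint, camera-space fitting described in the abstract can be pictured as one objective summed over every person, with each expert cue contributing a term. The sketch below is hypothetical throughout and is not code from the anny-fit repository. A 3-joint stick template stands in for a full parametric body model, and a single `scale` parameter stands in for shape. The median of the metric depth map inside each instance mask stands in for the depth cue, and `prior_scale` stands in for what a VLM age/gender estimate might imply. A simple root-separation penalty stands in for a scene-level consistency term. All weights are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical 3-joint adult template (head, pelvis, feet) in metres,
# root-relative, camera axes with y pointing down.
TEMPLATE: List[Tuple[float, float, float]] = [(0.0, -0.85, 0.0), (0.0, 0.0, 0.0), (0.0, 0.9, 0.0)]


@dataclass
class PersonState:
    """Hypothetical per-person parameters, expressed directly in camera space."""
    tx: float
    ty: float
    tz: float     # root depth in metres
    scale: float  # uniform body scale relative to the adult template


@dataclass
class PersonCues:
    """Hypothetical per-person observations from off-the-shelf experts."""
    keypoints_2d: List[Tuple[float, float]]  # 2D keypoint detector output, pixels
    median_depth: float                      # metric depth sampled inside the instance mask
    prior_scale: float                       # body scale implied by a VLM age/gender estimate


def render_keypoints(state: PersonState, focal: float) -> List[Tuple[float, float]]:
    """Place the scaled template in camera space and project it with a pinhole camera."""
    out = []
    for jx, jy, jz in TEMPLATE:
        x = state.scale * jx + state.tx
        y = state.scale * jy + state.ty
        z = state.scale * jz + state.tz
        out.append((focal * x / z, focal * y / z))
    return out


def person_loss(state: PersonState, cues: PersonCues, focal: float,
                w_depth: float = 1.0, w_prior: float = 1.0) -> float:
    """Reprojection + metric-depth + semantic-prior terms for one person."""
    reproj = sum((pu - u) ** 2 + (pv - v) ** 2
                 for (pu, pv), (u, v) in zip(render_keypoints(state, focal), cues.keypoints_2d))
    depth = (state.tz - cues.median_depth) ** 2
    prior = (state.scale - cues.prior_scale) ** 2
    return reproj + w_depth * depth + w_prior * prior


def scene_loss(states: List[PersonState], cues: List[PersonCues], focal: float,
               min_sep: float = 0.3) -> float:
    """Sum of per-person terms plus a joint term coupling everyone in one camera frame."""
    total = sum(person_loss(s, c, focal) for s, c in zip(states, cues))
    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            a, b = states[i], states[j]
            d = math.dist((a.tx, a.ty, a.tz), (b.tx, b.ty, b.tz))
            if d < min_sep:  # hypothetical penalty: two body roots should not coincide
                total += (min_sep - d) ** 2
    return total


FOCAL = 1000.0  # hypothetical focal length in pixels
adult = PersonState(tx=-1.0, ty=0.0, tz=6.0, scale=1.0)
child = PersonState(tx=0.5, ty=0.0, tz=3.0, scale=0.5)
cues = [PersonCues(render_keypoints(adult, FOCAL), 6.0, 1.0),
        PersonCues(render_keypoints(child, FOCAL), 3.0, 0.5)]

# Adult-only hypothesis for the child: twice as large and twice as far away.
child_as_adult = PersonState(tx=1.0, ty=0.0, tz=6.0, scale=1.0)

print(scene_loss([adult, child], cues, FOCAL))           # 0.0
print(scene_loss([adult, child_as_adult], cues, FOCAL))  # 9.25
```

Both hypotheses for the child project onto the same 2D keypoints, so the reprojection term alone cannot separate them. Only the depth and prior terms do. This illustrates the abstract's point that complementary cues are needed to constrain depth-scale ambiguity.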
