CV | Mar 20

Monocular Models are Strong Learners for Multi-View Human Mesh Recovery

arXiv:2603.20391 | 46.7 | h-index: 4
AI Analysis

The work addresses the need for calibration-free, generalizable human mesh recovery in real-world scenarios, and is a novel approach rather than an incremental improvement.

The paper tackles multi-view human mesh recovery without camera calibration or multi-view training data. It uses pretrained single-view models as priors and refines their predictions with test-time optimization, reporting state-of-the-art performance on standard benchmarks.

Multi-view human mesh recovery (HMR) is broadly deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches can be broadly grouped into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the lack of multi-view training data, limiting their performance in real-world scenarios. To enable calibration-free reconstruction that generalizes to arbitrary camera setups, we propose a training-free framework that leverages pretrained single-view HMR models as strong priors, eliminating the need for multi-view training data. Our method first constructs a robust and consistent multi-view initialization from single-view predictions, and then refines it via test-time optimization guided by multi-view consistency and anatomical constraints. Extensive experiments demonstrate state-of-the-art performance on standard benchmarks, surpassing multi-view models trained with explicit multi-view supervision.
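The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, not the authors' method. It assumes each view yields a vector of view-independent body parameters (for example, shape coefficients of a parametric body model) and fuses them with an element-wise median as a robust initialization; it then scores a 3D skeleton against target bone lengths as one possible anatomical constraint. The function names, the choice of median fusion, and the bone-length penalty are all hypothetical illustrations.

```python
import math
from statistics import median


def fuse_view_invariant(params_per_view):
    """Element-wise median across views (hypothetical robust initialization).

    params_per_view: list of equal-length parameter lists, one per camera view.
    The median limits the influence of a single bad single-view prediction.
    """
    return [median(values) for values in zip(*params_per_view)]


def bone_length_penalty(joints, bones, target_lengths):
    """Sum of squared deviations from target bone lengths (hypothetical
    anatomical term that a test-time optimizer could minimize).

    joints: list of (x, y, z) tuples.
    bones: list of (parent_index, child_index) pairs.
    target_lengths: one expected length per bone.
    """
    penalty = 0.0
    for (parent, child), target in zip(bones, target_lengths):
        length = math.dist(joints[parent], joints[child])
        penalty += (length - target) ** 2
    return penalty


# Three views; the third view has an outlier in the first parameter.
views = [[1.0, 2.0], [1.1, 2.1], [9.0, 2.05]]
fused = fuse_view_invariant(views)

# A toy three-joint chain with bone lengths 1 and 2.
joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 2.0)]
bones = [(0, 1), (1, 2)]
penalty = bone_length_penalty(joints, bones, [1.0, 1.5])
```

In this toy input the outlier value 9.0 does not affect the fused estimate, which is the property a robust initialization is meant to provide. A real pipeline would also need to handle view-dependent quantities such as global orientation and translation, which this sketch leaves out.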

