CV · Feb 26, 2024

Multi-Human Mesh Recovery with Transformers

arXiv:2402.16806v1 · 1 citation · h-index: 19
Originality: Highly original
AI Analysis

This paper addresses accurate 3D human pose and shape estimation in images containing multiple people, an incremental but important problem for applications such as surveillance and AR.

The paper tackles inaccurate relative positioning in multi-human mesh recovery. It proposes a whole-image-based transformer model that processes all individuals simultaneously, and it reports significant performance improvements over state-of-the-art methods on various multi-person benchmarks.

Conventional approaches to human mesh recovery predominantly employ a region-based strategy. This involves initially cropping out a human-centered region as a preprocessing step, with subsequent modeling focused on this zoomed-in image. While effective for single figures, this pipeline poses challenges when dealing with images featuring multiple individuals, as different people are processed separately, often leading to inaccuracies in relative positioning. Despite the advantages of adopting a whole-image-based approach to address this limitation, early efforts in this direction have fallen short in performance compared to recent region-based methods. In this work, we advocate for this under-explored area of modeling all people at once, emphasizing its potential for improved accuracy in multi-person scenarios through considering all individuals simultaneously and leveraging the overall context and interactions. We introduce a new model with a streamlined transformer-based design, featuring three critical design choices: multi-scale feature incorporation, focused attention mechanisms, and relative joint supervision. Our proposed model demonstrates a significant performance improvement, surpassing state-of-the-art region-based and whole-image-based methods on various benchmarks involving multiple individuals.
