CV | Dec 14, 2023

Towards Robust and Expressive Whole-body Human Pose and Shape Estimation

arXiv:2312.08730v1 | 16 citations | h-index: 29 | Has Code | NIPS
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reliable whole-body pose estimation for applications in computer vision and human-computer interaction, representing an incremental improvement over existing methods.

The paper tackles the problem of whole-body human pose and shape estimation from monocular images, which suffers from degraded performance in complex real-world scenarios due to issues with bounding box predictions. The proposed framework, incorporating localization, contrastive feature extraction, and pixel alignment modules, achieves improved robustness and accuracy across body, hands, face, and whole-body benchmarks.

Whole-body pose and shape estimation aims to jointly predict different behaviors (e.g., pose, hand gesture, facial expression) of the entire human body from a monocular image. Existing methods often exhibit degraded performance under the complexity of in-the-wild scenarios. We argue that the accuracy and reliability of these models are significantly affected by the quality of the predicted bounding box, e.g., the scale and alignment of body parts. The natural discrepancy between the ideal bounding box annotations and model detection results is particularly detrimental to the performance of whole-body pose and shape estimation. In this paper, we propose a novel framework to enhance the robustness of whole-body pose and shape estimation. Our framework incorporates three new modules to address the above challenges from three perspectives: 1) The Localization Module enhances the model's awareness of the subject's location and semantics within the image space. 2) The Contrastive Feature Extraction Module encourages the model to be invariant to robust augmentations by incorporating a contrastive loss with dedicated positive samples. 3) The Pixel Alignment Module ensures that the mesh reprojected from the predicted camera and body model parameters is accurate and pixel-aligned. We perform comprehensive experiments to demonstrate the effectiveness of our proposed framework on body, hands, face and whole-body benchmarks. The codebase is available at https://github.com/robosmplx/robosmplx.
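To make the bounding-box and contrastive ideas in the abstract more concrete, here is a minimal NumPy sketch. It is not taken from the paper or its codebase. The function names `jitter_bbox` and `info_nce`, the `(cx, cy, w, h)` box layout, the perturbation ranges, and the temperature are all illustrative assumptions, and InfoNCE is used as a common stand-in for a contrastive loss because the abstract does not state the exact formulation.

```python
import numpy as np


def jitter_bbox(bbox, scale_range=0.2, shift_range=0.1, rng=None):
    """Perturb a (cx, cy, w, h) box to mimic detector scale/alignment error.

    Hypothetical helper: the ranges are illustrative, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    cx, cy, w, h = bbox
    scale = 1.0 + rng.uniform(-scale_range, scale_range)
    dx, dy = rng.uniform(-shift_range, shift_range, size=2)
    # Shift the centre by a fraction of the box size, then rescale the box.
    return np.array([cx + dx * w, cy + dy * h, w * scale, h * scale])


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of feature vectors.

    Row i of `positive` is treated as the positive sample for row i of
    `anchor`; every other row in the batch acts as a negative.
    """
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

In a training loop one might crop the same subject with the original box and with a jittered box, encode both crops, and pass the two feature batches to `info_nce`. Under these assumptions, a low loss indicates that features are similar across the two crops of the same subject, which is the kind of invariance the abstract describes.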

Code Implementations: 1 repo
