CV · Nov 26, 2020

Multi-view Human Pose and Shape Estimation Using Learnable Volumetric Aggregation

arXiv:2011.13427v1 · 19 citations
Originality: Incremental advance
AI Analysis

This work aims to improve the accuracy and real-time capability of 3D human pose and shape estimation for healthcare applications, where high accuracy is critical for clinical translation.

This paper addresses the challenge of 3D human pose and shape estimation from multi-view RGB images, aiming to overcome limitations of monocular methods and marker-based motion capture. The proposed learnable volumetric aggregation approach reconstructs 3D human body pose and shape, demonstrating higher accuracy and greater promise for real-time prediction compared to previous methods.

Human pose and shape estimation from RGB images is a highly sought-after alternative to marker-based motion capture, which is laborious, requires expensive equipment, and constrains capture to laboratory environments. Monocular vision-based algorithms, however, still suffer from rotational ambiguities and are not ready for translation in healthcare applications, where high accuracy is paramount. While fusion of data from multiple viewpoints could overcome these challenges, current algorithms require further improvement to obtain clinically acceptable accuracies. In this paper, we propose a learnable volumetric aggregation approach to reconstruct 3D human body pose and shape from calibrated multi-view images. We use a parametric representation of the human body, which makes our approach directly applicable to medical applications. Compared to previous approaches, our framework shows higher accuracy and greater promise for real-time prediction, given its cost efficiency.
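
The abstract does not spell out how the volumetric aggregation works, so the following is only a rough sketch of the general idea under stated assumptions: each point of a 3D grid is projected into every calibrated view, a feature value is sampled there, and the per-view samples are fused into one value per grid point. The softmax weighting over views, the single-channel feature maps, the nearest-neighbour sampling, and all function names (`project`, `sample_view`, `softmax_aggregate`, `build_volume`) are illustrative choices, not taken from the paper. The learned network that would process the fused volume and the regression of body model parameters are omitted.

```python
import math

def project(P, point):
    """Project a 3D point with a 3x4 camera matrix P (nested lists).
    Returns (u, v) in pixels, or None if the point is not in front of the camera."""
    h = (point[0], point[1], point[2], 1.0)
    u, v, w = (sum(P[r][c] * h[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None
    return u / w, v / w

def sample_view(feat, P, point):
    """Nearest-neighbour sample of a single-channel H x W feature map
    at the projection of `point`; 0.0 if the point is not visible."""
    uv = project(P, point)
    if uv is None:
        return 0.0
    col, row = round(uv[0]), round(uv[1])
    if 0 <= row < len(feat) and 0 <= col < len(feat[0]):
        return feat[row][col]
    return 0.0

def softmax_aggregate(values):
    """Fuse per-view samples with softmax weights computed over the views
    (an assumed aggregation rule, chosen for illustration)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))

def build_volume(feats, cameras, grid_points):
    """Return one fused feature value per 3D grid point."""
    return [
        softmax_aggregate([sample_view(f, P, p) for f, P in zip(feats, cameras)])
        for p in grid_points
    ]
```

In this sketch, views that respond strongly at a grid point dominate the fused value, while views where the point is occluded from the image or behind the camera contribute a zero sample.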
