CV | Aug 16, 2020

Neural Descent for Visual 3D Human Pose and Shape

arXiv:2008.06910v2 | 75 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient and accurate 3D human sensing for computer vision applications, offering an incremental improvement in methodology.

The paper tackles 3D human pose and shape reconstruction from RGB images by introducing HUmanNeural Descent (HUND), a learning-to-optimize approach that avoids second-order differentiation during training and expensive gradient descent on the pose and shape state at test time, achieving competitive results on datasets like H3.6M and 3DPW.

We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image. We rely on a recently introduced, expressive full body statistical 3d human model, GHUM, trained end-to-end, and learn to reconstruct its pose and shape state in a self-supervised regime. Central to our methodology, is a learning to learn and optimize approach, referred to as HUmanNeural Descent (HUND), which avoids both second-order differentiation when training the model parameters, and expensive state gradient descent in order to accurately minimize a semantic differentiable rendering loss at test time. Instead, we rely on novel recurrent stages to update the pose and shape parameters such that not only losses are minimized effectively, but the process is meta-regularized in order to ensure end-progress. HUND's symmetry between training and testing makes it the first 3d human sensing architecture to natively support different operating regimes including self-supervised ones. In diverse tests, we show that HUND achieves very competitive results in datasets like H3.6M and 3DPW, as well as good quality 3d reconstructions for complex imagery collected in-the-wild.
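The idea of replacing test-time gradient descent with a fixed number of recurrent update stages can be illustrated with a toy sketch. This is not the authors' architecture: `render`, `update_stage`, `neural_descent`, the two-parameter state, and the hard-coded `weights` are all hypothetical stand-ins. In particular, the fixed `weights` play the role of what a trained recurrent network would produce, and the linear `render` stands in for a differentiable renderer applied to a body model such as GHUM.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def render(state: Sequence[float]) -> Vector:
    """Toy stand-in for a differentiable renderer: a fixed linear map."""
    return [2.0 * state[0], 0.5 * state[1]]


def loss(state: Sequence[float], target: Sequence[float]) -> float:
    """Squared error between the rendered state and the observation."""
    return sum((p - t) ** 2 for p, t in zip(render(state), target))


def update_stage(state: Sequence[float], target: Sequence[float]) -> Vector:
    """One recurrent stage: map the current residual to a state increment.

    The weights are an assumption made for illustration; in a learned
    system they would come from training, not be written by hand.
    """
    residual = [t - p for p, t in zip(render(state), target)]
    weights = [0.25, 1.0]  # hypothetical "learned" update parameters
    return [s + w * r for s, w, r in zip(state, weights, residual)]


def neural_descent(
    initial: Sequence[float], target: Sequence[float], num_stages: int = 5
) -> Tuple[Vector, List[float]]:
    """Refine the state with a fixed number of stages, recording the loss.

    No gradient of the loss is computed here; each stage only applies
    the update rule to the current state and observation.
    """
    state = list(initial)
    history = [loss(state, target)]
    for _ in range(num_stages):
        state = update_stage(state, target)
        history.append(loss(state, target))
    return state, history


if __name__ == "__main__":
    observation = render([1.0, -2.0])  # synthetic target from a known state
    final_state, losses = neural_descent([0.0, 0.0], observation)
    print(final_state, losses)
```

In this toy setting each stage halves the error in both parameters, so the recorded loss falls at every stage. The same loop runs unchanged whether it is being trained or evaluated, which is the kind of train/test symmetry the abstract attributes to HUND.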

