NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses
The paper tackles the problem of reconstructing animatable 3D human avatars from sparse images without relying on human pose inputs. It reports that the method outperforms baselines in practical settings (without ground-truth poses) and delivers comparable results in lab settings (with ground-truth poses).
The work addresses the challenge of noisy pose estimates in avatar reconstruction for applications such as virtual reality and animation, though the contribution is incremental: it builds on prior methods by removing their dependence on pose inputs.
We tackle the task of recovering an animatable 3D human avatar from a single image or a sparse set of images. For this task, beyond a set of images, many prior state-of-the-art methods use accurate "ground-truth" camera poses and human poses as input to guide reconstruction at test time. We show that pose-dependent reconstruction degrades results significantly if pose estimates are noisy. To overcome this, we introduce NoPo-Avatar, which reconstructs avatars solely from images, without any pose input. Because its test-time reconstruction does not depend on human poses, NoPo-Avatar is unaffected by noisy human pose estimates, which makes it more widely applicable. Experiments on the challenging THuman2.0, XHuman, and HuGe100K datasets show that NoPo-Avatar outperforms existing baselines in practical settings (without ground-truth poses) and delivers comparable results in lab settings (with ground-truth poses).