CV · Mar 12, 2025

Better Together: Unified Motion Capture and 3D Avatar Reconstruction

Arthur Moreau, Mohammed Brahimi, Richard Shaw, Athanasios Papaioannou, Thomas Tanay, Zhensong Zhang, Eduardo Pérez-Pellitero

arXiv:2503.09293v1 · 8.41 citations

Originality: Highly original

AI Analysis

This work addresses the need for more accurate and visually realistic human motion capture and avatar rendering in applications such as virtual reality and animation. It represents a novel integration of the two tasks rather than an incremental improvement.

The paper tackles the joint problem of human pose estimation and 3D avatar reconstruction from multi-view videos. It reports a 35% reduction in body joint error and a 45% reduction in hand joint error compared to keypoint-based methods, while improving avatar visual quality by 2 dB PSNR on novel view synthesis.

We present Better Together, a method that simultaneously solves the human pose estimation problem while reconstructing a photorealistic 3D human avatar from multi-view videos. While prior art usually solves these problems separately, we argue that joint optimization of skeletal motion with a 3D renderable body model brings synergistic effects, i.e. yields more precise motion capture and improved visual quality of real-time rendering of avatars. To achieve this, we introduce a novel animatable avatar with 3D Gaussians rigged on a personalized mesh and propose to optimize the motion sequence with time-dependent MLPs that provide accurate and temporally consistent pose estimates. We first evaluate our method on highly challenging yoga poses and demonstrate state-of-the-art accuracy on multi-view human pose estimation, reducing error by 35% on body joints and 45% on hand joints compared to keypoint-based methods. At the same time, our method significantly boosts the visual quality of animatable avatars (+2dB PSNR on novel view synthesis) on diverse challenging subjects.
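To put the reported +2 dB PSNR gain in context, the sketch below uses the standard PSNR definition, 10 · log10(MAX² / MSE), to convert a PSNR gain into the corresponding ratio of mean squared errors. It is not taken from the paper: the function names (`mse`, `psnr`, `mse_ratio_for_psnr_gain`), the flat pixel lists, and the [0, 1] intensity range are assumptions made for illustration only.

```python
import math


def mse(reference, rendered):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)


def psnr(reference, rendered, max_value=1.0):
    """Standard PSNR in dB; max_value is the assumed peak pixel intensity."""
    err = mse(reference, rendered)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_psnr_gain(gain_db):
    """Ratio new_MSE / old_MSE implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical toy "images": two pixels each, intensities in [0, 1].
    print(round(psnr([0.0, 1.0], [0.0, 0.9]), 2))
    # A +2 dB gain corresponds to roughly 0.63x the original MSE.
    print(round(mse_ratio_for_psnr_gain(2.0), 3))
```

Under this definition, a 2 dB improvement means the mean squared pixel error drops to about 63% of its previous value, that is, roughly a 37% reduction, whatever the absolute PSNR level.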
