CV · Mar 12, 2025

Better Together: Unified Motion Capture and 3D Avatar Reconstruction

arXiv:2503.09293v1 · 1 citation
Originality: Highly original
AI Analysis

The paper addresses the need for more accurate human motion capture and more visually realistic avatar rendering in applications such as virtual reality and animation. Its contribution is a novel integration of the two tasks rather than an incremental improvement.

The paper tackles human pose estimation and 3D avatar reconstruction from multi-view videos as a joint problem. Compared to keypoint-based methods, it reduces body joint error by 35% and hand joint error by 45%, and it improves avatar visual quality by 2 dB PSNR on novel view synthesis.

We present Better Together, a method that simultaneously solves the human pose estimation problem while reconstructing a photorealistic 3D human avatar from multi-view videos. While prior art usually solves these problems separately, we argue that joint optimization of skeletal motion with a 3D renderable body model brings synergistic effects, i.e. yields more precise motion capture and improved visual quality of real-time rendering of avatars. To achieve this, we introduce a novel animatable avatar with 3D Gaussians rigged on a personalized mesh and propose to optimize the motion sequence with time-dependent MLPs that provide accurate and temporally consistent pose estimates. We first evaluate our method on highly challenging yoga poses and demonstrate state-of-the-art accuracy on multi-view human pose estimation, reducing error by 35% on body joints and 45% on hand joints compared to keypoint-based methods. At the same time, our method significantly boosts the visual quality of animatable avatars (+2dB PSNR on novel view synthesis) on diverse challenging subjects.
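To put the reported numbers in perspective, the following is a minimal sketch of the arithmetic behind them, not the paper's evaluation code. It uses the standard definition of PSNR and assumes pixel values normalized to [0, 1]; the function names and the sample error values are hypothetical and chosen only for illustration.

```python
import math


def psnr(mse, max_value=1.0):
    """Standard PSNR in dB for a mean squared error.

    max_value is the peak pixel value (assumed 1.0 here, i.e. images in [0, 1]).
    """
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_psnr_gain(gain_db):
    """Factor by which the MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline."""
    return (baseline_error - new_error) / baseline_error


# A +2 dB PSNR gain corresponds to multiplying the MSE by about 0.63.
ratio = mse_ratio_for_psnr_gain(2.0)

# Hypothetical joint errors (arbitrary units): 100 -> 65 is a 35% reduction.
body_reduction = relative_reduction(100.0, 65.0)
```

Under these assumptions, a gain of 2 dB PSNR means the mean squared pixel error falls by roughly 37%, which is of the same order as the reported reductions in joint error.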
