MAGE: A Multi-stage Avatar Generator with Sparse Observations
This work addresses the challenge of realistic full-body motion prediction from limited sensor data for AR/VR applications. It is a novel methodological improvement rather than a foundational breakthrough.
The paper tackles the problem of inferring full-body poses from sparse 3-joint observations in AR/VR applications. It proposes MAGE, a multi-stage avatar generator that progressively refines predictions from a coarse 6-part body representation to the full 22 joints, and reports significantly better accuracy and continuity than state-of-the-art methods.
Inferring full-body poses from Head Mounted Devices, which capture only 3-joint observations from the head and wrists, is a challenging task with wide AR/VR applications. Previous attempts focus on learning a one-stage motion mapping and thus suffer from an overly large inference space for the motions of unobserved body joints. This often leads to unsatisfactory lower-body predictions and poor temporal consistency, resulting in unrealistic or incoherent motion sequences. To address this, we propose a powerful Multi-stage Avatar GEnerator named MAGE that factorizes this one-stage direct motion mapping into a progressive prediction strategy. Specifically, given the initial 3-joint motions, MAGE gradually infers multi-scale body part poses at different levels of abstraction, starting from a 6-part body representation and refining step by step to 22 joints. As the level of abstraction decreases stage by stage, MAGE draws on more motion context priors from earlier prediction stages, which yields more realistic motion completion through richer constraints and less ambiguity. Extensive experiments on large-scale datasets verify that MAGE significantly outperforms state-of-the-art methods with better accuracy and continuity.
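The progressive strategy described above can be illustrated with a minimal sketch. It is not the MAGE implementation: the random linear maps stand in for learned stage networks, the per-unit feature size `FEAT` is an assumed value, only the two granularities named in the text (6 parts and 22 joints) are used, and the sketch handles a single frame rather than a motion sequence. All function names are hypothetical.

```python
import random

FEAT = 6            # assumed per-unit feature size; not specified in the text
STAGES = [6, 22]    # 6-part body representation, then the full 22 joints
N_OBSERVED = 3      # head and two wrists


def make_stage(rng, in_dim, out_dim):
    """Random linear map standing in for a learned stage network (placeholder)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(out_dim)] for _ in range(in_dim)]


def build_stages(rng):
    """Each stage sees the observations plus the previous stage's output."""
    obs_dim = N_OBSERVED * FEAT
    stages, prev_dim = [], 0
    for units in STAGES:
        stages.append(make_stage(rng, obs_dim + prev_dim, units * FEAT))
        prev_dim = units * FEAT
    return stages


def apply_linear(vector, weight):
    """Multiply a row vector by a weight matrix stored as a list of rows."""
    out_dim = len(weight[0])
    return [sum(vector[i] * weight[i][j] for i in range(len(vector)))
            for j in range(out_dim)]


def progressive_predict(observed, stages):
    """observed: flat list of N_OBSERVED * FEAT values for one frame.

    Returns one flat prediction per stage, ordered coarse to fine.
    """
    context, outputs = [], []
    for weight in stages:
        prediction = apply_linear(observed + context, weight)
        outputs.append(prediction)
        context = prediction  # the coarser result conditions the next stage
    return outputs


rng = random.Random(0)
stages = build_stages(rng)
frame = [rng.gauss(0.0, 1.0) for _ in range(N_OBSERVED * FEAT)]
coarse, fine = progressive_predict(frame, stages)
print(len(coarse) // FEAT, len(fine) // FEAT)
```

The point of the sketch is the data flow: the second stage receives both the sparse observations and the coarse 6-part prediction, so the 22-joint output is conditioned on more context than a one-stage mapping from 3 joints alone would have.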