JADE: Joint-aware Latent Diffusion for 3D Human Generative Modeling
This work addresses the challenge of creating expressive and interpretable 3D human models for computer vision applications, representing an incremental improvement over prior approaches.
The paper tackles generative modeling of 3D human bodies by introducing JADE, a framework whose joint-aware latent representation provides fine-grained control. It reports improved reconstruction accuracy, editing controllability, and generation quality compared with existing methods.
Generative modeling of 3D human bodies has been studied extensively in computer vision. The core challenge is to design a compact latent representation that is both expressive and semantically interpretable, yet existing approaches struggle to satisfy both requirements. In this work, we introduce JADE, a generative framework that learns the variations of human shapes with fine-grained control. Our key insight is a joint-aware latent representation that decomposes human bodies into skeleton structures, modeled by joint positions, and local surface geometries, characterized by features attached to each joint. This disentangled latent space design enables geometric and semantic interpretation, giving users flexible control. To generate coherent and plausible human shapes under the proposed decomposition, we also present a cascaded pipeline in which two diffusion models capture the distributions of skeleton structures and local surface geometries, respectively. Extensive experiments on public datasets demonstrate the effectiveness of the JADE framework across multiple tasks, in terms of autoencoding reconstruction accuracy, editing controllability, and generation quality, compared with existing methods.
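The decomposition and the cascaded pipeline described above can be illustrated with a minimal sketch. It is not the JADE implementation: the names `JointLatent`, `ddpm_sample`, `toy_skeleton_eps`, `toy_geometry_eps`, and `sample_body` are hypothetical, the two noise predictors are toy stand-ins for trained networks, and the joint count, feature dimension, and linear noise schedule are arbitrary choices. The sketch also assumes that the second stage is conditioned on the joints sampled by the first, which is one plausible reading of "cascaded". The reverse step used is the standard DDPM ancestral update.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class JointLatent:
    """Hypothetical container for a joint-aware latent code."""
    joints: np.ndarray    # (J, 3) joint positions describing the skeleton
    features: np.ndarray  # (J, D) local surface features, one row per joint


def ddpm_sample(eps_model, shape, betas, rng, cond=None):
    """Standard DDPM ancestral sampling with a generic noise predictor."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x


def toy_skeleton_eps(x, t, cond):
    # Placeholder for a trained skeleton denoiser (assumption).
    return 0.5 * x


def toy_geometry_eps(x, t, cond):
    # Placeholder for a trained geometry denoiser conditioned on the
    # joints; `cond` has shape (J, 3) and broadcasts over the D features.
    return 0.5 * x + 0.1 * cond.mean(axis=1, keepdims=True)


def sample_body(num_joints=24, feat_dim=8, steps=50, seed=0):
    """Stage 1 samples the skeleton; stage 2 samples per-joint features."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # arbitrary linear schedule
    joints = ddpm_sample(toy_skeleton_eps, (num_joints, 3), betas, rng)
    features = ddpm_sample(
        toy_geometry_eps, (num_joints, feat_dim), betas, rng, cond=joints
    )
    return JointLatent(joints=joints, features=features)
```

Because the skeleton and the local geometry live in separate arrays indexed by joint, an edit can in principle target one joint's position or feature row without touching the others, which is the kind of control the disentangled design is meant to expose.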