16.2CVJun 2
AvatarMix: Identity-Preserving Cross-Avatar Composition for Outfit PersonalizationZhaorong Wang, Yoshihiro Kanamori, Yuki Endo
Existing 3D avatar outfit transfer methods face distinct challenges: approaches that lift 2D edits to 3D often suffer from outfit or identity quality degradation, while those that separately model body and clothing layers are prone to intersection artifacts. We introduce AvatarMix, a compositional paradigm that bypasses these issues by directly composing the head and body from two high-fidelity Gaussian avatars. While this paradigm inherently preserves outfit quality and avoids intersections, it introduces challenges in creating a seamless join and maintaining appearance fidelity after body reshaping. To this end, we propose a two-tier refinement strategy: SeamFix, a localized diffusion module that refines hair and neck to ensure an artifact-free join, and an optional full-body refinement, FullbodyFix, that restores garment appearance when retargeting degrades the clothed body. Both operate on renders from an already 3D-consistent Gaussian avatar, which limits multi-view artifacts compared to 2D-to-3D lifting. To preserve the user's body identity, our mesh-based Gaussian representation enables the adaptation of a robust mesh retargeting technique, precisely reshaping the clothed body to the user's physique and robustly handling diverse body shapes. Extensive experiments demonstrate that our method achieves state-of-the-art results in outfit fidelity and identity preservation, providing a new perspective for realistic 3D outfit personalization. Project page: https://larsph.github.io/avatarmix/
CVOct 16, 2024
EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse ViewZhaorong Wang, Yoshihiro Kanamori, Yuki Endo
Generalizable neural radiance field (NeRF) enables neural-based digital human rendering without per-scene retraining. When combined with human prior knowledge, high-quality human rendering can be achieved even with sparse input views. However, the inference of these methods is still slow, as a large number of neural network queries on each ray are required to ensure the rendering quality. Moreover, occluded regions often suffer from artifacts, especially when the input views are sparse. To address these issues, we propose a generalizable human NeRF framework that achieves high-quality and real-time rendering with sparse input views by extensively leveraging human prior knowledge. We accelerate the rendering with a two-stage sampling reduction strategy: first constructing boundary meshes around the human geometry to reduce the number of ray samples for sampling guidance regression, and then volume rendering using fewer guided samples. To improve rendering quality, especially in occluded regions, we propose an occlusion-aware attention mechanism to extract occlusion information from the human priors, followed by an image space refinement network to improve rendering quality. Furthermore, for volume rendering, we adopt a signed ray distance function (SRDF) formulation, which allows us to propose an SRDF loss at every sample position to improve the rendering quality further. Our experiments demonstrate that our method outperforms the state-of-the-art methods in rendering quality and has a competitive rendering speed compared with speed-prioritized novel view synthesis methods.