63.4CVMar 30
DipGuava: Disentangling Personalized Gaussian Features for 3D Head Avatars from Monocular VideoJeonghaeng Lee, Seok Keun Choi, Zhixuan Li et al.
While recent 3D head avatar creation methods attempt to animate facial dynamics, they often fail to capture personalized details, limiting realism and expressiveness. To fill this gap, we present DipGuava (Disentangled and Personalized Gaussian UV Avatar), a novel 3D Gaussian head avatar creation method that successfully generates avatars with personalized attributes from monocular video. DipGuava is the first method to explicitly disentangle facial appearance into two complementary components, trained in a structured two-stage pipeline that significantly reduces learning ambiguity and enhances reconstruction fidelity. In the first stage, we learn a stable geometry-driven base appearance that captures global facial structure and coarse expression-dependent variations. In the second stage, the personalized residual details not captured in the first stage are predicted, including high-frequency components and nonlinearly varying features such as wrinkles and subtle skin deformations. These components are fused via dynamic appearance fusion that integrates residual details after deformation, ensuring spatial and semantic alignment. This disentangled design enables DipGuava to generate photorealistic, identity-preserving avatars, consistently outperforming prior methods in both visual quality and quantitativeperformance, as demonstrated in extensive experiments.
CVAug 3, 2025
Shape Distribution Matters: Shape-specific Mixture-of-Experts for Amodal Segmentation under Diverse OcclusionsZhixuan Li, Yujia Liu, Chen Hui et al.
Amodal segmentation targets to predict complete object masks, covering both visible and occluded regions. This task poses significant challenges due to complex occlusions and extreme shape variation, from rigid furniture to highly deformable clothing. Existing one-size-fits-all approaches rely on a single model to handle all shape types, struggling to capture and reason about diverse amodal shapes due to limited representation capacity. A natural solution is to adopt a Mixture-of-Experts (MoE) framework, assigning experts to different shape patterns. However, naively applying MoE without considering the object's underlying shape distribution can lead to mismatched expert routing and insufficient expert specialization, resulting in redundant or underutilized experts. To deal with these issues, we introduce ShapeMoE, a shape-specific sparse Mixture-of-Experts framework for amodal segmentation. The key idea is to learn a latent shape distribution space and dynamically route each object to a lightweight expert tailored to its shape characteristics. Specifically, ShapeMoE encodes each object into a compact Gaussian embedding that captures key shape characteristics. A Shape-Aware Sparse Router then maps the object to the most suitable expert, enabling precise and efficient shape-aware expert routing. Each expert is designed as lightweight and specialized in predicting occluded regions for specific shape patterns. ShapeMoE offers well interpretability via clear shape-to-expert correspondence, while maintaining high capacity and efficiency. Experiments on COCOA-cls, KINS, and D2SA show that ShapeMoE consistently outperforms state-of-the-art methods, especially in occluded region segmentation. The code will be released.