MD-Face: MoE-Enhanced Label-Free Disentangled Representation for Interactive Facial Attribute Editing
MD-Face addresses unintentional attribute changes during facial editing for virtual avatars and social media applications, offering an incremental improvement over existing methods.
The paper tackles attribute entanglement in GAN-based facial attribute editing by proposing MD-Face, a label-free disentangled representation learning framework built on a Mixture of Experts. In experiments, it outperforms unsupervised baselines and is competitive with supervised ones.
GAN-based facial attribute editing is widely used in virtual avatars and social media but often suffers from attribute entanglement, where modifying one facial attribute unintentionally alters others. While supervised disentangled representation learning can address this, it relies heavily on labeled data, incurring high annotation costs. To address these challenges, we propose MD-Face, a label-free disentangled representation learning framework based on Mixture of Experts (MoE). MD-Face uses an MoE backbone with a gating mechanism that dynamically allocates experts, enabling the model to learn semantic vectors with greater independence. To further reduce attribute entanglement, we introduce a geometry-aware loss, which aligns each semantic vector with its corresponding Semantic Boundary Vector (SBV) through a Jacobian-based pushforward method. Experiments with ProGAN and StyleGAN show that MD-Face outperforms unsupervised baselines and is competitive with supervised ones. Compared to diffusion-based methods, it offers better image quality and lower inference latency, making it ideal for interactive editing.
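The two mechanisms named in the abstract, gated expert mixing and a pushforward-based alignment loss, can be illustrated with a minimal sketch. It is not the MD-Face implementation: the linear experts, the softmax gate, the finite-difference Jacobian-vector product, the cosine form of the loss, and every function name below are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_direction(z, gate_weights, expert_weights):
    """Mix the outputs of linear experts with softmax gate weights.

    Hypothetical stand-in for an MoE backbone: returns one candidate
    semantic vector and the gate weights used to form it.
    """
    gates = softmax(matvec(gate_weights, z))
    outputs = [matvec(w, z) for w in expert_weights]
    dim = len(outputs[0])
    mixed = [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(dim)]
    return mixed, gates


def pushforward(f, z, v, eps=1e-5):
    """Central-difference estimate of the Jacobian-vector product J_f(z) v."""
    plus = f([zi + eps * vi for zi, vi in zip(z, v)])
    minus = f([zi - eps * vi for zi, vi in zip(z, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def alignment_loss(f, z, v, boundary):
    """Assumed loss form: 1 - cos(J_f(z) v, boundary).

    It is zero when the pushed-forward direction is parallel to the
    boundary vector and grows as the two directions diverge.
    """
    return 1.0 - cosine(pushforward(f, z, v), boundary)
```

With a toy map such as `f = lambda x: [2 * x[0], x[1]]`, a direction `[1.0, 0.0]` pushed forward at any `z` stays parallel to a boundary vector `[1.0, 0.0]`, so the loss is approximately zero, while the orthogonal direction `[0.0, 1.0]` gives a loss of approximately one.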