Towards High-Fidelity, Identity-Preserving Real-Time Makeup Transfer: Decoupling Style Generation
This addresses practical adoption challenges in live makeup applications for users by enabling real-time, identity-preserving cosmetic transfer, though it is incremental as it builds on existing methods with novel training objectives.
The paper tackles the problem of real-time virtual makeup try-on by decoupling makeup transfer into extraction and rendering steps, achieving high-fidelity, identity-preserving results with robust temporal consistency, outperforming existing baselines in detail capture and stability.
We present a novel framework for real-time virtual makeup try-on that achieves high-fidelity, identity-preserving cosmetic transfer with robust temporal consistency. In live makeup transfer applications, it is critical to synthesize temporally coherent results that accurately replicate fine-grained makeup and preserve user's identity. However, existing methods often struggle to disentangle semitransparent cosmetics from skin tones and other identify features, causing identity shifts and raising fairness concerns. Furthermore, current methods lack real-time capabilities and fail to maintain temporal consistency, limiting practical adoption. To address these challenges, we decouple makeup transfer into two steps: transparent makeup mask extraction and graphics-based mask rendering. After the makeup extraction step, the makeup rendering can be performed in real time, enabling live makeup try-on. Our makeup extraction model trained on pseudo-ground-truth data generated via two complementary methods: a graphics-based rendering pipeline and an unsupervised k-means clustering approach. To further enhance transparency estimation and color fidelity, we propose specialized training objectives, including alpha-weighted reconstruction and lip color losses. Our method achieves robust makeup transfer across diverse poses, expressions, and skin tones while preserving temporal smoothness. Extensive experiments demonstrate that our approach outperforms existing baselines in capturing fine details, maintaining temporal stability, and preserving identity integrity.