CV AI | Mar 31, 2025

MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach

Xin Zhang, Siting Huang, Xiangyang Luo, Yifan Xie, Weijiang Yu, Heng Chang, Fei Ma, Fei Yu

arXiv:2503.23888v18.43 citations | h-index: 5 | ICME

Originality: Incremental advance

AI Analysis

The work addresses the problem of improving diversity, controllability, and flexibility in face editing for personal image customization, and represents an incremental improvement over existing methods.

The paper tackles the challenge of text-driven face editing by proposing MuseFace, a framework that integrates a Text-to-Mask diffusion model and a semantic-aware face editing model to generate fine-grained semantic masks from text, achieving superior high-fidelity performance.

Face editing modifies the appearance of a face and plays a key role in the customization and enhancement of personal images. Although many works have achieved remarkable success in text-driven face editing, they still face significant challenges, as none of them simultaneously fulfills the characteristics of diversity, controllability, and flexibility. To address this challenge, we propose MuseFace, a text-driven face editing framework that relies solely on a text prompt to enable face editing. Specifically, MuseFace integrates a Text-to-Mask diffusion model and a semantic-aware face editing model, and is capable of directly generating fine-grained semantic masks from text and performing face editing. The Text-to-Mask diffusion model provides diversity and flexibility to the framework, while the semantic-aware face editing model ensures its controllability. Our framework can create fine-grained semantic masks, making precise face editing possible and significantly enhancing the controllability and flexibility of face editing models. Extensive experiments demonstrate that MuseFace achieves superior high-fidelity performance.
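The two-stage structure described in the abstract (text to semantic mask, then mask-guided editing) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TwoStageEditor`, the toy functions, the label `HAT`, and the assumption that the mask stage also receives the source mask are all hypothetical, and both stages are trivial stand-ins for what would be learned models.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # placeholder: 2D grid of pixel values
Mask = List[List[int]]   # placeholder: 2D grid of semantic class labels

HAT = 9  # hypothetical semantic label used only by the toy stages below


@dataclass
class TwoStageEditor:
    """Hypothetical wrapper: stage 1 maps text to a mask, stage 2 edits the image."""
    text_to_mask: Callable[[str, Mask], Mask]
    mask_to_face: Callable[[Image, Mask], Image]

    def edit(self, image: Image, source_mask: Mask, prompt: str) -> Image:
        # Stage 1: derive a target semantic mask from the text prompt.
        target_mask = self.text_to_mask(prompt, source_mask)
        # Stage 2: render the edited face under the guidance of that mask.
        return self.mask_to_face(image, target_mask)


def toy_text_to_mask(prompt: str, mask: Mask) -> Mask:
    """Stand-in for a Text-to-Mask model: labels the top row as HAT on request."""
    new_mask = [row[:] for row in mask]  # copy so the input mask is not mutated
    if "hat" in prompt.lower():
        new_mask[0] = [HAT] * len(new_mask[0])
    return new_mask


def toy_mask_to_face(image: Image, mask: Mask) -> Image:
    """Stand-in for a mask-guided editor: zeroes pixels under the HAT label."""
    return [
        [0 if label == HAT else pixel for pixel, label in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


editor = TwoStageEditor(toy_text_to_mask, toy_mask_to_face)
image = [[5, 5], [5, 5]]
source_mask = [[0, 0], [1, 1]]
edited = editor.edit(image, source_mask, "add a hat")
```

Keeping the mask as an explicit intermediate is what makes the edit inspectable in this sketch: the region that will change is fixed before any pixels are touched, which loosely mirrors the controllability role the abstract assigns to the semantic-aware editing model.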
