CV | AI | Mar 31, 2025

MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach

arXiv:2503.23888v1 | 3 citations | h-index: 5 | ICME
Originality: Incremental advance
AI Analysis

The paper addresses the problem of enhancing diversity, controllability, and flexibility in face editing for personal image customization. It represents an incremental improvement over existing methods.

The paper tackles the challenge of text-driven face editing by proposing MuseFace, a framework that integrates a Text-to-Mask diffusion model and a semantic-aware face editing model to generate fine-grained semantic masks from text, achieving superior high-fidelity performance.

Face editing modifies the appearance of a face and plays a key role in the customization and enhancement of personal images. Although many works have achieved remarkable success in text-driven face editing, they still face significant challenges, as none of them simultaneously fulfills the characteristics of diversity, controllability, and flexibility. To address this challenge, we propose MuseFace, a text-driven face editing framework that relies solely on a text prompt to enable face editing. Specifically, MuseFace integrates a Text-to-Mask diffusion model and a semantic-aware face editing model, and is capable of directly generating fine-grained semantic masks from text and performing face editing. The Text-to-Mask diffusion model provides diversity and flexibility to the framework, while the semantic-aware face editing model ensures its controllability. Our framework can create fine-grained semantic masks, making precise face editing possible and significantly enhancing the controllability and flexibility of face editing models. Extensive experiments demonstrate that MuseFace achieves superior high-fidelity performance.

