CV · AI · Jan 5, 2025

Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation

arXiv:2501.02523v1 · 5 citations · h-index: 2 · Has Code · ECAI
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of facial image generation for applications requiring identity preservation, though it is incremental as it builds on existing diffusion models with domain-specific optimizations.

Generating a desired facial image from text prompts alone is difficult. The paper addresses this with a multimodal approach that adds image prompts, and it reports the best comprehensive performance on two face-related test datasets.

Facial images have extensive practical applications. Although current large-scale text-image diffusion models exhibit strong generation capabilities, it is challenging to generate the desired facial images using only a text prompt. Image prompts are a logical choice. However, current methods of this type generally focus on the general domain. In this paper, we aim to optimize image makeup techniques to generate the desired facial images. Specifically, (1) we built a dataset of 4 million high-quality face image-text pairs (FaceCaptionHQ-4M) based on LAION-Face to train our Face-MakeUp model; (2) to maintain consistency with the reference facial image, we extract and learn multi-scale content features and pose features from the facial image and integrate them into the diffusion model to better preserve facial identity. Validation on two face-related test datasets demonstrates that our Face-MakeUp achieves the best comprehensive performance. All code is available at: https://github.com/ddw2AIGROUP2CQUPT/Face-MakeUp
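The abstract does not say how the multi-scale content features and pose features are combined with the text prompt. The toy sketch below illustrates one plausible arrangement and is not the Face-MakeUp implementation. It assumes the face features become extra conditioning tokens appended to the text tokens. Average pooling stands in for a learned image encoder, and every name here (`avg_pool`, `build_face_tokens`, `build_condition`, the grid sizes, the token width) is hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]  # square grayscale image, rows of values in [0, 1]
Token = List[float]


def avg_pool(image: Image, grid: int) -> List[float]:
    """Average-pool a square image into a grid x grid map, flattened row-major.

    Stand-in for a learned encoder: a coarse grid keeps global appearance,
    a fine grid keeps local detail.
    """
    size = len(image)
    if size % grid != 0:
        raise ValueError("image size must be divisible by grid")
    cell = size // grid
    pooled = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    total += image[y][x]
            pooled.append(total / (cell * cell))
    return pooled


def pad(vec: Sequence[float], dim: int) -> Token:
    """Zero-pad a feature vector to the shared token width `dim`."""
    if len(vec) > dim:
        raise ValueError("feature vector longer than token width")
    return list(vec) + [0.0] * (dim - len(vec))


def build_face_tokens(image: Image, pose: Sequence[float],
                      grids: Sequence[int] = (1, 2, 4), dim: int = 16) -> List[Token]:
    """One content token per scale, plus one pose token (assumed layout)."""
    tokens = [pad(avg_pool(image, g), dim) for g in grids]
    tokens.append(pad(pose, dim))
    return tokens


def build_condition(text_tokens: List[Token], face_tokens: List[Token]) -> List[Token]:
    """Concatenate text and face tokens into one conditioning sequence."""
    return list(text_tokens) + list(face_tokens)


if __name__ == "__main__":
    face = [[0.0, 0.0, 1.0, 1.0],
            [0.0, 0.0, 1.0, 1.0],
            [1.0, 1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0]]
    pose = [0.1, -0.2, 0.0]             # hypothetical yaw, pitch, roll
    text = [[0.5] * 16, [0.25] * 16]    # hypothetical text embeddings
    condition = build_condition(text, build_face_tokens(face, pose))
    print(len(condition), "conditioning tokens of width", len(condition[0]))
```

In a real diffusion model, the denoiser would attend to a sequence like `condition`, so that identity and pose cues from the reference image sit alongside the text prompt. The actual encoders and injection mechanism are in the authors' repository.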
