MUSE: Textual Attributes Guided Portrait Painting Generation
This work addresses the challenge of creative and expressive portrait generation for artists and designers, but the contribution is incremental because it builds on existing image-to-image models.
The paper tackles the problem of generating portraits from textual attributes and facial features. The proposed method, MUSE, outperforms state-of-the-art methods, increasing Inception Score by 6% and decreasing FID by 11%, and accurately illustrates 78% of textual attributes.
We propose a novel approach, MUSE, to illustrate textual attributes visually via portrait generation. MUSE takes as input a set of attributes written in text, together with facial features extracted from a photo of the subject. We propose 11 attribute types to represent inspirations from a subject's profile, emotion, story, and environment. We propose a novel stacked neural network architecture that extends an image-to-image generative model to accept textual attributes. Experiments show that our approach significantly outperforms several state-of-the-art methods that do not use textual attributes, with Inception Score increased by 6% and Fréchet Inception Distance (FID) decreased by 11%. We also propose a new attribute reconstruction metric to evaluate whether the generated portraits preserve the subject's attributes. Experiments show that our approach can accurately illustrate 78% of textual attributes, which also helps MUSE capture the subject in a more creative and expressive way.
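The abstract does not spell out how the attribute reconstruction metric is computed. As a minimal sketch of one plausible reading, assume that a separate classifier (not shown here) predicts attribute values from each generated portrait, and that the score is the fraction of input attribute values that are recovered. The function name `attribute_reconstruction_accuracy` and the attribute types `age` and `mood` are hypothetical and used only for illustration; they are not taken from the paper.

```python
from typing import Dict, List


def attribute_reconstruction_accuracy(
    input_attrs: List[Dict[str, str]],
    predicted_attrs: List[Dict[str, str]],
) -> float:
    """Fraction of input attribute values recovered from generated portraits.

    Assumption: predicted_attrs[i] holds the attributes that some classifier
    predicted from the portrait generated for input_attrs[i].
    """
    if len(input_attrs) != len(predicted_attrs):
        raise ValueError("need one prediction dict per portrait")

    total = 0
    matched = 0
    for given, predicted in zip(input_attrs, predicted_attrs):
        for attr_type, value in given.items():
            total += 1
            # A missing prediction counts as a miss.
            if predicted.get(attr_type) == value:
                matched += 1
    return matched / total if total else 0.0


# Hypothetical attribute types and values, for illustration only.
given = [{"age": "adult", "mood": "calm"}, {"age": "child", "mood": "happy"}]
predicted = [{"age": "adult", "mood": "sad"}, {"age": "child", "mood": "happy"}]
score = attribute_reconstruction_accuracy(given, predicted)  # 3 of 4 match
```

Under this reading, the reported 78% would correspond to a score of 0.78 over the evaluation set. The paper's exact definition may differ.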