CV, LG · Jul 3, 2019

Mask Embedding in conditional GAN for Guided Synthesis of High Resolution Images

arXiv:1907.01710v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating high-quality, detailed images from semantic masks, for applications such as face synthesis. It represents an incremental improvement in mask-guided image generation.

The paper tackles the use of semantic masks as guidance in conditional GANs for image synthesis. Feeding the mask directly to the generator often reduces the variability and quality of the output, because features from the mask and from the latent vector are incompatible. The authors propose a mask embedding mechanism to address this, and report realistic high-resolution facial images up to 512×512 on the CELEBA-HQ dataset.

Recent advancements in conditional Generative Adversarial Networks (cGANs) have shown promise in label-guided image synthesis. Semantic masks, such as sketches and label maps, are another intuitive and effective form of guidance in image synthesis. Directly incorporating the semantic masks as constraints dramatically reduces the variability and quality of the synthesized results. We observe that this is caused by the incompatibility of features from different inputs of the generator (such as the mask image and the latent vector). To use semantic masks as guidance while providing realistic synthesized results with fine details, we propose a mask embedding mechanism that allows for a more efficient initial feature projection in the generator. We validate the effectiveness of our approach by training a mask-guided face generator on the CELEBA-HQ dataset. We can generate realistic, high-resolution facial images up to 512×512 under mask guidance. Our code is publicly available.
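The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea: compress the mask into a vector and combine it with the latent vector before the generator's initial feature projection. Everything in it is an assumption made for illustration. The function names (`embed_mask`, `make_projection`, `initial_projection`) are hypothetical, and the fixed grid histogram stands in for what would be a learned mask encoder in a real cGAN.

```python
import random


def embed_mask(mask, num_classes, grid=2):
    """Hypothetical stand-in for a learned mask encoder.

    Splits the label map into grid x grid cells and records, per cell,
    the fraction of pixels belonging to each class.
    """
    height, width = len(mask), len(mask[0])
    cell_h, cell_w = height // grid, width // grid
    embedding = []
    for gy in range(grid):
        for gx in range(grid):
            counts = [0] * num_classes
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                for x in range(gx * cell_w, (gx + 1) * cell_w):
                    counts[mask[y][x]] += 1
            total = cell_h * cell_w
            embedding.extend(c / total for c in counts)
    return embedding


def make_projection(in_dim, out_dim, rng):
    """Random weights for a single linear layer (assumed, untrained)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def initial_projection(z, mask_embedding, weights):
    """Concatenate latent vector and mask embedding, then project linearly."""
    joint = list(z) + list(mask_embedding)
    return [sum(w * v for w, v in zip(row, joint)) for row in weights]


rng = random.Random(0)
# Toy 4x4 label map with three classes (0, 1, 2).
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]
mask_embedding = embed_mask(mask, num_classes=3, grid=2)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]
weights = make_projection(len(z) + len(mask_embedding), 16, rng)
features = initial_projection(z, mask_embedding, weights)
```

The point of the sketch is that the latent vector and the mask meet as two vectors in one joint input, rather than as a vector and a full-resolution image. In a trained model, both the embedding and the projection would be learned end to end.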

Code Implementations: 2 repos