CV · Feb 22, 2024

Semantic Image Synthesis with Unconditional Generator

arXiv:2402.14395v1 · 6 citations · h-index: 8 · NIPS
Originality: Incremental advance
AI Analysis

This work addresses the need for more data-efficient and flexible semantic image synthesis in computer vision. Its originality is incremental, since it builds on existing pre-trained generator models.

The paper tackles semantic image synthesis without requiring a large semantic segmentation dataset. It uses a pre-trained unconditional generator and rearranges its feature maps according to proxy masks obtained by clustering those feature maps. The method supports several applications, including spatial editing, sketch-to-photo, and scribble-to-photo, and is validated on human faces, animal faces, and buildings.
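To make "proxy masks derived from clustering" concrete, here is a minimal sketch that turns a generator feature map of shape (C, H, W) into an (H, W) label mask by clustering per-pixel feature vectors. It assumes k-means as the "simple clustering", L2-normalized feature vectors, and farthest-point initialization; none of these choices, nor the function name `proxy_mask_from_features`, is taken from the paper. It is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def proxy_mask_from_features(features: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: cluster a (C, H, W) feature map into an (H, W) mask of labels in [0, k).

    Assumption: plain k-means on L2-normalized per-pixel vectors stands in for
    the paper's "simple clustering".
    """
    c, h, w = features.shape
    # Treat every spatial location as one C-dimensional sample: (H*W, C).
    x = features.reshape(c, h * w).T
    # Normalize so clustering depends on feature direction, not magnitude (an assumption).
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    def assign(centers: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from each pixel to each center, shape (H*W, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    # Farthest-point initialization: start from a random pixel, then repeatedly
    # add the pixel farthest from all centers chosen so far.
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(h * w)]]
    for _ in range(1, k):
        dist = np.min([((x - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.stack(centers)

    # Standard Lloyd iterations: assign pixels, then move centers to cluster means.
    for _ in range(iters):
        labels = assign(centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return assign(centers).reshape(h, w)


# Usage: a synthetic feature map whose left and right halves point along different channels.
rng = np.random.default_rng(1)
feat = np.zeros((8, 16, 16))
feat[0, :, :8] = 1.0
feat[1, :, 8:] = 1.0
feat += 0.05 * rng.standard_normal(feat.shape)
mask = proxy_mask_from_features(feat, k=2)
```

In a real generator the number of clusters k controls how coarse the proxy masks are; the paper's exact choice of layer, k, and clustering details should be checked against the original text.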

Semantic image synthesis (SIS) aims to generate realistic images that match given semantic masks. Recent advances allow high-quality results and precise spatial control, but they require a massive semantic segmentation dataset for training the models. Instead, we propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks. The proxy masks are prepared from the feature maps of random samples in the generator by simple clustering. The feature rearranger learns to rearrange the original feature maps to match the shape of proxy masks taken either from the original sample itself or from other random samples. We then introduce a semantic mapper that produces the proxy masks from various input conditions, including semantic masks. Our method is versatile across applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo. Experiments validate the advantages of our method on a range of datasets: human faces, animal faces, and buildings.
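The abstract's feature rearranger is a learned module; its architecture is not described here. As a rough intuition for what "rearranging feature maps to match the shape of a proxy mask" means, the toy sketch below uses a non-learned stand-in: every target pixel with label j receives the mean source feature of all source pixels labeled j. The function name `rearrange_by_mask` and this label-wise averaging are hypothetical simplifications, not the paper's method, and they discard the spatial detail a learned rearranger would preserve.

```python
import numpy as np


def rearrange_by_mask(features: np.ndarray, src_mask: np.ndarray, tgt_mask: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical, non-learned stand-in for a feature rearranger.

    features: (C, H, W) source feature map.
    src_mask: (H, W) proxy-mask labels of the source features.
    tgt_mask: (H', W') proxy-mask labels describing the desired layout.
    Returns a (C, H', W') map where each target pixel labeled j holds the mean
    source feature of label j; labels absent from the source stay zero.
    """
    c = features.shape[0]
    out = np.zeros((c,) + tgt_mask.shape, dtype=features.dtype)
    for j in range(k):
        src_sel = src_mask == j
        if not src_sel.any():
            continue  # nothing to copy for a label the source lacks
        # Boolean indexing over the two spatial axes gives (C, n_pixels).
        mean_feat = features[:, src_sel].mean(axis=1)
        # Broadcast the (C,) mean into every target pixel carrying label j.
        out[:, tgt_mask == j] = mean_feat[:, None]
    return out


# Usage: swap which side of the image carries each region.
feat = np.zeros((2, 4, 4))
feat[0, :, :2] = 1.0  # label-0 region lives on the left
feat[1, :, 2:] = 1.0  # label-1 region lives on the right
src = np.zeros((4, 4), dtype=int)
src[:, 2:] = 1
tgt = 1 - src  # target layout: label 1 on the left, label 0 on the right
out = rearrange_by_mask(feat, src, tgt, k=2)
```

Here the left half of `out` carries the features that were on the right of `feat`, and vice versa. A learned rearranger would instead move individual features so that texture and geometry survive the change of layout.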
