CV · Dec 14, 2020

Semantic Layout Manipulation with High-Resolution Sparse Attention

arXiv:2012.07288v4 · 15 citations
AI Analysis

This work provides an incremental improvement in image manipulation quality for computer vision researchers and practitioners working with semantic layout editing.

This paper addresses semantic image layout manipulation, where an input image is edited based on a new semantic label map. The authors propose a high-resolution sparse attention module to transfer visual details at 512x512 resolution and a novel generator architecture, achieving substantial improvements over existing methods on ADE20k and Places365 datasets.

We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map. A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic. Recent work on learning cross-domain correspondence has shown promising results for global layout transfer with dense attention-based warping. However, this method tends to lose texture details due to the resolution limitation and the lack of smoothness constraint of correspondence. To adapt this paradigm for the layout manipulation task, we propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512. To further improve visual quality, we introduce a novel generator architecture consisting of a semantic encoder and a two-stage decoder for coarse-to-fine synthesis. Experiments on the ADE20k and Places365 datasets demonstrate that our proposed approach achieves substantial improvements over the existing inpainting and layout manipulation methods.
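The abstract contrasts dense attention-based warping with a sparse variant but does not spell out the module itself. As a rough illustration of the general idea only, the sketch below warps source features to target locations while attending to just the few best-matching source positions per target. The function name `topk_attention_warp`, the top-k selection rule, and the `temperature` parameter are assumptions made for this example and are not taken from the paper.

```python
import math


def topk_attention_warp(queries, keys, values, k=2, temperature=1.0):
    """Illustrative sparse attention warp (hypothetical, not the paper's module).

    queries: feature vectors at target-layout positions
    keys:    feature vectors at source-image positions
    values:  visual details (e.g. colors) at the same source positions
    Each target position attends only to its k highest-scoring source positions.
    """
    dim = len(values[0])
    warped = []
    for q in queries:
        # Dot-product similarity between this query and every source key.
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Sparsify: keep the indices of the k best matches, discard the rest.
        top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Numerically stable softmax over the kept scores only.
        best = max(scores[i] for i in top)
        weights = [math.exp((scores[i] - best) / temperature) for i in top]
        total = sum(weights)
        # Weighted average of the selected source values.
        warped.append([
            sum(w / total * values[i][d] for w, i in zip(weights, top))
            for d in range(dim)
        ])
    return warped


# Toy data: two target positions, three source positions, 1-D "visual detail".
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0], [20.0], [30.0]]
hard_copy = topk_attention_warp(queries, keys, values, k=1)
dense_mix = topk_attention_warp(queries, keys, values, k=len(keys))
```

With `k=1` each target position copies the detail of its single best match, while `k=len(keys)` falls back to dense attention, which averages over all source positions and so tends to blur the transferred values.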

Code Implementations: 1 repo
