CV · IV · Jun 4, 2021

The Image Local Autoregressive Transformer

arXiv:2106.02514v2 · 17 citations
Originality: Incremental advance
AI Analysis

This paper addresses precise local image editing for applications such as person image synthesis and face editing, and represents an incremental improvement over existing autoregressive methods.

The paper tackles local image region editing with autoregressive models, which suffer from missing global information, slow inference, and leakage of local guidance information. It proposes the image Local Autoregressive Transformer (iLAT), which learns local discrete representations to efficiently synthesize regions from guidance, and demonstrates its efficacy on tasks such as pose-guided person image synthesis and face editing.

Recently, AutoRegressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable to or even better than Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit or change local image regions may suffer from missing global information, slow inference speed, and information leakage of local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. iLAT learns novel local discrete representations through a newly proposed local autoregressive (LA) transformer built on an attention mask and a convolution mechanism. As a result, iLAT can efficiently synthesize local image regions from key guidance information. We evaluate iLAT on various locally guided image synthesis tasks, such as pose-guided person image synthesis and face editing. Both quantitative and qualitative results demonstrate the efficacy of our model.
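The abstract credits a local autoregressive attention mask for giving the edited region global context without leaking guidance, but does not spell the mask out. Below is a minimal sketch, assuming an H x W token grid flattened in raster order and a binary map marking the region to regenerate, of one way such a mask could be built. The function name `local_ar_mask` and the exact visibility rules are hypothetical illustrations, not the paper's published implementation.

```python
from typing import List


def local_ar_mask(region: List[List[int]]) -> List[List[bool]]:
    """Hypothetical local autoregressive attention mask (illustrative sketch).

    region[r][c] == 1 marks a token inside the local region to be generated;
    0 marks a known context token. The grid is flattened in raster order and
    allowed[i][j] is True when query position i may attend to key position j.
    """
    flat = [v for row in region for v in row]  # raster-order flattening
    n = len(flat)
    allowed = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if flat[j] == 0:
                # Context keys are visible to every query, so edited tokens
                # see the whole known image (global information).
                allowed[i][j] = True
            elif flat[i] == 1 and j <= i:
                # Edited keys are visible only to edited queries at or after
                # them in raster order (causal within the local region).
                # Assumption: context queries never see edited keys, which
                # prevents the region from leaking into the context stream.
                allowed[i][j] = True
    return allowed


if __name__ == "__main__":
    # 2 x 2 grid: positions 1 and 2 (raster order) are to be generated.
    for row in local_ar_mask([[0, 1], [1, 0]]):
        print("".join("1" if a else "0" for a in row))
    # prints:
    # 1001
    # 1101
    # 1111
    # 1001
```

Note that the edited token at position 1 can attend to the context token at position 3, which comes later in raster order; a plain causal mask over the whole image would hide it, which is one reading of the "missing global information" problem the abstract describes.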

Code Implementations: 1 repo
