CV · LG · IV · Jul 20, 2020

Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation

arXiv:2007.09923v1 · 15 citations
Originality: Incremental advance
AI Analysis

This paper addresses high-quality image generation with autoregressive models, a problem relevant to researchers and practitioners in computer vision. The contribution appears incremental, since it builds on existing VQ-VAE frameworks.

The paper tackles two limitations of autoregressive image generation models, exposure bias and a training objective that does not guarantee visual fidelity, by proposing Reinforced Adversarial Learning (RAL) based on policy gradient optimization. RAL improves both negative log-likelihood and Fréchet Inception Distance, and achieves state-of-the-art results on CelebA at 64×64 resolution.
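The metrics above include FID. For reference, FID fits Gaussians to Inception-v3 features of real and generated images and computes the Fréchet distance $\|\mu_r - \mu_g\|^2 + \mathrm{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2})$; lower values mean the two feature distributions are closer. The following is a minimal NumPy sketch of that distance only, assuming the means and covariances are already computed (feature extraction is omitted). The helper names `frechet_distance` and `_sqrtm_psd` are hypothetical and do not come from the paper or any FID library.

```python
import numpy as np


def _sqrtm_psd(a):
    # Square root of a symmetric PSD matrix via eigendecomposition;
    # tiny negative eigenvalues from round-off are clipped to zero.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}), and the inner
    # matrix is symmetric PSD, so the eigh-based square root applies.
    s1 = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(s1 @ sigma2 @ s1)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))


# Two unit-covariance Gaussians whose means differ by (3, 4): distance = 3^2 + 4^2.
print(frechet_distance([0, 0], np.eye(2), [3, 4], np.eye(2)))
```

Working in the symmetrized form avoids taking a square root of the non-symmetric product $\Sigma_r\Sigma_g$, which keeps the sketch free of complex-valued intermediates.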

Autoregressive models have recently achieved results comparable to state-of-the-art Generative Adversarial Networks (GANs) with the help of Vector Quantized Variational AutoEncoders (VQ-VAE). However, autoregressive models have several limitations: they suffer from exposure bias, and their training objective does not guarantee visual fidelity. To address these limitations, we propose Reinforced Adversarial Learning (RAL), based on policy gradient optimization, for autoregressive models. With RAL, training follows a process similar to testing, which addresses the exposure bias issue. In addition, visual fidelity is further optimized with an adversarial loss inspired by the models' strong counterparts, GANs. Because autoregressive models sample slowly, we propose partial generation for faster training. RAL also enables collaboration between the different modules of the VQ-VAE framework. To the best of our knowledge, the proposed method is the first to enable adversarial learning in autoregressive models for image generation. Experiments on synthetic and real-world datasets show improvements over MLE-trained models. The proposed method improves both negative log-likelihood (NLL) and Fréchet Inception Distance (FID), indicating better visual quality and diversity. It achieves state-of-the-art results on CelebA at $64 \times 64$ resolution, showing promise for large-scale image generation.
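The policy-gradient idea in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses REINFORCE (score-function policy gradient) with a mean-reward baseline; a hypothetical bigram policy `ToyARPolicy` over a 4-entry codebook stands in for the autoregressive prior over VQ-VAE codes; and a fixed, hand-written `toy_discriminator` stands in for a learned discriminator (in real adversarial training the discriminator would also be updated). Tokens are sampled from the model's own outputs, as at test time. "Partial generation" is interpreted here, as an assumption, as keeping a ground-truth prefix and sampling only the remaining tokens.

```python
import numpy as np

K, T = 4, 8  # hypothetical codebook size and sequence length (toy stand-ins for VQ-VAE codes)


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


class ToyARPolicy:
    """Hypothetical bigram autoregressive model: p(x_t | x_{t-1}) = softmax(W[x_{t-1}])."""

    def __init__(self, k):
        self.W = np.zeros((k, k))  # all-zero logits give a uniform initial policy

    def sample(self, prefix, length, rng):
        # Free-running generation: each token is conditioned on the model's own
        # previous output, as at test time (no teacher forcing).
        seq = list(prefix)
        while len(seq) < length:
            p = softmax(self.W[seq[-1]])
            seq.append(int(rng.choice(len(p), p=p)))
        return seq

    def grad_log_prob(self, seq, start):
        # Gradient of sum_{t >= start} log p(x_t | x_{t-1}) with respect to W.
        g = np.zeros_like(self.W)
        for t in range(start, len(seq)):
            prev, cur = seq[t - 1], seq[t]
            g[prev] -= softmax(self.W[prev])
            g[prev, cur] += 1.0
        return g


def toy_discriminator(seq):
    # Hypothetical stand-in for a learned discriminator: the fraction of
    # transitions following the assumed "real" pattern x_t = (x_{t-1} + 1) mod K.
    hits = sum(1 for a, b in zip(seq, seq[1:]) if b == (a + 1) % K)
    return hits / (len(seq) - 1)


def ral_step(policy, real_prefix, n_samples, lr, rng):
    # Partial generation (assumed reading): keep a ground-truth prefix and
    # sample only the remaining tokens, which is cheaper than a full rollout.
    start = len(real_prefix)
    samples = [policy.sample(real_prefix, T, rng) for _ in range(n_samples)]
    rewards = np.array([toy_discriminator(s) for s in samples])
    baseline = rewards.mean()  # simple baseline to reduce gradient variance
    grad = np.zeros_like(policy.W)
    for s, r in zip(samples, rewards):
        grad += (r - baseline) * policy.grad_log_prob(s, start)
    policy.W += lr * grad / n_samples  # REINFORCE: gradient ascent on expected reward
    return float(baseline)


def mean_reward(policy, n, rng):
    return float(np.mean([toy_discriminator(policy.sample([0], T, rng)) for _ in range(n)]))


rng = np.random.default_rng(0)
policy = ToyARPolicy(K)
real = [i % K for i in range(T)]  # one "real" sequence following the assumed pattern
before = mean_reward(policy, 200, rng)
for _ in range(500):
    cut = int(rng.integers(1, T))  # random prefix length in [1, T-1]
    ral_step(policy, real[:cut], n_samples=16, lr=0.5, rng=rng)
after = mean_reward(policy, 200, rng)
print(f"mean reward before: {before:.2f}, after: {after:.2f}")
```

The mean reward starts near chance level (about 0.25 for a uniform policy over four codes) and should rise as the policy learns the pattern the toy discriminator rewards. Because the reward is only observed on complete sampled sequences, the score-function estimator is used instead of backpropagating through the discrete sampling step.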
