CV | May 28, 2021

MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation

arXiv:2105.14110v2 | 21 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses computational bottlenecks in unpaired image-to-image translation. Its contribution is incremental: it adapts the existing MLP-Mixer architecture to this task rather than introducing a new building block.

The paper tackles the computational inefficiency of attention-based transformers in image-to-image translation by proposing MixerGAN, an MLP-based architecture that achieves competitive results compared to prior convolutional methods.

While attention-based transformer networks achieve unparalleled success in nearly all language tasks, the large number of tokens (pixels) found in images coupled with the quadratic activation memory usage makes them prohibitive for problems in computer vision. As such, while language-to-language translation has been revolutionized by the transformer model, convolutional networks remain the de facto solution for image-to-image translation. The recently proposed MLP-Mixer architecture alleviates some of the computational issues associated with attention-based networks while still retaining the long-range connections that make transformer models desirable. Leveraging this memory-efficient alternative to self-attention, we propose a new exploratory model in unpaired image-to-image translation called MixerGAN: a simpler MLP-based architecture that considers long-distance relationships between pixels without the need for expensive attention mechanisms. Quantitative and qualitative analysis shows that MixerGAN achieves competitive results when compared to prior convolutional-based methods.
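To make the idea of mixing information across distant image patches without attention more concrete, here is a minimal NumPy sketch of a generic MLP-Mixer block. It is an illustration under assumptions, not the MixerGAN generator itself: the function names, the patch and channel counts, the hidden width, and the random initialization are all hypothetical choices made for this example. The point it shows is that the token-mixing MLP uses a fixed weight matrix over the patch axis, so every patch can influence every other patch without computing a patch-by-patch attention map for each input.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each row over its last axis (no learned scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron applied along the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mlp(rng, dim, hidden):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mixer_block(x, token_params, channel_params):
    # x has shape (num_tokens, num_channels); one token per image patch.
    # Token mixing: transpose so the MLP acts across patches, then add the skip.
    y = layer_norm(x).T                      # (num_channels, num_tokens)
    x = x + mlp(y, *token_params).T          # back to (num_tokens, num_channels)
    # Channel mixing: the MLP acts across channels within each patch.
    x = x + mlp(layer_norm(x), *channel_params)
    return x

rng = np.random.default_rng(0)
num_tokens, num_channels, hidden = 64, 32, 128   # assumed sizes for illustration
x = rng.normal(size=(num_tokens, num_channels))
token_params = init_mlp(rng, num_tokens, hidden)
channel_params = init_mlp(rng, num_channels, hidden)

out = mixer_block(x, token_params, channel_params)

# Perturb one channel of the first patch and check that the last patch changes.
x_perturbed = x.copy()
x_perturbed[0, 0] += 1.0
out_perturbed = mixer_block(x_perturbed, token_params, channel_params)
max_change_at_last_token = np.abs(out_perturbed[-1] - out[-1]).max()
```

In this sketch a change to the first patch reaches the last patch after a single block, which is the long-range behavior the abstract refers to. A convolutional layer with a small kernel would need many stacked layers to connect the same two locations.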

Code Implementations: 1 repo