CV · Nov 19, 2018

Show, Attend and Translate: Unpaired Multi-Domain Image-to-Image Translation with Visual Attention

arXiv:1811.07483v2 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses image translation for multiple domains without paired data, offering an explainable approach for applications like facial attribute editing, though it is incremental as it builds on existing GAN and attention techniques.

The paper tackles unpaired multi-domain image-to-image translation with SAT, a GAN that uses an action vector to cast translation as arithmetic addition and subtraction, and visual attention to restrict changes to the relevant image regions. Experiments on a facial attribute dataset show it outperforms other methods and generates interpretable attention masks.

Recently, unpaired multi-domain image-to-image translation has attracted great interest and made remarkable progress, with a label vector used to indicate multi-domain information. In this paper, we propose SAT (Show, Attend and Translate), a unified and explainable generative adversarial network equipped with visual attention that can perform unpaired image-to-image translation for multiple domains. By introducing an action vector, we treat the original translation tasks as problems of arithmetic addition and subtraction. Visual attention is applied to guarantee that only the regions relevant to the target domains are translated. Extensive experiments on a facial attribute dataset demonstrate the superiority of our approach, and the generated attention masks better explain what SAT attends to when translating images.
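The two ideas in the abstract can be illustrated with a minimal sketch. It assumes the action vector is the element-wise difference between the target and source label vectors, so +1 adds an attribute, -1 removes it, and 0 leaves it unchanged. It also assumes the attention mask blends the generated pixels with the input pixels. The abstract does not spell out either formula, and the function names, attribute order, and pixel values below are hypothetical.

```python
from typing import List


def action_vector(source: List[int], target: List[int]) -> List[int]:
    """Assumed form: target - source, element-wise.

    +1 means "add this attribute", -1 means "remove it", 0 means "keep as is".
    """
    if len(source) != len(target):
        raise ValueError("label vectors must have the same length")
    return [t - s for s, t in zip(source, target)]


def blend(original: List[float], translated: List[float],
          mask: List[float]) -> List[float]:
    """Assumed attention blend per pixel.

    mask = 1 takes the translated pixel, mask = 0 keeps the original pixel.
    """
    return [m * t + (1.0 - m) * o
            for o, t, m in zip(original, translated, mask)]


# Hypothetical attribute order: [black_hair, blond_hair, smiling]
source = [1, 0, 0]
target = [0, 1, 1]
action = action_vector(source, target)
print(action)  # [-1, 1, 1]

# Toy three-pixel "image" with a soft attention mask
original = [0.2, 0.4, 0.6]
translated = [0.9, 0.9, 0.9]
mask = [0.0, 1.0, 0.5]
blended = blend(original, translated, mask)
print(blended)
```

Under these assumptions, a pixel with a mask value of zero passes through unchanged, which is one way to read the claim that only the regions relevant to the target domains are translated.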

