Mask CycleGAN: Unpaired Multi-modal Domain Translation with Interpretable Latent Variable
This work addresses unpaired image domain translation, a problem of interest to computer vision researchers, but the contribution is incremental because it builds directly on CycleGAN.
The paper tackles unimodality and the lack of interpretability in unpaired image domain translation by proposing Mask CycleGAN, which introduces controllable variation in generated images and shows robustness to different masks.
We propose Mask CycleGAN, a novel architecture for unpaired image domain translation built on CycleGAN, which aims to address two issues: 1) unimodality in image translation and 2) lack of interpretability of latent variables. Our technical innovation comprises three key components: a masking scheme, a generator, and an objective. Experimental results demonstrate that this architecture can introduce variations into generated images in a controllable manner and is reasonably robust to different masks.