Improving Shape Deformation in Unsupervised Image-to-Image Translation
This addresses a specific limitation of image-to-image translation for domains that require shape changes, but the contribution is incremental, since it builds on existing unsupervised methods.
The paper tackles shape deformation in unsupervised image-to-image translation, where existing methods often fail when the two domains differ substantially in shape. It introduces a discriminator with dilated convolutions and a multi-scale perceptual loss to improve context awareness and shape representation. It demonstrates their effectiveness on a challenging toy dataset and on mappings between humans, dolls, and anime faces, and between cats and dogs.
Unsupervised image-to-image translation techniques are able to map local texture between two domains, but they typically fail when the domains require larger shape changes. Inspired by semantic segmentation, we introduce a discriminator with dilated convolutions that is able to use information from across the entire image to train a more context-aware generator. This is coupled with a multi-scale perceptual loss that is better able to represent error in the underlying shape of objects. We demonstrate that this design is better able to represent shape deformation on a challenging toy dataset, as well as on complex mappings with significant dataset variation between humans, dolls, and anime faces, and between cats and dogs.
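The reason dilated convolutions give the discriminator more context can be seen with receptive-field arithmetic. The sketch below compares a plain stack of 3x3 stride-1 convolutions with one whose dilation doubles at each layer, as in dilated segmentation networks. The layer counts, dilation rates, the `ConvLayer` type and the helper functions are hypothetical illustrations, not the discriminator configuration used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """One convolution layer (illustrative values only, not the paper's)."""
    kernel: int
    stride: int = 1
    dilation: int = 1


def receptive_field(layers: List[ConvLayer]) -> int:
    """Side length, in input pixels, seen by one output unit of a conv stack."""
    rf, jump = 1, 1  # jump: input-pixel distance between adjacent output units
    for layer in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
        effective_kernel = layer.dilation * (layer.kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= layer.stride
    return rf


def output_size(n: int, layer: ConvLayer, padding: int) -> int:
    """Spatial output size from standard convolution arithmetic."""
    span = layer.dilation * (layer.kernel - 1)
    return (n + 2 * padding - span - 1) // layer.stride + 1


# Hypothetical 3x3, stride-1 stacks: plain versus exponentially dilated.
plain = [ConvLayer(3) for _ in range(4)]
dilated = [ConvLayer(3, dilation=2 ** i) for i in range(4)]  # dilations 1, 2, 4, 8

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 31

# With padding equal to the dilation, each layer keeps a 128-pixel input at 128.
print([output_size(128, layer, padding=layer.dilation) for layer in dilated])
```

With the same number of layers and the same 3x3 kernels (so the same parameter count for equal channel widths), the dilated stack sees more than three times as many pixels per side and never reduces spatial resolution. This is the property that lets a discriminator of modest depth aggregate context over large parts of the image rather than judging small local patches.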