TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
This work addresses the challenge of translating images of objects with complex geometry, a problem relevant to computer vision applications, and represents an incremental improvement over existing methods.
The paper tackles unsupervised image-to-image translation across large geometry variations, a setting in which existing methods often fail. It proposes a disentangle-and-translate framework that separates the appearance and geometry latent spaces, and it reports better performance than state-of-the-art methods on near-rigid and non-rigid translation tasks.
Unsupervised image-to-image translation aims at learning a mapping between two visual domains. However, learning a translation across large geometry variations usually fails. In this work, we present a novel disentangle-and-translate framework to tackle image-to-image translation of complex objects. Instead of learning the mapping directly in image space, we disentangle image space into a Cartesian product of the appearance and geometry latent spaces. Specifically, we introduce a geometry prior loss and a conditional VAE loss to encourage the network to learn independent but complementary representations. The translation is then built on the appearance and geometry spaces separately. Extensive experiments demonstrate the superior performance of our method over other state-of-the-art approaches, especially in the challenging near-rigid and non-rigid object translation tasks. In addition, by taking different exemplars as appearance references, our method also supports multimodal translation. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html
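The disentangle-and-translate idea can be illustrated with a toy sketch of the data flow only. Everything in it is an assumption made for illustration: the names `Latent`, `DomainCodec`, and `translate` are hypothetical, the "encoder" simply splits a vector in two, and the per-space mappings are plain functions. The paper's actual networks and its geometry prior and conditional VAE losses are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Latent:
    """A point in the product space: appearance code x geometry code."""
    appearance: Vector
    geometry: Vector


@dataclass
class DomainCodec:
    """Hypothetical stand-in for one domain's encoders and decoder.

    Assumption: the first `n_appearance` values of an "image" are its
    appearance and the rest are its geometry. A real model would learn this.
    """
    n_appearance: int

    def encode(self, image: Sequence[float]) -> Latent:
        values = tuple(image)
        return Latent(values[: self.n_appearance], values[self.n_appearance:])

    def decode(self, z: Latent) -> Vector:
        return z.appearance + z.geometry


def translate(
    image: Sequence[float],
    src: DomainCodec,
    dst: DomainCodec,
    map_appearance: Callable[[Vector], Vector],
    map_geometry: Callable[[Vector], Vector],
    exemplar: Optional[Sequence[float]] = None,
) -> Vector:
    """Translate by mapping each latent space separately, then decoding."""
    z = src.encode(image)
    geometry = map_geometry(z.geometry)
    if exemplar is not None:
        # Multimodal case: take the appearance from a target-domain exemplar.
        appearance = dst.encode(exemplar).appearance
    else:
        appearance = map_appearance(z.appearance)
    return dst.decode(Latent(appearance, geometry))


if __name__ == "__main__":
    src, dst = DomainCodec(2), DomainCodec(2)
    double = lambda v: tuple(2 * t for t in v)     # toy appearance mapping
    shift = lambda v: tuple(t + 1 for t in v)      # toy geometry mapping
    print(translate((1.0, 2.0, 3.0, 4.0), src, dst, double, shift))
    print(translate((1.0, 2.0, 3.0, 4.0), src, dst, double, shift,
                    exemplar=(9.0, 8.0, 0.0, 0.0)))
```

In this sketch, passing a different `exemplar` changes only the appearance half of the output while the translated geometry stays fixed, which mirrors the way the abstract describes multimodal translation.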