CV · Jan 11, 2019

Image Disentanglement and Uncooperative Re-Entanglement for High-Fidelity Image-to-Image Translation

Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki

arXiv:1901.03628v22.62 citations

Originality: Incremental advance

AI Analysis

The work addresses the need for reliable image translation in applications such as augmented reality, where unpredictable changes are unacceptable. It is an incremental advance because it builds on adversarial cycle-consistency methods.

The paper tackles high-fidelity image-to-image translation, where current methods unpredictably alter details that should be preserved. It introduces an optimization technique that prevents the translation networks from cooperating, which yields semantics-preserving translations that prior methods miss.

Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain. This is challenging, especially when there are no example translations available as supervision. Adversarial cycle consistency was recently proposed as a solution, with beautiful and creative results, yielding much follow-up work. However, augmented reality applications cannot readily use such techniques to provide users with compelling translations of real scenes, because the translations do not have high-fidelity constraints. In other words, current models are liable to change details that should be preserved: while re-texturing a face, they may alter the face's expression in an unpredictable way. In this paper, we introduce the problem of high-fidelity image-to-image translation, and present a method for solving it. Our main insight is that low-fidelity translations typically escape a cycle-consistency penalty, because the back-translator learns to compensate for the forward-translator's errors. We therefore introduce an optimization technique that prevents the networks from cooperating: simply train each network only when its input data is real. Prior works, in comparison, train each network with a mix of real and generated data. Experimental results show that our method accurately disentangles the factors that separate the domains, and converges to semantics-preserving translations that prior methods miss.
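The training rule in the abstract ("train each network only when its input data is real") can be illustrated with a toy sketch. It is not the authors' implementation. It assumes each translator is a single scalar weight (`w_ab` for A to B, `w_ba` for B to A), uses only a squared cycle-consistency loss with no adversarial term, and assumes gradients still flow through the frozen second network to reach the first. All function names here are hypothetical.

```python
def cycle_loss_and_grads(w_first, w_second, x):
    """Toy cycle loss (w_second * w_first * x - x)^2.

    w_first is the translator that sees the real input x;
    w_second is the translator that sees generated data.
    Returns the loss and its gradient with respect to each weight.
    """
    residual = w_second * w_first * x - x
    loss = residual ** 2
    grad_first = 2.0 * residual * w_second * x
    grad_second = 2.0 * residual * w_first * x
    return loss, grad_first, grad_second


def cooperative_step(w_ab, w_ba, a, b, lr=0.1):
    """Baseline: both networks are updated from both cycles,
    so each one also trains on generated inputs."""
    _, g_ab_from_a, g_ba_from_a = cycle_loss_and_grads(w_ab, w_ba, a)
    _, g_ba_from_b, g_ab_from_b = cycle_loss_and_grads(w_ba, w_ab, b)
    new_ab = w_ab - lr * (g_ab_from_a + g_ab_from_b)
    new_ba = w_ba - lr * (g_ba_from_a + g_ba_from_b)
    return new_ab, new_ba


def uncooperative_step(w_ab, w_ba, a, b, lr=0.1):
    """Sketch of the rule from the abstract: each network is updated
    only from the cycle in which its own input is real."""
    # Cycle A -> B -> A: w_ab sees real a, so only w_ab is updated.
    _, g_ab, _ = cycle_loss_and_grads(w_ab, w_ba, a)
    # Cycle B -> A -> B: w_ba sees real b, so only w_ba is updated.
    _, g_ba, _ = cycle_loss_and_grads(w_ba, w_ab, b)
    return w_ab - lr * g_ab, w_ba - lr * g_ba


if __name__ == "__main__":
    print(cooperative_step(2.0, 1.0, a=1.0, b=1.0))
    print(uncooperative_step(2.0, 1.0, a=1.0, b=1.0))
```

In the uncooperative step, the gradient that would let the back-translator adapt to the forward-translator's output is simply discarded, so the back-translator cannot learn to compensate for the forward-translator's errors. In a real framework the same effect would typically be obtained by excluding the second network's parameters from the optimizer step for that cycle.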
