CV · LG · Mar 6, 2018

The Contextual Loss for Image Transformation with Non-Aligned Data

arXiv:1803.02077v4 · 416 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck for computer-vision researchers and practitioners working on image transformation: it enables training on non-aligned data. The advance is incremental, however, since it builds on existing loss-function paradigms.

The paper tackles image transformation when aligned training pairs are unavailable. It introduces a loss function that compares semantically similar regions without requiring spatial alignment. This enables effective style transfer between non-aligned images, for example mapping eyes to eyes and mouth to mouth when transferring style from one face to another.

Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations. However, for many tasks, aligned training pairs of images will not be available. We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. Our loss is based on both context and semantics -- it compares regions with similar semantic meaning, while considering the context of the entire image. Hence, for example, when transferring the style of one face to another, it will translate eyes-to-eyes and mouth-to-mouth. Our code can be found at https://www.github.com/roimehrez/contextualLoss
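To make "compares regions with similar semantic meaning, while considering the context of the entire image" concrete, here is a minimal NumPy sketch of a contextual-style loss. It follows the formulation as I understand it from the paper: cosine distances between every generated feature and every target feature are normalised relative to each generated feature's nearest target match, turned into row-normalised similarities, and then each target feature keeps its best match. It is a sketch under stated assumptions, not the authors' implementation (see the repository linked above for that). The function name `contextual_loss`, the bandwidth `h=0.5`, `eps=1e-5`, and centring both feature sets on the target mean are illustrative choices, not values confirmed by the source. The inputs are assumed to be feature vectors already extracted from a CNN (e.g. one row per spatial position of a feature map).

```python
import numpy as np

def contextual_loss(x_feats, y_feats, h=0.5, eps=1e-5):
    """Sketch of a contextual loss. x_feats: (N, C) generated-image features,
    y_feats: (M, C) target features. h and eps are hypothetical defaults."""
    # Assumption: centre both sets on the target-feature mean.
    mu = y_feats.mean(axis=0, keepdims=True)
    x, y = x_feats - mu, y_feats - mu
    # L2-normalise rows so that dot products are cosine similarities.
    x = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)
    y = y / np.maximum(np.linalg.norm(y, axis=1, keepdims=True), eps)
    # d[i, j]: cosine distance between x_i and y_j (clipped at 0 for round-off).
    d = np.maximum(1.0 - x @ y.T, 0.0)
    # Relative distance: divide each row by x_i's distance to its nearest y.
    d_tilde = d / (d.min(axis=1, keepdims=True) + eps)
    # Similarities, normalised over all targets j for each x_i -> CX_ij.
    w = np.exp((1.0 - d_tilde) / h)
    cx_ij = w / w.sum(axis=1, keepdims=True)
    # Each target feature y_j keeps its best-matching x_i; average over j.
    cx = cx_ij.max(axis=0).mean()
    return -np.log(cx)

rng = np.random.default_rng(0)
target = rng.normal(size=(64, 16))         # 64 feature vectors, 16 channels
shuffled = target[rng.permutation(64)]     # same content, different positions
unrelated = rng.normal(size=(64, 16))      # different content

print(contextual_loss(shuffled, target))   # ~0: every target feature has an exact match
print(contextual_loss(unrelated, target))  # clearly larger
print(np.mean((shuffled - target) ** 2))   # a position-wise L2 loss penalises the shuffle
```

The shuffled case illustrates the abstract's point: a loss that compares features at corresponding positions is large when the content is merely rearranged, whereas the contextual loss only asks whether each target feature has a close counterpart somewhere in the generated image.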

Code Implementations: 3 repos
