CV, Jan 2, 2022

Splicing ViT Features for Semantic Appearance Transfer

arXiv:2201.00424v1, 260 citations
Originality: Incremental advance
AI Analysis

This paper addresses appearance transfer for image editing applications. The contribution is rated incremental because it builds on existing pre-trained Vision Transformer models.

The paper tackles semantic appearance transfer between images: it generates an image in which objects from a source structure image are painted with the appearance of semantically related objects in a target appearance image. It produces high-resolution results without adversarial training and without additional inputs such as semantic segmentation.

We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. Our method works by training a generator given only a single structure/appearance image pair as input. To integrate semantic information into our framework - a pivotal component in tackling this task - our key idea is to leverage a pre-trained and fixed Vision Transformer (ViT) model which serves as an external semantic prior. Specifically, we derive novel representations of structure and appearance extracted from deep ViT features, untwisting them from the learned self-attention modules. We then establish an objective function that splices the desired structure and appearance representations, interweaving them together in the space of ViT features. Our framework, which we term "Splice", does not involve adversarial training, nor does it require any additional input information such as semantic segmentation or correspondences, and can generate high-resolution results, e.g., work in HD. We demonstrate high quality results on a variety of in-the-wild image pairs, under significant variations in the number of objects, their pose and appearance.
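The splicing objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the ViT features have already been extracted, it assumes structure is captured by the cosine self-similarity of per-token features and appearance by a single global descriptor (for example a class token), and all function names and loss weights here are hypothetical.

```python
import math


def cosine_self_similarity(tokens):
    """Pairwise cosine similarity between token feature vectors."""
    # Guard against zero vectors: a zero norm is replaced by 1.0.
    norms = [math.sqrt(sum(x * x for x in t)) or 1.0 for t in tokens]
    return [
        [sum(a * b for a, b in zip(ti, tj)) / (ni * nj)
         for tj, nj in zip(tokens, norms)]
        for ti, ni in zip(tokens, norms)
    ]


def structure_loss(src_tokens, out_tokens):
    """Frobenius distance between the two self-similarity matrices."""
    s_src = cosine_self_similarity(src_tokens)
    s_out = cosine_self_similarity(out_tokens)
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(s_src, s_out)
                         for a, b in zip(row_a, row_b)))


def appearance_loss(target_global, out_global):
    """Euclidean distance between two global appearance descriptors."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(target_global, out_global)))


def splice_objective(src_tokens, target_global, out_tokens, out_global,
                     structure_weight=1.0, appearance_weight=1.0):
    """Weighted sum: match the source structure and the target appearance."""
    # The weights are assumed values for illustration only.
    return (structure_weight * structure_loss(src_tokens, out_tokens)
            + appearance_weight * appearance_loss(target_global, out_global))
```

In a full pipeline, a generator would be trained on the single image pair to minimize such an objective, with the features of the generated image recomputed by the fixed ViT at every step. Because the self-similarity is cosine-based, the structure term in this sketch ignores the magnitude of each token vector, so it does not penalize a per-token rescaling of the features.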

Code Implementations: 1 repo