CVAIOct 11, 2022

Fine-Grained Image Style Transfer with Visual Transformers

Microsoft
arXiv:2210.05176v120 citationsh-index: 74
Originality Incremental advance
AI Analysis

This work addresses the limitation of global feature transformations in style transfer for computer vision applications, offering a more detailed approach, though it is incremental in nature.

The paper tackles the problem of fine-grained image style transfer by proposing a novel STyle TRansformer (STTR) network that uses visual tokens and attention mechanisms to preserve spatial information, resulting in improved style transfer results as validated by user studies with 50 subjects and 1,000 votes.

With the development of the convolutional neural network, image style transfer has drawn increasing attention. However, most existing approaches adopt a global feature transformation to transfer style patterns into content images (e.g., AdaIN and WCT). Such a design usually destroys the spatial information of the input images and fails to transfer fine-grained style patterns into style transfer results. To solve this problem, we propose a novel STyle TRansformer (STTR) network which breaks both content and style images into visual tokens to achieve a fine-grained style transformation. Specifically, two attention mechanisms are adopted in our STTR. We first propose to use self-attention to encode content and style tokens such that similar tokens can be grouped and learned together. We then adopt cross-attention between content and style tokens that encourages fine-grained style transformations. To compare STTR with existing approaches, we conduct user studies on Amazon Mechanical Turk (AMT), which are carried out with 50 human subjects with 1,000 votes in total. Extensive evaluations demonstrate the effectiveness and efficiency of the proposed STTR in generating visually pleasing style transfer results.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes