CV · Oct 11, 2022

Style-Guided Inference of Transformer for High-resolution Image Synthesis

arXiv:2210.05533v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the need for more controlled and efficient image generation in computer vision, though the advance is incremental because it builds on existing transformer methods.

The paper tackles the unpredictability of high-resolution image synthesis with auto-regressive transformers by using a style image as an additional condition to guide sampling, resulting in generated samples that are similar to the reference style without retraining the model.

The transformer is well suited to auto-regressive image synthesis, which predicts each discrete value from past values recursively to build up a full image. In particular, when combined with a vector-quantised latent representation, the state-of-the-art auto-regressive transformer produces realistic high-resolution images. However, sampling the latent code from a discrete probability distribution makes the output unpredictable, so many diverse samples must be generated to obtain the desired output. To avoid generating many samples repeatedly, in this article we propose to take a desired output, a style image, as an additional condition without re-training the transformer. To this end, our method converts the style into a probability constraint that re-balances the prior, thereby specifying the target distribution in place of the original prior. Samples generated from the re-balanced prior thus have styles similar to the reference style. In practice, either a single image or a category of images can serve as the additional condition. In our qualitative assessment, we show that the styles of the majority of outputs are similar to the input style.
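
The abstract does not give the exact form of the probability constraint, so the following is only an illustrative sketch of how a prior over codebook indices might be re-balanced at sampling time without touching the model weights. It assumes the style is summarised as a histogram of the style image's vector-quantised codes and that the constraint is applied multiplicatively to the transformer's next-code distribution; both choices, and the names `style_histogram`, `rebalance_prior`, `sample_next_code`, `strength`, and `smoothing`, are hypothetical and not taken from the paper.

```python
import numpy as np


def style_histogram(style_codes, codebook_size, smoothing=1e-3):
    """Frequency of each codebook index among the style image's latent codes.

    Assumption: the style is represented by code usage statistics only.
    """
    counts = np.bincount(style_codes.ravel(), minlength=codebook_size).astype(float)
    counts += smoothing  # keep unseen codes possible instead of forbidding them
    return counts / counts.sum()


def rebalance_prior(logits, style_probs, strength=1.0):
    """Re-weight the model's next-code distribution by the style constraint.

    Assumption: target(k) is proportional to prior(k) * style_probs(k) ** strength.
    """
    log_target = logits + strength * np.log(style_probs)
    log_target -= log_target.max()  # subtract the max for numerical stability
    target = np.exp(log_target)
    return target / target.sum()


def sample_next_code(logits, style_probs, rng, strength=1.0):
    """Draw one codebook index from the re-balanced distribution."""
    target = rebalance_prior(logits, style_probs, strength)
    return int(rng.choice(len(target), p=target))
```

In this sketch, `strength=0.0` recovers ordinary sampling from the original prior, while larger values pull samples toward codes that are frequent in the style image. A category of images could be handled the same way by pooling the codes of several images before building the histogram.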

