CV · IV · Sep 24, 2023

MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP

arXiv:2309.13716v1 · 4 citations · h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses a limitation in creative image stylization for users who need precise regional control, and it represents an incremental improvement over existing methods.

The paper tackles the lack of fine control over stylizing individual objects in text-driven style transfer. It proposes MOSAIC, a method that segments and stylizes objects based on input prompts, producing high-quality images with enhanced control and generalization to unseen classes.

Style transfer driven by text prompts opened a new path for creatively stylizing images without collecting an actual style image. Despite promising results, text-driven stylization gives the user no control over where a style is applied. A user creating an artistic image needs fine control over the stylization of individual entities in the content image, which current state-of-the-art approaches do not provide. Diffusion-based style transfer methods suffer from the same issue, because their regional control over the stylized output is ineffective. To address this problem, we propose Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC), a method that applies styles to different objects in the image based on the context extracted from the input prompt. Text-based segmentation and stylization modules, both built on the vision transformer architecture, segment and stylize the objects. Our method extends to arbitrary objects and styles and produces high-quality images compared to current state-of-the-art methods. To our knowledge, this is the first attempt to perform text-guided arbitrary object-wise stylization. We demonstrate the effectiveness of our approach through qualitative and quantitative analysis, showing that it generates visually appealing stylized images with enhanced control over stylization and generalizes to unseen object classes.
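The abstract does not say how the per-object results are combined into one output image. As a rough illustration only, the sketch below assumes that each object named in the prompt already has a segmentation mask and a fully stylized copy of the image, and blends them back into the content image pixel by pixel. The function name `composite_objectwise`, the list-of-tuples image representation, and the mask-blending step are all hypothetical choices made for this example; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]   # RGB values in [0, 1]
Image = List[List[Pixel]]            # rows of pixels
Mask = List[List[float]]             # 1.0 inside the object, 0.0 outside


def composite_objectwise(
    content: Image,
    stylized: Dict[str, Image],
    masks: Dict[str, Mask],
) -> Image:
    """Blend each object's stylized image into the content image via its mask.

    Hypothetical helper: assumes `stylized[name]` and `masks[name]` have the
    same height and width as `content`.
    """
    out = [list(row) for row in content]  # copy so the input is not mutated
    for name, mask in masks.items():
        styled = stylized[name]
        for y, mask_row in enumerate(mask):
            for x, m in enumerate(mask_row):
                base = out[y][x]
                over = styled[y][x]
                # Per-channel linear blend: mask weight picks the stylized pixel.
                out[y][x] = tuple(m * o + (1.0 - m) * b for o, b in zip(over, base))
    return out


# Toy 1x3 image: the "car" occupies pixel 0, the "tree" pixel 1, pixel 2 is background.
content = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
stylized = {
    "car": [[(1.0, 0.0, 0.0)] * 3],   # stand-in for the image in the car's style
    "tree": [[(0.0, 1.0, 0.0)] * 3],  # stand-in for the image in the tree's style
}
masks = {
    "car": [[1.0, 0.0, 0.0]],
    "tree": [[0.0, 1.0, 0.0]],
}
result = composite_objectwise(content, stylized, masks)
```

In this toy case each object receives only its own style and the background pixel is left unchanged, which is the kind of regional control the abstract describes. Soft masks with values between 0 and 1 would blend the style in gradually at object boundaries.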
