CV | Feb 24, 2022

StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation

arXiv:2202.12362v1 | 53 citations | Has Code
Originality: Incremental advance
AI Analysis

StyleCLIPDraw addresses the lack of artistic control in text-to-image generation for users who want to specify a drawing style, though it is an incremental improvement over existing methods.

The paper tackles the generation of styled drawings from text descriptions. It proposes StyleCLIPDraw, which optimizes for style and content simultaneously rather than sequentially. Human evaluators strongly preferred its styles over those of the sequential method and, despite some degradation in content for certain styles, preferred it overall when both content and style were considered.

Generating images that fit a given text description using machine learning has improved greatly with the release of technologies such as the CLIP image-text encoder model; however, current methods lack artistic control over the style of the image to be generated. We present an approach for generating styled drawings for a given text description where a user can specify a desired drawing style using a sample image. Inspired by a theory in art that style and content are generally inseparable during the creative process, we propose a coupled approach, known here as StyleCLIPDraw, whereby the drawing is generated by optimizing for style and content simultaneously throughout the process, as opposed to applying style transfer after creating content in a sequence. Based on human evaluation, the styles of images generated by StyleCLIPDraw are strongly preferred to those generated by the sequential approach. Although the quality of content generation degrades for certain styles, when both content and style are considered overall, StyleCLIPDraw is found to be far more preferred. This indicates the importance of the style, look, and feel of machine-generated images to people, and also that style is coupled in the drawing process itself. Our code (https://github.com/pschaldenbrand/StyleCLIPDraw), a demonstration (https://replicate.com/pschaldenbrand/style-clip-draw), and style evaluation data (https://www.kaggle.com/pittsburghskeet/drawings-with-style-evaluation-styleclipdraw) are publicly available.
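
The difference between the coupled and the sequential approach can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes a CLIP-based content objective and a style objective derived from a sample image, whereas the quadratic losses, the two-number "drawing", and the function names below (`content_grad`, `style_grad`, `descend`, `coupled`, `sequential`) are hypothetical stand-ins chosen only to show the structure of the two optimization schedules.

```python
# Toy stand-ins (assumptions): the "drawing" is just two numbers, and both
# losses are simple quadratics rather than CLIP or image-feature losses.

def content_grad(x):
    # Gradient of content_loss(x) = (x[0] - 1)^2 + (x[1] - 1)^2
    return [2 * (x[0] - 1), 2 * (x[1] - 1)]


def style_grad(x):
    # Gradient of style_loss(x) = (x[0] - 3)^2
    return [2 * (x[0] - 3), 0.0]


def descend(x, grad_fn, steps=200, lr=0.1):
    # Plain gradient descent on whatever gradient function is supplied.
    x = list(x)
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def coupled(x0, style_weight=1.0):
    # Coupled schedule: every step follows the gradient of
    # content_loss + style_weight * style_loss.
    def joint_grad(x):
        c, s = content_grad(x), style_grad(x)
        return [ci + style_weight * si for ci, si in zip(c, s)]
    return descend(x0, joint_grad)


def sequential(x0):
    # Sequential schedule: fit content first, then optimize style alone,
    # starting from the content result.
    x = descend(x0, content_grad)
    return descend(x, style_grad)


if __name__ == "__main__":
    print("coupled:   ", coupled([0.0, 0.0]))
    print("sequential:", sequential([0.0, 0.0]))
```

In this toy setting the coupled schedule settles near (2, 1), a compromise between the two objectives, while the sequential schedule ends near (3, 1), where the style term is fully satisfied and the content term has been pushed away from its optimum. The toy says nothing about which result people prefer; that comparison is what the paper's human evaluation measures.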

Code Implementations: 1 repo
