CV | Feb 12, 2025

SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation

Ellie Arar, Yarden Frenkel, Daniel Cohen-Or, Ariel Shamir, Yael Vinker

arXiv:2502.08642v1 | 19.017 citations | h-index: 13 | SIGGRAPH

Originality: Incremental advance

AI Analysis

This work addresses a practical bottleneck for artists and designers by enabling fast, high-quality sketch generation, though the advance is incremental because it builds on existing diffusion models and synthetic-data techniques.

The paper addresses slow image-to-vector sketch generation. Existing methods rely on a time-consuming optimization process, whereas SwiftSketch, a diffusion model, produces high-quality sketches in less than a second.

Recent advancements in large vision-language models have enabled highly expressive and diverse vector sketch generation. However, state-of-the-art methods rely on a time-consuming optimization process involving repeated feedback from a pretrained model to determine stroke placement. Consequently, despite producing impressive sketches, these methods are limited in practical applications. In this work, we introduce SwiftSketch, a diffusion model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution. Its transformer-decoder architecture is designed to effectively handle the discrete nature of vector representation and capture the inherent global dependencies between strokes. To train SwiftSketch, we construct a synthetic dataset of image-sketch pairs, addressing the limitations of existing sketch datasets, which are often created by non-artists and lack professional quality. For generating these synthetic sketches, we introduce ControlSketch, a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet. We demonstrate that SwiftSketch generalizes across diverse concepts, efficiently producing sketches that combine high fidelity with a natural and visually appealing style.
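To make the phrase "progressively denoising stroke control points sampled from a Gaussian distribution" concrete, here is a minimal sketch of a generic DDPM-style reverse loop over stroke coordinates. It is an illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the choice of four control points per stroke, the clean-sample (x0) prediction target, and the names `make_schedule`, `sample_strokes`, and `toy_denoiser` are all hypothetical. The real model would use the trained, image-conditioned transformer decoder in place of `toy_denoiser`.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus cumulative alpha products (assumed, DDPM-style)."""
    span = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * i / span for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_denoiser(x_t, t, condition):
    """Hypothetical stand-in for the trained network.

    It ignores the noisy input and returns the conditioning vector as its
    estimate of the clean control points, so the loop can run without a model.
    """
    return condition


def sample_strokes(denoiser, condition, n_strokes=8, points_per_stroke=4,
                   num_steps=50, seed=0):
    """Start from Gaussian noise and denoise all stroke control points jointly."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    n_values = n_strokes * points_per_stroke * 2  # (x, y) per control point
    x = [rng.gauss(0.0, 1.0) for _ in range(n_values)]

    for t in reversed(range(num_steps)):
        x0_hat = denoiser(x, t, condition)  # predicted clean control points
        if t == 0:
            x = list(x0_hat)  # final step: keep the prediction, add no noise
            break
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        # Mean and standard deviation of the Gaussian posterior q(x_{t-1} | x_t, x0).
        coef_x0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = math.sqrt(1.0 - betas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        sigma = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
        x = [coef_x0 * clean + coef_xt * noisy + sigma * rng.gauss(0.0, 1.0)
             for clean, noisy in zip(x0_hat, x)]

    # Regroup the flat coordinate list into strokes of (x, y) control points.
    return [
        [(x[2 * (s * points_per_stroke + p)], x[2 * (s * points_per_stroke + p) + 1])
         for p in range(points_per_stroke)]
        for s in range(n_strokes)
    ]
```

Calling `sample_strokes(toy_denoiser, target)` with a flat list `target` of `8 * 4 * 2` coordinates returns eight strokes of four control points each. Because every stroke is denoised in the same loop, a real denoiser can attend across strokes at each step, which is one way to capture the global dependencies the abstract mentions.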
