CV · Feb 24

WildSVG: Towards Reliable SVG Generation Under Real-Word Conditions

arXiv:2602.21416v1 · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for reliable SVG extraction from natural images. The contribution is incremental: it builds on existing multimodal models and adds a new benchmark for evaluating them.

The paper tackles the problem of generating scalable vector graphics (SVGs) from real-world images, which is difficult because natural images introduce noise, clutter, and domain shifts. It introduces the WildSVG Benchmark to evaluate models and finds that current approaches perform poorly under these conditions.

We introduce the task of SVG extraction, which consists of translating specific visual inputs from an image into scalable vector graphics. Existing multimodal models achieve strong results when generating SVGs from clean renderings or textual descriptions, but they fall short in real-world scenarios where natural images introduce noise, clutter, and domain shifts. A central challenge in this direction is the lack of suitable benchmarks. To address this need, we introduce the WildSVG Benchmark, formed by two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions. Together, these resources provide the first foundation for systematically benchmarking SVG extraction. We benchmark state-of-the-art multimodal models and find that current approaches perform well below what is needed for reliable SVG extraction in real scenarios. Nonetheless, iterative refinement methods point to a promising path forward, and model capabilities are steadily improving.
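To make the idea of blending a rendering into a real scene concrete, here is a minimal sketch of standard alpha compositing (the "over" operator) on plain pixel grids. This is an illustration only, not the authors' pipeline: the abstract does not say how Synthetic WildSVG is constructed, and the names `blend_pixel` and `paste_rendering` are hypothetical. It assumes the SVG has already been rasterized to RGBA pixels and that the scene is an opaque RGB image.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def blend_pixel(fg: RGBA, bg: RGB) -> RGB:
    """Composite one RGBA foreground pixel over an opaque RGB background pixel."""
    alpha = fg[3] / 255.0  # 0.0 = fully transparent, 1.0 = fully opaque
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg[:3], bg))


def paste_rendering(
    scene: List[List[RGB]],
    rendering: List[List[RGBA]],
    top: int,
    left: int,
) -> List[List[RGB]]:
    """Return a copy of `scene` with `rendering` composited at (top, left).

    Hypothetical helper: rendering pixels that fall outside the scene are skipped.
    """
    out = [list(row) for row in scene]  # copy so the input scene is not mutated
    for r, row in enumerate(rendering):
        for c, fg in enumerate(row):
            y, x = top + r, left + c
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = blend_pixel(fg, out[y][x])
    return out
```

A realistic synthetic pipeline would presumably also vary scale, perspective, lighting, and occlusion; the sketch covers only the compositing step.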

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
