CV · Dec 5, 2022

CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics

arXiv:2212.02122v2 · 52 citations · h-index: 22
Originality: Highly original
AI Analysis

CLIPVG addresses the difficulty of making fine-scale, pixel-level changes in text-guided image manipulation, giving computer vision and graphics users a more flexible, higher-quality solution.

The paper tackles the problem of text-guided image manipulation by introducing CLIPVG, a framework that uses differentiable vector graphics to achieve state-of-the-art performance in semantic correctness and synthesis quality without requiring additional generative models.

Considerable progress has recently been made in leveraging CLIP (Contrastive Language-Image Pre-Training) models for text-guided image manipulation. However, all existing works rely on additional generative models to ensure the quality of results, because CLIP alone cannot provide enough guidance information for fine-scale pixel-level changes. In this paper, we introduce CLIPVG, a text-guided image manipulation framework using differentiable vector graphics, which is also the first CLIP-based general image manipulation framework that does not require any additional generative models. We demonstrate that CLIPVG not only achieves state-of-the-art performance in both semantic correctness and synthesis quality, but is also flexible enough to support various applications far beyond the capability of all existing methods.
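The general idea behind differentiable vector graphics can be illustrated with a toy sketch: shape parameters are rendered to pixels through a smooth function, so a pixel-space loss can be turned into gradients on the shape parameters. This is not the CLIPVG implementation. It assumes a 1D "bar" with two parameters in place of real vector shapes, and a mean-squared error against a target image in place of a CLIP-based loss. All names (`render`, `loss_and_grads`, `fit`, `sharpness`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def render(center, half_width, n_pixels=32, sharpness=0.5):
    """Soft-rasterize a 1D bar; each pixel gets a coverage value in (0, 1).

    The signed distance (half_width - |i - center|) is positive inside the
    bar, and the sigmoid turns it into a smooth, differentiable edge.
    """
    return [sigmoid(sharpness * (half_width - abs(i - center)))
            for i in range(n_pixels)]


def loss_and_grads(center, half_width, target, sharpness=0.5):
    """Pixel MSE against `target` plus analytic gradients w.r.t. the shape.

    Assumption: in a CLIP-guided setup this loss would instead compare image
    and text embeddings; MSE is used here only to keep the sketch runnable.
    """
    n = len(target)
    loss = grad_center = grad_width = 0.0
    for i, t in enumerate(target):
        p = sigmoid(sharpness * (half_width - abs(i - center)))
        err = p - t
        loss += err * err / n
        # Chain rule: d(loss)/d(p) times d(p)/d(signed distance).
        common = (2.0 * err / n) * sharpness * p * (1.0 - p)
        side = (i > center) - (i < center)  # sign of (i - center)
        grad_center += common * side        # d(distance)/d(center) = sign
        grad_width += common                # d(distance)/d(half_width) = 1
    return loss, grad_center, grad_width


def fit(target, center, half_width, lr=5.0, steps=300):
    """Plain gradient descent on the two shape parameters."""
    for _ in range(steps):
        _, g_c, g_w = loss_and_grads(center, half_width, target)
        center -= lr * g_c
        half_width -= lr * g_w
    final_loss, _, _ = loss_and_grads(center, half_width, target)
    return center, half_width, final_loss
```

For example, `fit(render(20.0, 5.0), 16.0, 3.0)` starts from a bar that is too narrow and too far left and moves it toward the target. The point of the sketch is only that the optimization acts on a handful of shape parameters rather than on individual pixels.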

Code Implementations: 1 repo