CV | Mar 26, 2025

EditCLIP: Representation Learning for Image Editing

arXiv:2503.20318v1 | 1 citation | h-index: 12
Originality: Incremental advance
AI Analysis

EditCLIP addresses the need for more efficient and versatile image editing tools in computer vision and graphics. The advance is incremental because it builds on existing CLIP-based methods.

The paper tackles image editing by introducing EditCLIP, a representation-learning approach that learns a unified representation of an edit from an image pair (the input and its edited counterpart). It reports outperforming state-of-the-art methods in exemplar-based editing and aligning more closely with human judgments in automated edit evaluation.

We introduce EditCLIP, a novel representation-learning approach for image editing. Our method learns a unified representation of edits by jointly encoding an input image and its edited counterpart, effectively capturing their transformation. To evaluate its effectiveness, we employ EditCLIP to solve two tasks: exemplar-based image editing and automated edit evaluation. In exemplar-based image editing, we replace text-based instructions in InstructPix2Pix with EditCLIP embeddings computed from a reference exemplar image pair. Experiments demonstrate that our approach outperforms state-of-the-art methods while being more efficient and versatile. For automated evaluation, EditCLIP assesses image edits by measuring the similarity between the EditCLIP embedding of a given image pair and either a textual editing instruction or the EditCLIP embedding of another reference image pair. Experiments show that EditCLIP aligns more closely with human judgments than existing CLIP-based metrics, providing a reliable measure of edit quality and structural preservation.
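The automated-evaluation step described above can be illustrated with a minimal sketch. It assumes the similarity is cosine similarity, which is common for CLIP-style embeddings but is not stated in the abstract. The function names `cosine_similarity` and `edit_score` are hypothetical, and the short vectors are placeholders standing in for real EditCLIP and text embeddings, not outputs of the actual model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def edit_score(pair_embedding: Sequence[float],
               reference_embedding: Sequence[float]) -> float:
    """Score an edit (hypothetical helper).

    pair_embedding: embedding of the (input, edited) image pair.
    reference_embedding: embedding of either a textual instruction or
    a reference exemplar pair, assumed to live in the same space.
    """
    return cosine_similarity(pair_embedding, reference_embedding)


# Placeholder vectors; a real pipeline would obtain these from the encoders.
candidate_pair = [0.9, 0.1, 0.0]   # embedding of the edit being evaluated
instruction = [1.0, 0.0, 0.0]      # embedding of the intended instruction
unrelated = [0.0, 0.0, 1.0]        # embedding of an unrelated instruction

matching_score = edit_score(candidate_pair, instruction)
unrelated_score = edit_score(candidate_pair, unrelated)
```

Under these assumptions, an edit that matches its reference scores higher than one compared against an unrelated reference, so `matching_score` exceeds `unrelated_score`. The same function covers both evaluation modes in the abstract, since only the source of the reference embedding changes.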

Code Implementations: 1 repo
