CV · Aug 4, 2020

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

arXiv:2008.01576v2 · 42 citations
AI Analysis

This paper addresses the challenge of editing images across diverse domains without domain-specific training data. The contribution is incremental in that it builds on existing visual-semantic embeddings.

The authors tackled the problem of open-domain image manipulation using open-vocabulary instructions, achieving promising results in manipulating color, texture, and high-level attributes across various scenarios.

We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions. It is a challenging task considering the large variation of image domains and the lack of training supervision. Our approach takes advantage of the unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic on the image feature maps. A structure-preserving image decoder then generates the manipulated images from the manipulated feature maps. We further propose an on-the-fly sample-specific optimization approach with cycle-consistency constraints to regularize the manipulated images and force them to preserve details of the source images. Our approach shows promising results in manipulating open-vocabulary color, texture, and high-level attributes for various scenarios of open-domain images.
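The text-guided vector arithmetic mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the rule below is an assumption: at each spatial location, the component of the visual feature aligned with a source-text embedding is removed and the same amount is added along a target-text embedding. The names `edit_feature_map`, `src_text`, `tgt_text`, and `alpha` are hypothetical and chosen only for illustration; the toy vectors stand in for embeddings that a pretrained visual-semantic model would produce.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def edit_feature_map(feature_map, src_text, tgt_text, alpha=1.0):
    """Apply text-guided vector arithmetic at every spatial location.

    feature_map: H x W grid (nested lists) of C-dimensional visual features.
    src_text, tgt_text: C-dimensional text embeddings in the same space.
    alpha: edit strength (hypothetical parameter, assumed for this sketch).

    Assumed rule, not confirmed by the abstract:
        v' = v - alpha * <v, t_src> * t_src + alpha * <v, t_src> * t_tgt
    """
    t_src = normalize(src_text)
    t_tgt = normalize(tgt_text)
    edited = []
    for row in feature_map:
        new_row = []
        for v in row:
            # How strongly this location expresses the source attribute.
            weight = alpha * dot(v, t_src)
            new_row.append(
                [vi - weight * s + weight * t for vi, s, t in zip(v, t_src, t_tgt)]
            )
        edited.append(new_row)
    return edited


# Toy example: a 1 x 2 feature map with 2-dimensional features.
toy_map = [[[1.0, 0.0], [0.0, 1.0]]]
result = edit_feature_map(toy_map, src_text=[1.0, 0.0], tgt_text=[0.0, 1.0])
```

In this toy case the first location, which is fully aligned with the source direction, is moved onto the target direction, while the second location, which has no source component, is left unchanged. That locality is the motivation for weighting the edit by the inner product; in a full pipeline a decoder would then render the edited feature map back into an image.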

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
