CV · Nov 28, 2024

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

arXiv:2411.19156v3 · 4 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses the ambiguity of natural-language instructions in image editing by using visual instructions instead. The advance is incremental, since it builds on existing LoRA methods.

The paper tackles image editing with visual instructions by proposing the LoRA of Change framework, which learns an instruction-specific LoRA from a single before-after image pair. According to the authors, the resulting model produces high-quality images that align with user intent and supports a broad spectrum of real-world instructions.

In this paper, we propose the LoRA of Change (LoC) framework for image editing with visual instructions, i.e., before-after image pairs. Compared to the ambiguities, insufficient specificity, and diverse interpretations of natural language, visual instructions can accurately reflect users' intent. Building on the success of LoRA in text-based image editing and generation, we dynamically learn an instruction-specific LoRA to encode the "change" in a before-after image pair, enhancing the interpretability and reusability of our model. Furthermore, generalizable models for image editing with visual instructions typically require quad data, i.e., a before-after image pair, along with query and target images. Due to the scarcity of such quad data, existing models are limited to a narrow range of visual instructions. To overcome this limitation, we introduce the LoRA Reverse optimization technique, enabling large-scale training with paired data alone. Extensive qualitative and quantitative experiments demonstrate that our model produces high-quality images that align with user intent and support a broad spectrum of real-world visual instructions.
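For readers unfamiliar with the building block the abstract relies on, the following is a minimal NumPy sketch of a generic LoRA update: a frozen weight matrix plus a trainable low-rank product. It is not the paper's LoC framework or its LoRA Reverse technique, neither of which is specified in this summary. The function name `lora_forward`, the dimensions, and the scaling `alpha / r` are illustrative assumptions that follow the common LoRA formulation.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Apply a frozen weight W plus a low-rank LoRA update to inputs x.

    Hypothetical helper for illustration only; not taken from the paper.
    W: (d_out, d_in) frozen weight; A: (r, d_in); B: (d_out, r).
    """
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)  # low-rank update, rank at most r
    return x @ (W + delta).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # common init: B = 0, so the update starts at zero
x = rng.normal(size=(3, d_in))       # a batch of 3 inputs

y_base = lora_forward(x, W, A, B, alpha=4.0)   # equals the base model output

# After "training", B is nonzero; here random values stand in for learned ones.
B_trained = rng.normal(size=(d_out, r))
y_adapted = lora_forward(x, W, A, B_trained, alpha=4.0)
update_rank = np.linalg.matrix_rank(B_trained @ A)
```

In this sketch only `A` and `B` would be trained, so the "change" an adapter encodes is stored in `r * (d_in + d_out)` parameters rather than the full `d_in * d_out`, which is what makes a per-instruction adapter cheap to store and reuse.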

