CV · Aug 5, 2025

LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing

arXiv:2508.03144v210.23 citations · h-index: 2 · Has Code

Originality: Highly original

AI Analysis

This paper addresses a structural limitation in text-driven image editing that affects users who need precise control, and it offers a scalable solution that requires no model modifications.

The paper tackles the problem of semantic bias in rectified flow-based image editing, where inverted noise suppresses attention to target concepts, leading to editing failures. LORE, a training-free method that optimizes inverted noise, significantly outperforms baselines on three benchmarks in semantic alignment, image quality, and background fidelity.

Text-driven image editing enables users to flexibly modify visual content through natural language instructions, and is widely applied to tasks such as semantic object replacement, insertion, and removal. While recent inversion-based editing methods using rectified flow models have achieved promising results in image quality, we identify a structural limitation in their editing behavior: the semantic bias toward the source concept encoded in the inverted noise tends to suppress attention to the target concept. This issue becomes particularly critical when the source and target semantics are dissimilar, where the attention mechanism inherently leads to editing failure or unintended modifications in non-target regions. In this paper, we systematically analyze and validate this structural flaw, and introduce LORE, a training-free and efficient image editing method. LORE directly optimizes the inverted noise, addressing the core limitations in generalization and controllability of existing approaches, enabling stable, controllable, and general-purpose concept replacement, without requiring architectural modification or model fine-tuning. We conduct comprehensive evaluations on three challenging benchmarks: PIEBench, SmartEdit, and GapEdit. Experimental results show that LORE significantly outperforms strong baselines in terms of semantic alignment, image quality, and background fidelity, demonstrating the effectiveness and scalability of latent-space optimization for general-purpose image editing. Our implementation is available at https://github.com/oyly16/LORE.
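The general idea of optimizing a latent while the model stays frozen can be illustrated with a toy sketch. This is not the LORE implementation: the abstract does not specify the objective, so the linear "generator" `W`, the masked loss, the weight `lam`, and every function name below are assumptions made purely for illustration. The sketch runs plain gradient descent on the latent `z` only, pulling the masked output entry toward a target value while keeping the unmasked entries close to the source output.

```python
def apply_generator(W, z):
    """Frozen toy 'generator': a fixed linear map y = W z (hypothetical stand-in)."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def edit_loss(y, y_src, target, mask, lam):
    """Masked edit term plus background-preservation term (assumed objective)."""
    return sum(
        m * (yi - ti) ** 2 + lam * (1 - m) * (yi - si) ** 2
        for yi, si, ti, m in zip(y, y_src, target, mask)
    )


def optimize_latent(W, z_init, target, mask, lam=1.0, lr=0.1, steps=2000):
    """Gradient descent on the latent only; the generator weights W never change."""
    y_src = apply_generator(W, z_init)
    z = list(z_init)
    for _ in range(steps):
        y = apply_generator(W, z)
        # Per-output residual of the loss above (before the factor of 2).
        r = [
            m * (yi - ti) + lam * (1 - m) * (yi - si)
            for yi, si, ti, m in zip(y, y_src, target, mask)
        ]
        # Chain rule through y = W z gives grad = 2 * W^T r.
        grad = [
            2 * sum(W[i][j] * r[i] for i in range(len(W)))
            for j in range(len(z))
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Toy setup: edit output entry 0 toward 3.0, preserve entries 1 and 2.
W = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]
z_src = [1.0, 1.0, 1.0]
target = [3.0, 0.0, 0.0]  # only the masked entry of the target is used
mask = [1, 0, 0]

y_src = apply_generator(W, z_src)
z_opt = optimize_latent(W, z_src, target, mask)
y_opt = apply_generator(W, z_opt)
loss_before = edit_loss(y_src, y_src, target, mask, 1.0)
loss_after = edit_loss(y_opt, y_src, target, mask, 1.0)
```

In this toy setting the loss can be driven to zero because `W` is invertible. A real rectified flow model is nonlinear, so the gradient would come from automatic differentiation and the objective would be whatever the paper defines.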

