CV · Jun 1, 2024

Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

arXiv:2406.00432v2 · 10 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of flexible and accurate image editing for users, though it appears incremental as it builds on existing drag-based editing methods.

The paper tackles two problems in drag-based editing: its ill-posed nature and the loss of image quality. It proposes LucidDrag, which shifts from a 'how to drag' to a 'what-then-how' paradigm. The authors report that it outperforms previous methods in both qualitative and quantitative comparisons.

Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning "how to drag" through point dragging and often produce a single deterministic estimate, which presents two key limitations: 1) they overlook the inherently ill-posed nature of drag-based editing, where multiple results may correspond to a given input, as illustrated in Fig. 1; 2) they ignore the constraint of image quality, which may lead to unexpected distortion. To alleviate these limitations, we propose LucidDrag, which shifts the focus from "how to drag" to a "what-then-how" paradigm. LucidDrag comprises an intention reasoner and a collaborative guidance sampling mechanism. The former infers several optimal editing strategies, identifying what content to edit and in which semantic direction. Building on these strategies, the latter addresses "how to drag" by collaboratively integrating existing editing guidance with the newly proposed semantic guidance and quality guidance. Specifically, semantic guidance is derived by establishing a semantic editing direction based on the reasoned intentions, while quality guidance is achieved through classifier guidance using an image fidelity discriminator. Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods.
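
The idea of combining several guidance signals in one sampling update can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not give the update rule, so the code below assumes each guidance term can be written as a scalar energy over a latent vector and that the terms are merged by a weighted sum of gradients. The names `numerical_grad`, `combined_guidance`, and `guided_step`, the three toy energies, and the weights are all hypothetical; in particular, the "quality" energy is a simple norm penalty standing in for a learned image fidelity discriminator.

```python
from typing import Callable, Dict, List

Vector = List[float]
Energy = Callable[[Vector], float]


def numerical_grad(energy: Energy, z: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference gradient of a scalar energy with respect to z."""
    grad = []
    for i in range(len(z)):
        plus = list(z)
        minus = list(z)
        plus[i] += eps
        minus[i] -= eps
        grad.append((energy(plus) - energy(minus)) / (2 * eps))
    return grad


def combined_guidance(z: Vector, energies: Dict[str, Energy],
                      weights: Dict[str, float]) -> Vector:
    """Weighted sum of the gradients of all guidance energies (assumed rule)."""
    total = [0.0] * len(z)
    for name, energy in energies.items():
        grad = numerical_grad(energy, z)
        total = [t + weights[name] * g for t, g in zip(total, grad)]
    return total


def guided_step(z: Vector, energies: Dict[str, Energy],
                weights: Dict[str, float], step_size: float = 0.1) -> Vector:
    """Move the latent one step against the combined guidance gradient."""
    grad = combined_guidance(z, energies, weights)
    return [zi - step_size * gi for zi, gi in zip(z, grad)]


# Toy stand-ins for the three guidance terms named in the abstract.
TARGET = [1.0, 0.0, 0.0, 0.0]      # hypothetical drag target in latent space
DIRECTION = [0.0, 1.0, 0.0, 0.0]   # hypothetical semantic editing direction

energies: Dict[str, Energy] = {
    # Editing guidance: pull the latent toward the drag target.
    "drag": lambda z: sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET)),
    # Semantic guidance: reward movement along the semantic direction.
    "semantic": lambda z: -sum(zi * di for zi, di in zip(z, DIRECTION)),
    # Quality guidance: norm penalty in place of a fidelity discriminator.
    "quality": lambda z: sum(zi ** 2 for zi in z),
}
weights = {"drag": 1.0, "semantic": 0.5, "quality": 0.1}

z = [0.0, 0.0, 0.0, 0.0]
for _ in range(100):
    z = guided_step(z, energies, weights)
print([round(v, 3) for v in z])
```

With these toy energies the latent settles at a compromise between the drag target and the semantic direction, slightly shrunk by the quality penalty. In a real diffusion sampler the gradients would come from automatic differentiation through the model rather than finite differences, and the weights would likely vary across timesteps.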

Code Implementations: 1 repo
