CV
Jun 3, 2025

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

arXiv:2506.02528v122 citations
h-index: 4
Originality: Highly original
AI Analysis

This work addresses a domain-specific computer vision problem, image editing, and offers a novel method for transferring visual transformations from example pairs to new images.

The paper tackles generalizable visual prompt-based image editing, particularly for non-rigid transformations. It proposes RelationAdapter, a lightweight module that enables Diffusion Transformer models to capture visual transformations from source-target image pairs and apply them to new query images. Experiments on the Relation252K dataset are reported to show notable gains in generation quality and editing performance.

Inspired by the in-context learning mechanism of large language models (LLMs), a new paradigm of generalizable visual prompt-based image editing is emerging. Existing single-reference methods typically focus on style or appearance adjustments and struggle with non-rigid transformations. To address these limitations, we propose leveraging source-target image pairs to extract and transfer content-aware editing intent to novel query images. To this end, we introduce RelationAdapter, a lightweight module that enables Diffusion Transformer (DiT) based models to effectively capture and apply visual transformations from minimal examples. We also introduce Relation252K, a comprehensive dataset comprising 218 diverse editing tasks, to evaluate model generalization and adaptability in visual prompt-driven scenarios. Experiments on Relation252K show that RelationAdapter significantly improves the model's ability to understand and transfer editing intent, leading to notable gains in generation quality and overall editing performance.
