CV | May 1, 2025

InstructAttribute: Fine-grained Object Attributes editing with Instruction

arXiv:2505.00751v2 | 2 citations | h-index: 3
AI Analysis

This work addresses a specific problem in image editing for applications like product design and e-commerce, representing an incremental advancement by building on existing diffusion and large language model techniques.

The paper tackles fine-grained control over specific object attributes, such as color and material, in text-to-image diffusion models. It introduces InstructAttribute, an instruction-tuned model that edits attributes at the object level through natural language prompts and outperforms existing instruction-based baselines in balancing attribute modification accuracy against structural preservation.

Text-to-image (T2I) diffusion models are widely used in image editing due to their powerful generative capabilities. However, achieving fine-grained control over specific object attributes, such as color and material, remains a considerable challenge. Existing methods often fail to accurately modify these attributes or compromise structural integrity and overall image consistency. To fill this gap, we introduce Structure Preservation and Attribute Amplification (SPAA), a novel training-free framework that enables precise generation of color and material attributes for the same object by intelligently manipulating self-attention maps and cross-attention values within diffusion models. Building on SPAA, we integrate multi-modal large language models (MLLMs) to automate data curation and instruction generation. Leveraging this object attribute data collection engine, we construct the Attribute Dataset, encompassing a comprehensive range of colors and materials across diverse object categories. Using this generated dataset, we propose InstructAttribute, an instruction-tuned model that enables fine-grained and object-level attribute editing through natural language prompts. This capability holds significant practical implications for diverse fields, from accelerating product design and e-commerce visualization to enhancing virtual try-on experiences. Extensive experiments demonstrate that InstructAttribute outperforms existing instruction-based baselines, achieving a superior balance between attribute modification accuracy and structural preservation.
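The abstract names two mechanisms in SPAA: manipulating self-attention maps (structure preservation) and cross-attention values (attribute amplification). It does not specify which layers, timesteps, or scaling rule are used, so the following is only a toy NumPy sketch of the general idea, not the authors' implementation. The function names, the choice to reuse a source pass's self-attention map wholesale, and the scalar `scale` applied to selected token values are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map(q, k):
    """Row-stochastic attention map softmax(Q K^T / sqrt(d))."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d))


def self_attention_with_injection(q, k, v, injected_map=None):
    """Self-attention over image tokens.

    Assumption: structure is preserved by replacing this layer's own
    attention map with one recorded from the source image's pass.
    Returns (output, attention map actually used).
    """
    attn = attention_map(q, k) if injected_map is None else injected_map
    return attn @ v, attn


def cross_attention_with_amplification(q, k_text, v_text, token_idx, scale):
    """Cross-attention from image tokens to text tokens.

    Assumption: the attribute is strengthened by scaling the value
    vectors of the attribute tokens (e.g. "red", "wooden") by `scale`.
    The input v_text is left unmodified.
    """
    v_amp = v_text.copy()
    v_amp[token_idx] *= scale
    return attention_map(q, k_text) @ v_amp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_pix, n_tok, d = 16, 5, 8  # toy sizes, hypothetical

    # Source pass: record the self-attention map.
    q_s, k_s, v_s = (rng.normal(size=(n_pix, d)) for _ in range(3))
    _, src_map = self_attention_with_injection(q_s, k_s, v_s)

    # Edit pass: reuse the source map so spatial layout is carried over.
    q_e, k_e, v_e = (rng.normal(size=(n_pix, d)) for _ in range(3))
    edited, _ = self_attention_with_injection(q_e, k_e, v_e, injected_map=src_map)

    # Amplify the value vector of the (hypothetical) attribute token at index 2.
    k_t, v_t = rng.normal(size=(n_tok, d)), rng.normal(size=(n_tok, d))
    out = cross_attention_with_amplification(q_e, k_t, v_t, [2], scale=1.5)
    print(edited.shape, out.shape)
```

In this sketch, injecting the map keeps the "who attends to whom" pattern of the source image, while scaling a token's value vector changes how strongly that token's content is written into each image location without changing where it is written.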

