Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization
This paper addresses personalized image editing for users of text-to-image models. It proposes a new method that builds incrementally on existing diffusion models by adding collaborative personalization.
The paper tackles the failure of generic text-to-image diffusion models to adapt to individual users' aesthetic preferences. It introduces Collaborative Direct Preference Optimization (C-DPO), a framework that aligns image edits with user-specific preferences using collaborative signals from like-minded users. In the authors' experiments, C-DPO consistently outperforms baselines at generating preference-aligned edits.
Text-to-image (T2I) diffusion models have made remarkable strides in generating and editing high-fidelity images from text. Yet, these models remain fundamentally generic, failing to adapt to the nuanced aesthetic preferences of individual users. In this work, we present the first framework for personalized image editing in diffusion models, introducing Collaborative Direct Preference Optimization (C-DPO), a novel method that aligns image edits with user-specific preferences while leveraging collaborative signals from like-minded individuals. Our approach encodes each user as a node in a dynamic preference graph and learns embeddings via a lightweight graph neural network, enabling information sharing across users with overlapping visual tastes. We enhance a diffusion model's editing capabilities by integrating these personalized embeddings into a novel DPO objective, which jointly optimizes for individual alignment and neighborhood coherence. Comprehensive experiments, including user studies and quantitative benchmarks, demonstrate that our method consistently outperforms baselines in generating edits that are aligned with user preferences.
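The abstract does not give the C-DPO objective, so the following is only a minimal sketch of how an individual term and a neighborhood term might be combined. The `dpo_loss` function follows the standard DPO formulation; everything else is an assumption made for illustration: the mean-aggregation step standing in for the lightweight graph neural network, the form of the neighborhood term, the weight `lam`, and all function and parameter names. The log-likelihood margins are passed in as plain numbers, whereas in a real system they would come from the diffusion model conditioned on each user's embedding.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def dpo_loss(margin_policy, margin_ref, beta=0.1):
    # Standard DPO loss for a preference pair.
    # Each margin is log p(preferred) - log p(rejected) under that model.
    return -log_sigmoid(beta * (margin_policy - margin_ref))


def aggregate_embeddings(user_emb, adjacency):
    # One mean-aggregation message-passing step with self-loops.
    # ASSUMPTION: a stand-in for the paper's lightweight GNN.
    adj = adjacency + np.eye(adjacency.shape[0])
    return (adj @ user_emb) / adj.sum(axis=1, keepdims=True)


def collaborative_dpo_loss(user, margins_policy, margin_ref, adjacency,
                           beta=0.1, lam=0.5):
    # margins_policy[u] is the policy margin for the same preference pair
    # when the model is conditioned on user u's embedding (hypothetical).
    own = dpo_loss(margins_policy[user], margin_ref, beta)
    weights = adjacency[user]
    if weights.sum() == 0:
        # No neighbors in the preference graph: individual term only.
        return own
    # ASSUMPTION: neighborhood coherence modeled as the edge-weighted
    # average of the same DPO loss under the neighbors' conditioning.
    neighbor_losses = dpo_loss(margins_policy, margin_ref, beta)
    neighborhood = (weights * neighbor_losses).sum() / weights.sum()
    return own + lam * neighborhood


# Toy preference graph: users 0 and 1 are linked, user 2 is isolated.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
embeddings = aggregate_embeddings(np.eye(3), adjacency)
margins = np.array([2.0, 1.0, -1.0])
loss_user0 = collaborative_dpo_loss(0, margins, 0.0, adjacency)
loss_user2 = collaborative_dpo_loss(2, margins, 0.0, adjacency)
```

In this toy setup, user 0's loss includes a contribution from user 1, while the isolated user 2 falls back to the plain DPO term. Setting `lam` to zero recovers per-user DPO without any collaborative signal.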