CV · AI · Feb 6

Di3PO - Diptych Diffusion DPO for Targeted Improvements in Image Generation

Sanjana Reddy, Ishaan Malhi, Sally Ma, Praneet Dutta

arXiv:2602.06355v2 · h-index: 11

AI Analysis

This paper addresses computational-efficiency and training-quality issues in preference tuning for text-to-image generation models, though it appears incremental rather than paradigm-shifting.

The paper tackles inefficient preference tuning in text-to-image diffusion models by introducing Di3PO, a method that constructs targeted positive/negative image pairs so that only specific regions differ while the surrounding context stays stable. Applied to text rendering, the approach is reported to improve over the supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) baselines.

Existing methods for preference tuning of text-to-image (T2I) diffusion models often rely on computationally expensive generation steps to create positive and negative pairs of images. These approaches frequently yield training pairs that either lack meaningful differences, are expensive to sample and filter, or exhibit significant variance in irrelevant pixel regions, thereby degrading training efficiency. To address these limitations, we introduce "Di3PO", a novel method for constructing positive and negative pairs that isolates specific regions targeted for improvement during preference tuning, while keeping the surrounding context in the image stable. We demonstrate the efficacy of our approach by applying it to the challenging task of text rendering in diffusion models, showcasing improvements over baseline methods of SFT and DPO.
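
The abstract does not spell out how the pairs are built, so the following is only a minimal sketch of the property it describes: a positive and a negative image that differ inside one target region and are identical everywhere else. The helper names (`make_region_pair`, `differing_pixels`), the toy grayscale grids, and the paste-a-patch construction are all assumptions made for illustration, not the Di3PO procedure itself.

```python
# Illustrative only: images are toy grayscale grids (lists of lists).
# Pasting a patch is an assumed stand-in for however the paper builds its pairs.

def make_region_pair(base, good_patch, bad_patch, top, left):
    """Return (positive, negative) images that share every pixel of `base`
    except inside the patch region starting at (top, left)."""
    def paste(patch):
        out = [row[:] for row in base]  # copy so `base` is not mutated
        for dy, patch_row in enumerate(patch):
            for dx, value in enumerate(patch_row):
                out[top + dy][left + dx] = value
        return out
    return paste(good_patch), paste(bad_patch)


def differing_pixels(a, b):
    """List the (row, col) coordinates where two same-sized images differ."""
    return [(y, x)
            for y, row in enumerate(a)
            for x, value in enumerate(row)
            if value != b[y][x]]


base = [[0] * 6 for _ in range(4)]   # shared context
good = [[9, 9], [9, 9]]              # hypothetical "well-rendered" region
bad = [[9, 1], [3, 9]]               # hypothetical "badly rendered" region

positive, negative = make_region_pair(base, good, bad, top=1, left=2)
diff = differing_pixels(positive, negative)
```

In this toy setup every differing pixel falls inside the 2x2 target region, so a preference signal computed from the pair cannot be attributed to changes in the background, which is the kind of variance the abstract says degrades training efficiency.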
