LG, AI, CV, ML
Feb 12, 2025

PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation

Georgia Tech
arXiv:2502.08106v3
AI Analysis

This work addresses the degradation of text-to-image diffusion models that are trained or fine-tuned on imbalanced datasets, where minority data is underrepresented among the image-text pairs.

The proposed fine-tuning approach, PoGDiff, replaces the ground-truth distribution in the training objective with a Product of Gaussians (PoG) that combines the original ground-truth target with the predicted distribution conditioned on a neighboring text embedding. The authors report improved generation accuracy and quality on real-world datasets.

Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets. This degradation is largely due to the disproportionate representation of majority and minority data in image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), which is constructed by combining the original ground-truth targets with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
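
To make the Product-of-Gaussians idea concrete, the sketch below multiplies two isotropic Gaussian densities and returns the parameters of the resulting (renormalized) Gaussian. It is a minimal illustration of the standard identity only, not the authors' implementation: the function name `product_of_gaussians`, the use of a single shared variance per distribution, and the toy inputs are all assumptions, and the abstract does not specify how PoGDiff weights its two components.

```python
from typing import List, Tuple


def product_of_gaussians(
    mu_a: List[float], var_a: float,
    mu_b: List[float], var_b: float,
) -> Tuple[List[float], float]:
    """Combine N(mu_a, var_a * I) and N(mu_b, var_b * I) into one Gaussian.

    Hypothetical helper for illustration. Here `mu_a` stands in for the
    original ground-truth target and `mu_b` for the prediction conditioned
    on a neighboring text embedding; the variances are assumed scalars.
    """
    if len(mu_a) != len(mu_b):
        raise ValueError("means must have the same dimension")
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")

    # Precisions (inverse variances) add under a product of Gaussians.
    prec_a = 1.0 / var_a
    prec_b = 1.0 / var_b
    var = 1.0 / (prec_a + prec_b)

    # The combined mean is the precision-weighted average of the two means.
    mean = [var * (prec_a * a + prec_b * b) for a, b in zip(mu_a, mu_b)]
    return mean, var


# Equal variances: the combined mean is the midpoint and the variance halves.
mean, var = product_of_gaussians([0.0, 4.0], 1.0, [2.0, 0.0], 1.0)
print(mean, var)  # [1.0, 2.0] 0.5
```

Because the combined mean is pulled toward whichever component has the smaller variance, the relative variances act as the knob that controls how strongly the neighbor-conditioned prediction influences the training target.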
