CV, AI, LG · Jul 17, 2023

Image Captions are Natural Prompts for Text-to-Image Models

arXiv:2307.08526v2 · 2 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses data-scarcity and privacy concerns when training on AI-generated content, evaluated on ImageNet classification. It is incremental because it builds on existing captioning and generative models.

The paper tackles the challenge of generating informative synthetic training data with text-to-image models by using image captions as natural prompts. This improves downstream model generalization and yields out-of-distribution robustness that can exceed that of real data.

With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become a common practice to train models on synthetic data due to data-scarcity and privacy leakage problems. Owing to the massive and diverse information conveyed in real images, it is challenging for text-to-image generative models to synthesize informative training data with hand-crafted prompts. Considering the impressive ability of large generative models, could such models directly synthesize good training images for prediction tasks with proper prompts? We offer an affirmative response to this question by proposing a simple yet effective method, validated through ImageNet classification. Specifically, we caption each real image with an advanced captioning model to obtain informative and faithful prompts that extract class-relevant information and clarify the polysemy of class names. The image captions and class names are concatenated to prompt generative models for training image synthesis. We show that this simple caption incorporation significantly boosts the informativeness of synthetic data, thereby enhancing downstream model generalization. More importantly, besides improvements in data augmentation and privacy preservation, our experiments demonstrate that synthesized images can exceed real data in terms of out-of-distribution robustness.
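
The prompt-construction step described in the abstract (caption each real image, then concatenate the caption with the class name) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the "class name, caption" ordering, the comma separator, and the toy captioner are all assumptions made for the example, and the image-synthesis call itself is left out.

```python
from typing import Callable, Iterable, List, Tuple


def build_prompt(class_name: str, caption: str, sep: str = ", ") -> str:
    """Concatenate a class name and an image caption into one prompt.

    The ordering and separator are assumptions for this sketch.
    """
    class_name = class_name.strip()
    caption = caption.strip()
    # Fall back to the bare class name if the captioner returned nothing.
    return f"{class_name}{sep}{caption}" if caption else class_name


def prompts_for_dataset(
    samples: Iterable[Tuple[str, str]],
    captioner: Callable[[str], str],
) -> List[str]:
    """Build one prompt per (image, class_name) pair of real training data."""
    return [build_prompt(name, captioner(image)) for image, name in samples]


def toy_captioner(image: str) -> str:
    """Hypothetical stand-in for a real captioning model (lookup table only)."""
    captions = {"img_001.jpg": "a large bird standing in shallow water"}
    return captions.get(image, "")


prompts = prompts_for_dataset(
    [("img_001.jpg", "crane"), ("img_002.jpg", "crane")],
    toy_captioner,
)
print(prompts)
```

In this toy case the caption disambiguates the polysemous class name "crane" (bird rather than construction machine), which is the role the abstract assigns to captions. Each resulting prompt would then be passed to a text-to-image model to synthesize a training image.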

Code Implementations: 1 repo
