Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
This work addresses the challenge of classifying images in specialized domains without labeled training data, offering an incremental improvement over existing synthetic data generation methods.
The paper tackles zero-shot domain-specific image classification by using attributed prompts from large language models to generate more diverse synthetic training images. On two fine-grained datasets, the resulting classifiers outperform CLIP's zero-shot classification in most settings and consistently outperform simple prompt strategies.
Zero-shot domain-specific image classification is challenging because real images must be classified without any ground-truth in-domain training examples. Recent work addresses this by combining textual knowledge with a text-to-image model to generate in-domain training images in zero-shot scenarios. However, existing methods rely heavily on simple prompt strategies, which limits the diversity of the synthetic training images and leads to inferior performance compared with training on real images. In this paper, we propose AttrSyn, which leverages large language models to generate attributed prompts. These prompts allow for the generation of more diverse attributed synthetic images. Experiments on zero-shot domain-specific image classification with two fine-grained datasets show that training with synthetic images generated by AttrSyn significantly outperforms CLIP's zero-shot classification in most settings and consistently surpasses simple prompt strategies.
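The contrast between a simple prompt strategy and attributed prompts can be illustrated with a minimal sketch. This is not the AttrSyn implementation: the attribute names and values in `ATTRIBUTES`, the prompt template, and the helper functions are all hypothetical, and the attribute pool is hard-coded here where the paper describes obtaining attributed prompts from a large language model. The sketch only shows how combining a class name with varied attribute values yields many distinct prompts for a text-to-image model, where a simple strategy yields one.

```python
import itertools
import random

# Hypothetical attribute pool. In the approach described above, a large
# language model would supply attributed prompts; these values are
# hard-coded placeholders for illustration only.
ATTRIBUTES = {
    "background": ["in a forest", "on a beach", "in a city street"],
    "lighting": ["at sunrise", "under overcast skies"],
    "viewpoint": ["close-up", "seen from a distance"],
}


def simple_prompt(class_name):
    """Baseline: one fixed template per class (assumed template)."""
    return f"a photo of a {class_name}"


def attributed_prompts(class_name, attributes, n, seed=0):
    """Return up to n distinct prompts, each pairing the class name with
    one value drawn from every attribute dimension."""
    rng = random.Random(seed)
    # Every combination of one value per attribute dimension.
    combos = list(itertools.product(*attributes.values()))
    rng.shuffle(combos)
    return [
        simple_prompt(class_name) + ", " + ", ".join(combo)
        for combo in combos[:n]
    ]


if __name__ == "__main__":
    print(simple_prompt("sparrow"))
    for prompt in attributed_prompts("sparrow", ATTRIBUTES, n=5):
        print(prompt)
```

Each resulting prompt would then be passed to a text-to-image model to synthesize one training image. With three attribute dimensions of sizes 3, 2, and 2, this toy pool supports 12 distinct prompts per class.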