Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
This paper addresses the need for automated concept discovery in image analysis. It offers an unsupervised method whose novelty is incremental, since it applies existing text-to-image models to a new task.
The paper tackles the inverse problem of discovering generative concepts from an image collection without supervision. The discovered concepts accurately represent image content, can be recombined to generate new images, and serve as a representation for downstream classification, across domains such as painting styles, kitchen scenes, and ImageNet classes.
Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images. We show how such generative concepts can accurately represent the content of images, be recombined and composed to generate new artistic and hybrid images, and be further used as a representation for downstream classification tasks.
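The abstract does not spell out the objective, so the following is only a minimal sketch under stated assumptions. It assumes concepts are combined with the Composable Diffusion rule, in which each concept's weighted conditional-minus-unconditional noise prediction is added to the unconditional prediction. It also assumes each image's weights are fit by minimizing a standard denoising loss. Every name here is hypothetical: `toy_eps` is an idealized stand-in for a text-to-image denoiser, where a concept "embedding" is simply the image it generates. `composed_eps`, `denoising_loss`, and `infer_weights` are illustrative helpers, not the authors' code. The sketch holds the concepts fixed and infers only per-image weights, which is the part that yields a representation for classification. Full discovery would also optimize the concepts across the collection.

```python
import numpy as np


def toy_eps(x_t, alpha_bar, concept=None):
    # Hypothetical ideal noise predictor for a toy model: an image generated
    # under `concept` is exactly the vector `concept`, and the unconditional
    # model generates the zero image. Not a real diffusion model.
    mean = np.zeros_like(x_t) if concept is None else concept
    return (x_t - np.sqrt(alpha_bar) * mean) / np.sqrt(1.0 - alpha_bar)


def composed_eps(x_t, alpha_bar, concepts, weights):
    # Assumed composition rule: eps(x_t) + sum_k w_k * (eps(x_t | c_k) - eps(x_t)).
    eps_uncond = toy_eps(x_t, alpha_bar)
    out = eps_uncond.copy()
    for c, w in zip(concepts, weights):
        out += w * (toy_eps(x_t, alpha_bar, c) - eps_uncond)
    return out


def denoising_loss(x0, concepts, weights, alpha_bar, noise):
    # Epsilon-prediction loss: noise x0 into x_t, then compare the composed
    # prediction with the noise that was actually added.
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    pred = composed_eps(x_t, alpha_bar, concepts, weights)
    return np.mean((pred - noise) ** 2)


def infer_weights(x0, concepts, alpha_bar=0.5, steps=200, lr=0.5, seed=0):
    # Gradient descent on one image's concept weights, concepts held fixed.
    # Central finite differences keep this independent of the eps function.
    rng = np.random.default_rng(seed)
    w = np.zeros(len(concepts))
    h = 1e-5
    for _ in range(steps):
        noise = rng.standard_normal(x0.shape)  # fresh noise sample per step
        grad = np.zeros_like(w)
        for k in range(len(w)):
            step = np.zeros_like(w)
            step[k] = h
            grad[k] = (denoising_loss(x0, concepts, w + step, alpha_bar, noise)
                       - denoising_loss(x0, concepts, w - step, alpha_bar, noise)) / (2 * h)
        w -= lr * grad
    return w


if __name__ == "__main__":
    concepts = np.eye(4)[:3]  # three hypothetical, orthonormal concept vectors
    x0 = 0.7 * concepts[0] + 0.3 * concepts[1]  # an "image" mixing concepts 0 and 1
    print(infer_weights(x0, concepts))  # weights near [0.7, 0.3, 0.0]
```

With this idealized denoiser, the composed prediction minus the true noise equals sqrt(alpha_bar / (1 - alpha_bar)) times (x0 - sum_k w_k c_k). The denoising loss therefore reduces to a scaled reconstruction error, which is why the mixing weights are recovered. With a real text-to-image model, one would use automatic differentiation rather than finite differences, and the loss would only be minimized in expectation over noise and timesteps.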