Multi-Class Multi-Instance Count Conditioned Adversarial Image Generation
This work addresses the need for fine-grained control in image generation for applications such as synthetic data creation, but it is incremental because it builds on the existing StyleGAN2 architecture.
The paper tackles the problem of generating images with a specified number of objects from multiple classes, proposing a conditional GAN that extends StyleGAN2 with count-based conditioning and a regression sub-network for counting objects during training. The result is a model that learns to generate images according to complex count conditions, as demonstrated on three datasets including a new challenging CityCount dataset derived from Cityscapes.
Image generation has evolved rapidly in recent years. Modern architectures for adversarial training can generate even high-resolution images with remarkable quality. At the same time, more and more effort is dedicated to controlling the content of generated images. In this paper, we take a further step in this direction and propose a conditional generative adversarial network (GAN) that generates images with a defined number of objects from given classes. This entails two fundamental abilities: (1) generating high-quality images under a complex constraint and (2) counting object instances per class in a given image. Our proposed model modularly extends the successful StyleGAN2 architecture with count-based conditioning and with a regression sub-network that counts the number of generated objects per class during training. In experiments on three different datasets, we show that the proposed model learns to generate images according to the given multiple-class count condition, even in the presence of complex backgrounds. In particular, we propose a new dataset, CityCount, which is derived from the Cityscapes street scenes dataset, to evaluate our approach in a challenging and practically relevant scenario.
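To make the two abilities concrete, the following is a minimal sketch of how a multi-class count condition and a count regression term could be expressed. It is not the authors' implementation: the class list, the mean-squared-error loss, the loss weight, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence

# Hypothetical class list; the abstract does not name the classes it uses.
CLASSES: List[str] = ["car", "person"]


def count_vector(counts: Dict[str, int],
                 classes: Sequence[str] = CLASSES) -> List[float]:
    """Encode per-class object counts as a fixed-order condition vector."""
    unknown = set(counts) - set(classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    # Classes that are not mentioned are requested zero times.
    return [float(counts.get(name, 0)) for name in classes]


def count_regression_loss(predicted: Sequence[float],
                          target: Sequence[float]) -> float:
    """Mean squared error between predicted and requested counts.

    The choice of MSE is an assumption; the abstract only states that a
    regression sub-network counts objects per class during training.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def generator_objective(adversarial_loss: float,
                        count_loss: float,
                        weight: float = 1.0) -> float:
    """Add a weighted count term to the adversarial term (weight is assumed)."""
    return adversarial_loss + weight * count_loss


# Request three cars and no persons, then score a hypothetical prediction.
condition = count_vector({"car": 3})
loss = count_regression_loss([3.0, 1.0], condition)
```

In a full model, the condition vector would be fed to the generator alongside the latent code, and the regression output would come from a network evaluated on the generated image. Both are replaced by plain lists here to keep the sketch self-contained.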