CV · Sep 7, 2023

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yujun Shen

arXiv:2309.03904

Originality: Incremental advance

AI Analysis

This work addresses the computational scaling of GANs for text-to-image synthesis. Its contribution is domain-specific and incremental: it applies MoE to an existing GAN framework.

The paper tackles the challenge of scaling generative adversarial networks (GANs) for text-conditioned image synthesis. It introduces Aurora, a GAN-based generator that uses a sparsely-activated mixture-of-experts (MoE) to grow model capacity under limited computational resources, and reports a zero-shot FID of 6.2 on MS COCO at 64x64 resolution.
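For reference, FID (Fréchet Inception Distance) fits a Gaussian to Inception-network features of real images (mean $\mu_r$, covariance $\Sigma_r$) and of generated images ($\mu_g$, $\Sigma_g$) and measures the distance between the two; lower is better. "Zero-shot" here refers to evaluating on MS COCO captions without training on MS COCO (the model is trained on LAION2B-en and COYO-700M). The standard definition is:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```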

Due to the difficulty of scaling them up, generative adversarial networks (GANs) seem to be falling from grace on the task of text-conditioned image synthesis. Sparsely-activated mixture-of-experts (MoE) has recently been demonstrated as a valid solution for training large-scale models with limited computational resources. Inspired by this philosophy, we present Aurora, a GAN-based text-to-image generator that employs a collection of experts to learn feature processing, together with a sparse router that selects the most suitable expert for each feature point. To faithfully decode both the sampling stochasticity and the text condition into the final synthesis, our router makes its decision adaptively by taking into account the text-integrated global latent code. At 64x64 image resolution, our model trained on LAION2B-en and COYO-700M achieves a zero-shot FID of 6.2 on MS COCO. We release the code and checkpoints to facilitate further development by the community.
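The routing idea in the abstract can be illustrated with a toy layer. This is a minimal sketch under stated assumptions, not Aurora's implementation: it assumes top-1 routing, assumes the router scores experts from the concatenation of a feature point x and the global latent code w, and uses hypothetical names (`SparseMoELayer`, `route`, `forward`) with random linear maps standing in for real experts. Only the chosen expert runs for each feature point, which is what keeps compute roughly constant as the number of experts grows.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SparseMoELayer:
    """Hypothetical top-1 sparse MoE applied independently to each feature point.

    Assumption: the router sees [x; w], so the expert choice depends on both the
    local feature x and the global latent code w (noise plus text condition).
    """

    def __init__(self, feat_dim: int, latent_dim: int, num_experts: int, seed: int = 0):
        rng = random.Random(seed)
        # One linear map per expert (feat_dim -> feat_dim); real experts would be MLPs.
        self.experts = [random_matrix(feat_dim, feat_dim, rng) for _ in range(num_experts)]
        # Router maps the concatenation [x; w] to one logit per expert.
        self.router = random_matrix(num_experts, feat_dim + latent_dim, rng)

    def route(self, x: Vector, w: Vector) -> Tuple[int, float]:
        """Return the index of the best expert and its gate probability."""
        probs = softmax(matvec(self.router, x + w))  # x + w concatenates the lists
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs[best]

    def forward(self, features: List[Vector], w: Vector) -> Tuple[List[Vector], List[int]]:
        outputs: List[Vector] = []
        choices: List[int] = []
        for x in features:
            k, gate = self.route(x, w)
            # Only the selected expert is evaluated; its output is scaled by the gate.
            outputs.append([gate * v for v in matvec(self.experts[k], x)])
            choices.append(k)
        return outputs, choices


# Usage: a flattened 4x4 feature map with 4 channels and a stand-in latent code.
layer = SparseMoELayer(feat_dim=4, latent_dim=3, num_experts=8)
rng = random.Random(1)
features = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
w = [0.5, -1.0, 0.2]  # hypothetical text-integrated global latent code
out, experts_used = layer.forward(features, w)
print(len(out), len(out[0]))  # 16 4
```

Scaling the output by the gate probability (as in Switch-style MoE) is one common way to keep the router trainable by gradient descent; whether Aurora does exactly this is not stated here.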
