Statistical Guarantees of Group-Invariant GANs
This work provides the first statistical guarantees for group-invariant GANs, addressing data efficiency in domains such as images and molecules; the contribution is incremental in that it builds on existing empirical studies.
This work tackles the problem of learning group-invariant distributions with generative adversarial networks (GANs) by incorporating symmetries into the model, resulting in a reduction in sample complexity by a factor of the group size and a lower discriminator approximation error, as supported by numerical results.
This work presents the first statistical performance guarantees for group-invariant generative models. Many kinds of real-world data, such as images and molecules, are invariant to certain group symmetries, which can be exploited to learn more efficiently, as we rigorously demonstrate in this work. Here we specifically study generative adversarial networks (GANs) and quantify the gains from incorporating symmetries into the model. Group-invariant GANs are a type of GAN in which the generators and discriminators are hardwired with group symmetries. Empirical studies have shown that these networks are capable of learning group-invariant distributions with significantly improved data efficiency. In this study, we rigorously quantify this improvement by analyzing the reduction in sample complexity and in the discriminator approximation error for group-invariant GANs. Our findings indicate that, when learning group-invariant distributions, the number of samples required by group-invariant GANs decreases by a factor of the group size, and the discriminator approximation error has a reduced lower bound. Importantly, the overall error reduction cannot be achieved merely through data augmentation of the training data. Numerical results substantiate our theory and highlight the stark contrast between learning with group-invariant GANs and using data augmentation. This work also sheds light on the study of other generative models with group symmetries, such as score-based generative models.
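The distinction between hardwiring a symmetry and augmenting the data can be illustrated with a minimal sketch. It is not the construction analyzed in this work: it assumes a finite group (the four 90-degree rotations of a square image, C4), a hypothetical toy discriminator with random weights, and group averaging as one simple way to make a function invariant. The names `c4_orbit`, `symmetrize`, `augment`, and `toy_discriminator` are illustrative only.

```python
import numpy as np


def c4_orbit(x):
    """Return the four 90-degree rotations of a square 2-D array (its C4 orbit)."""
    return [np.rot90(x, k) for k in range(4)]


def symmetrize(f):
    """Wrap f so that its output is averaged over the C4 orbit of the input.

    The wrapped function is invariant by construction: rotating the input
    only permutes the orbit, which leaves the average unchanged.
    """
    def f_inv(x):
        return float(np.mean([f(gx) for gx in c4_orbit(x)]))
    return f_inv


def augment(samples):
    """Data augmentation: replace each sample by all of its rotations.

    This enlarges the training set but places no constraint on the model.
    """
    return [gx for x in samples for gx in c4_orbit(x)]


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))  # hypothetical weights, not a trained model


def toy_discriminator(x):
    """An unconstrained toy discriminator (assumption, for illustration only)."""
    return float(np.tanh(np.sum(w * x) / 8.0))


invariant_discriminator = symmetrize(toy_discriminator)

x = rng.normal(size=(8, 8))
x_rot = np.rot90(x)

# Gap between the outputs on x and on a rotated copy of x.
plain_gap = abs(toy_discriminator(x) - toy_discriminator(x_rot))
invariant_gap = abs(invariant_discriminator(x) - invariant_discriminator(x_rot))

augmented = augment([x])  # four samples, but toy_discriminator is unchanged

print(f"plain discriminator gap:     {plain_gap:.3e}")
print(f"invariant discriminator gap: {invariant_gap:.3e}")
```

In this sketch the symmetrized discriminator agrees on `x` and `x_rot` up to floating-point rounding, whereas the unconstrained one generally does not, even though `augment` supplies it with every rotation of the data. This only illustrates the structural difference; the sample-complexity and approximation-error statements are those of the analysis, not of this example.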