LG · AI · GT · Jan 13, 2018

Which Training Methods for GANs do actually Converge?

Lars Mescheder, Andreas Geiger, Sebastian Nowozin

arXiv:1801.04406v4

Originality: Incremental advance

AI Analysis

This work addresses convergence issues in GAN training. It offers practitioners theoretical insights and practical regularization methods that stabilize learning, although it is an incremental refinement of existing approaches.

The paper shows that unregularized GAN training does not always converge when the data and generator distributions are not absolutely continuous, which is the more realistic case. It identifies regularization strategies that ensure local convergence, namely instance noise and zero-centered gradient penalties. Others do not always converge, namely Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update. The authors extend these results to more general GANs and use them to train high-resolution generative image models on a variety of datasets with little hyperparameter tuning.
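The derivation below makes the counterexample concrete. It is a sketch we derived ourselves under an assumed toy setup, and it is not quoted from the paper. The full paper calls a construction of this kind the Dirac-GAN, but the notation here is ours. The data distribution is a point mass at 0, the generator is a point mass at θ, and the discriminator is linear, D_ψ(x) = ψx. The objective uses f(t) = -log(1 + e^{-t}), for which f'(0) = 1/2 and f''(0) = -1/4. J denotes the Jacobian of the gradient vector field v at the equilibrium (0, 0). For the instance-noise line, we assume independent Gaussian noise is added to both real and generated samples.

```latex
% Toy objective: generator minimizes over \theta, discriminator maximizes over \psi
L(\theta, \psi) = f(\psi\theta) + f(0), \qquad f(t) = -\log\bigl(1 + e^{-t}\bigr)

% Gradient vector field and its Jacobian at the equilibrium (0, 0)
v(\theta, \psi) = \begin{pmatrix} -\psi f'(\psi\theta) \\ \theta f'(\psi\theta) \end{pmatrix},
\qquad
J = \begin{pmatrix} 0 & -f'(0) \\ f'(0) & 0 \end{pmatrix},
\qquad
\lambda_{1,2} = \pm i\, f'(0)

% Zero-centered gradient penalty on real data:
% R_1(\psi) = \tfrac{\gamma}{2}\,\mathbb{E}_{x \sim p_{\mathcal{D}}}\bigl[\lVert \nabla_x D_\psi(x) \rVert^2\bigr] = \tfrac{\gamma}{2}\psi^2
J = \begin{pmatrix} 0 & -f'(0) \\ f'(0) & -\gamma \end{pmatrix},
\qquad
\lambda_{1,2} = -\tfrac{\gamma}{2} \pm \sqrt{\tfrac{\gamma^2}{4} - f'(0)^2}

% Instance noise \varepsilon \sim \mathcal{N}(0, \sigma^2) on real and generated samples:
% L(\theta, \psi) = \mathbb{E}_\varepsilon\bigl[f(\psi(\theta + \varepsilon))\bigr] + \mathbb{E}_\varepsilon\bigl[f(-\psi\varepsilon)\bigr]
J = \begin{pmatrix} 0 & -f'(0) \\ f'(0) & 2\sigma^2 f''(0) \end{pmatrix},
\qquad
\lambda_{1,2} = \sigma^2 f''(0) \pm \sqrt{\sigma^4 f''(0)^2 - f'(0)^2}
```

Without regularization, the eigenvalues are purely imaginary, so fixed-step gradient descent gets no contraction toward the equilibrium. A positive γ moves both eigenvalues into the open left half-plane. So does a positive σ, because f''(0) < 0. That is the condition for local convergence with a small enough step size.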

Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distribution lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.
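Below is a minimal pure-Python sketch comparing unregularized and penalized training on a one-dimensional toy problem. We recall this toy as the Dirac-GAN from the full paper, but the helper names `f_prime` and `dirac_gan_sgd`, the step size, and the starting point are our own hypothetical choices. The setup has four pieces:

- The data is a point mass at 0 and the generator is a point mass at θ.
- The discriminator is D_ψ(x) = ψx.
- Both players take simultaneous gradient steps on L(θ, ψ) = f(ψθ) + f(0), with f(t) = -log(1 + e^{-t}).
- The penalty is a zero-centered gradient penalty on real data. For this linear discriminator it reduces to (γ/2)ψ².

```python
import math
from typing import Tuple


def f_prime(t: float) -> float:
    """Derivative of f(t) = -log(1 + exp(-t)), i.e. sigmoid(-t), computed without overflow."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def dirac_gan_sgd(theta: float, psi: float, gamma: float = 0.0,
                  h: float = 0.1, steps: int = 1000) -> Tuple[float, float]:
    """Simultaneous gradient steps on a toy Dirac-GAN (hypothetical helper).

    Data: point mass at 0.  Generator: point mass at theta.
    Discriminator: D(x) = psi * x, so L(theta, psi) = f(psi * theta) + f(0).
    With gamma > 0 the discriminator maximizes L - (gamma / 2) * psi**2,
    where psi**2 = |dD/dx|**2 is the squared input gradient on the data.
    """
    for _ in range(steps):
        fp = f_prime(psi * theta)
        d_theta = -psi * fp                # generator: gradient descent on L
        d_psi = theta * fp - gamma * psi   # discriminator: gradient ascent on L minus penalty
        theta, psi = theta + h * d_theta, psi + h * d_psi
    return theta, psi


if __name__ == "__main__":
    start = math.hypot(1.0, 1.0)
    t, p = dirac_gan_sgd(1.0, 1.0, gamma=0.0)
    print("unregularized radius:", math.hypot(t, p), "start radius:", start)
    t, p = dirac_gan_sgd(1.0, 1.0, gamma=1.0, steps=5000)
    print("penalized radius:", math.hypot(t, p))
```

Without the penalty, one simultaneous step multiplies θ² + ψ² by exactly 1 + h²f'(ψθ)². The iterates therefore only ever move away from the equilibrium (0, 0), and the first printed radius exceeds the starting radius. With gamma > 0, the discriminator parameter is damped and the iterates spiral in toward (0, 0).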

