LG · ST · Dec 1, 2020

Convergence and Sample Complexity of SGD in GANs

arXiv:2012.00732v1 · 3 citations
AI Analysis

This work provides foundational theoretical guarantees for the convergence and sample complexity of SGD in GANs, a significant open problem for researchers and practitioners working with generative models.

This paper provides theoretical convergence guarantees for training Generative Adversarial Networks (GANs) using Stochastic Gradient Descent (SGD). It demonstrates that a 1-layer Generator network can learn a target distribution within a total-variation distance of ε using Õ(d^2/ε^2) samples, which is near information-theoretically optimal.

We provide theoretical convergence guarantees on training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function $φ(\cdot)$ parametrized by a $d \times d$ weight matrix $\mathbf W_*$, i.e., $f_*(\mathbf x) = φ(\mathbf W_* \mathbf x)$. Our main result is that training the Generator together with a Discriminator according to the Stochastic Gradient Descent-Ascent (SGDA) iteration proposed by Goodfellow et al. yields a Generator distribution that approaches the target distribution of $f_*$. Specifically, we can learn the target distribution within total-variation distance $ε$ using $\tilde O(d^2/ε^2)$ samples, which is (near-)information-theoretically optimal. Our results apply to a broad class of non-linear activation functions $φ$, including ReLUs, and are enabled by a connection with truncated statistics and an appropriate design of the Discriminator network. Our approach relies on a bilevel optimization framework to show that vanilla SGDA works.
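To make the two objects in the abstract concrete, here is a minimal sketch, not the paper's construction. `sample_generator` draws samples from a 1-layer Generator $f(\mathbf x) = φ(\mathbf W \mathbf x)$ with $φ$ taken to be ReLU, and `gda` runs a plain simultaneous descent-ascent iteration. The standard Gaussian latent input, the function names, and the toy quadratic objective are assumptions chosen for illustration; the paper's Discriminator design is not reproduced here, and a stochastic variant would replace the exact gradients with minibatch estimates.

```python
import numpy as np


def relu(z):
    """ReLU activation, one example of the non-linear phi in the abstract."""
    return np.maximum(z, 0.0)


def sample_generator(W, n, rng):
    """Draw n samples of f(x) = relu(W x).

    Assumption: the latent x is standard Gaussian in d dimensions.
    """
    d = W.shape[1]
    x = rng.standard_normal((n, d))
    return relu(x @ W.T)


def gda(grad_theta, grad_omega, theta0, omega0, lr, steps):
    """Simultaneous gradient descent-ascent on L(theta, omega).

    theta (the minimizing player) takes a descent step and omega (the
    maximizing player) takes an ascent step, both from the current iterate.
    """
    theta, omega = float(theta0), float(omega0)
    for _ in range(steps):
        g_t = grad_theta(theta, omega)
        g_o = grad_omega(theta, omega)
        theta, omega = theta - lr * g_t, omega + lr * g_o
    return theta, omega


# Toy objective (illustrative only):
#   L(theta, omega) = 0.5*theta**2 + theta*omega - 0.5*omega**2
# It is convex in theta, concave in omega, with its saddle point at (0, 0).
def toy_grad_theta(theta, omega):
    return theta + omega


def toy_grad_omega(theta, omega):
    return theta - omega


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 3))
    samples = sample_generator(W, n=5, rng=rng)
    print(samples.shape)
    print(gda(toy_grad_theta, toy_grad_omega, 1.0, 1.0, lr=0.1, steps=200))
```

On this toy objective the iterates contract toward the saddle point because each step shrinks the distance to $(0, 0)$ by a constant factor; general GAN objectives offer no such guarantee, which is why the abstract's result depends on a specific Discriminator design.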

