Depth and Feature Learning are Provably Beneficial for Neural Network Discriminators
The paper proves that depth and feature learning give GAN discriminators a provable advantage on explicitly constructed distributions, addressing a foundational question in machine learning theory.
The paper proves the benefits of depth and feature learning for neural network discriminators by constructing pairs of distributions that deep or feature-learning networks separate with a gap at least inverse-polynomial in the dimension d, while the gap achieved by shallow or fixed-kernel discriminators decays exponentially in d. The concrete lower bounds are Ω(1/d²) for three-layer versus two-layer networks and Ω(1/(d log d)) for two-layer networks versus bounded-norm RKHS functions.
We construct pairs of distributions $\mu_d, \nu_d$ on $\mathbb{R}^d$ such that the quantity $|\mathbb{E}_{x \sim \mu_d} [F(x)] - \mathbb{E}_{x \sim \nu_d} [F(x)]|$ is $\Omega(1/d^2)$ for some three-layer ReLU network $F$ with polynomial width and weights, while it decays exponentially in $d$ if $F$ is any two-layer network with polynomial weights. This shows that deep GAN discriminators are able to distinguish distributions that shallow discriminators cannot. Analogously, we build pairs of distributions $\mu_d, \nu_d$ on $\mathbb{R}^d$ such that $|\mathbb{E}_{x \sim \mu_d} [F(x)] - \mathbb{E}_{x \sim \nu_d} [F(x)]|$ is $\Omega(1/(d\log d))$ for two-layer ReLU networks with polynomial weights, while it decays exponentially in $d$ for bounded-norm functions in the associated RKHS. This shows that feature learning is beneficial for discriminators. Our bounds are based on Fourier transforms.
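To make the central quantity concrete, the following is a minimal sketch of a Monte Carlo estimate of $|\mathbb{E}_{x \sim \mu} [F(x)] - \mathbb{E}_{x \sim \nu} [F(x)]|$ for a two-layer ReLU discriminator $F$. Everything in it is an illustrative assumption rather than the paper's construction: the two distributions are toy Gaussians (one shifted), the weights are random, and the helper names `two_layer_relu`, `discriminator_gap`, and `sample_gaussian` are hypothetical.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def two_layer_relu(x, weights, biases, outer):
    """F(x) = sum_j outer[j] * relu(<weights[j], x> + biases[j])."""
    total = 0.0
    for w, b, a in zip(weights, biases, outer):
        pre = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += a * relu(pre)
    return total


def discriminator_gap(F, xs_mu, xs_nu):
    """Monte Carlo estimate of |E_mu[F(x)] - E_nu[F(x)]| from two samples."""
    mean_mu = sum(F(x) for x in xs_mu) / len(xs_mu)
    mean_nu = sum(F(x) for x in xs_nu) / len(xs_nu)
    return abs(mean_mu - mean_nu)


def sample_gaussian(rng, d, n, shift=0.0):
    """Toy stand-in distribution: n points from N(shift * 1, I) in R^d."""
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
d, width, n = 5, 8, 2000

# Random discriminator weights (assumption: not trained, not the paper's F).
weights = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
biases = [rng.gauss(0.0, 1.0) for _ in range(width)]
outer = [rng.choice([-1.0, 1.0]) / width for _ in range(width)]


def F(x):
    return two_layer_relu(x, weights, biases, outer)


xs_mu = sample_gaussian(rng, d, n)             # samples standing in for mu
xs_nu = sample_gaussian(rng, d, n, shift=0.5)  # samples standing in for nu
gap = discriminator_gap(F, xs_mu, xs_nu)
print(gap)
```

In a GAN setting the discriminator would be trained to maximize this gap over a network class. The results above concern how large that maximum can be for each class as $d$ grows, which a fixed random network on toy data does not capture.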