Non-saturating GAN training as divergence minimization
This paper provides a theoretical justification for non-saturating GAN training, a widely used but poorly understood method, addressing a gap in the literature.
The paper shows that non-saturating GAN training approximately minimizes a specific f-divergence that is qualitatively similar to reverse KL, which helps explain its empirical tendency toward high sample quality but poor diversity.
Non-saturating generative adversarial network (GAN) training is widely used and has continued to obtain groundbreaking results. However, this approach has so far lacked strong theoretical justification, in contrast to alternatives such as f-GANs and Wasserstein GANs, which are motivated in terms of approximate divergence minimization. In this paper, we show that non-saturating GAN training does in fact approximately minimize a particular f-divergence. We develop general theoretical tools to compare and classify f-divergences and use these to show that the new f-divergence is qualitatively similar to reverse KL. These results help to explain the high sample quality but poor diversity often observed empirically when using this scheme.
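For readers unfamiliar with the term, the sketch below contrasts the commonly stated saturating and non-saturating generator losses on a single generated sample. It is an illustration under assumptions, not the paper's own formulation: it assumes the discriminator outputs a logit a with D = sigmoid(a), and the function names (saturating_loss, non_saturating_loss, and so on) are hypothetical. It does not compute the f-divergence discussed above.

```python
import math


def sigmoid(a):
    """Logistic function, mapping a discriminator logit to D in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def softplus(a):
    """Numerically stable log(1 + exp(a))."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def saturating_loss(logit):
    # Generator minimizes log(1 - D), which equals -softplus(logit).
    return -softplus(logit)


def non_saturating_loss(logit):
    # Generator minimizes -log D, which equals softplus(-logit).
    return softplus(-logit)


def saturating_grad(logit):
    # d/da [-softplus(a)] = -sigmoid(a)
    return -sigmoid(logit)


def non_saturating_grad(logit):
    # d/da [softplus(-a)] = sigmoid(a) - 1
    return sigmoid(logit) - 1.0


if __name__ == "__main__":
    # A very negative logit means the discriminator confidently rejects the sample.
    for a in (-6.0, 0.0, 6.0):
        print(a, saturating_grad(a), non_saturating_grad(a))
```

Under these assumptions, when the discriminator confidently rejects a generated sample (a strongly negative logit), the saturating gradient with respect to the logit is close to zero, while the non-saturating gradient stays close to -1 in the same descent direction. This is the usual practical motivation for the non-saturating scheme.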