Wasserstein GANs Work Because They Fail (to Approximate the Wasserstein Distance)
This work corrects a common misunderstanding in generative modeling: WGANs are effective because their loss fails to approximate the Wasserstein distance, not because the underlying theory holds in practice.
The paper examines the gap between the theoretical foundation of Wasserstein GANs and how they are trained in practice. It finds that the WGAN loss does not meaningfully approximate the Wasserstein distance, argues that this distance is not a desirable loss for generative models, and attributes the success of WGANs to this very failure.
Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein distance. Moreover, we argue that the Wasserstein distance is not even a desirable loss function for deep generative models, and conclude that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.
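To make the quantities in question concrete, the following is a minimal sketch, not the analysis or code of this paper. It assumes one-dimensional samples of equal size, for which the 1-Wasserstein distance between the two empirical distributions can be computed exactly by sorting. It also evaluates the Kantorovich-Rubinstein dual objective E[f(x)] - E[f(y)] for one fixed 1-Lipschitz function f, which is the kind of quantity a WGAN critic is trained to maximise. The helper names `wasserstein1_1d` and `dual_objective` and the Gaussian toy data are illustrative assumptions.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """Exact 1-Wasserstein distance between two equal-size 1D empirical distributions.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the mean absolute difference of the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


def dual_objective(f, x, y):
    """Value of E[f(x)] - E[f(y)] for a fixed critic f (assumed 1-Lipschitz)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(f(x)) - np.mean(f(y)))


# Toy data (an assumption for illustration): two Gaussians shifted by 2.
rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=1000)
fake = rng.normal(loc=0.0, scale=1.0, size=1000)

w1 = wasserstein1_1d(real, fake)
# f(t) = t is 1-Lipschitz, so its dual value can never exceed the true distance.
lower_bound = dual_objective(lambda t: t, real, fake)

print(f"exact empirical W1:        {w1:.4f}")
print(f"dual value for f(t) = t:   {lower_bound:.4f}")
```

Any fixed 1-Lipschitz critic yields only a lower bound on the distance. The bound is tight only at the supremum over all such functions, and whether a trained critic gets close to that supremum is exactly what determines how well the WGAN loss approximates the Wasserstein distance.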