LG, IT, FA, OC, ML
Feb 26, 2021

Moreau-Yosida $f$-divergences

arXiv:2102.13416v2
6 citations
AI Analysis

This work addresses a theoretical bottleneck in variational representations of $f$-divergences, the hard Lipschitz constraint, and offers incremental improvements together with a practical implementation.

The paper studies variational representations of $f$-divergences in machine learning by defining their Moreau-Yosida approximation with respect to the Wasserstein-1 metric. This generalizes a number of recent results and relaxes the hard Lipschitz constraint. As a practical application, GANs trained on CIFAR-10 with the resulting formulas show competitive results.

Variational representations of $f$-divergences are central to many machine learning algorithms, with Lipschitz constrained variants recently gaining attention. Inspired by this, we define the Moreau-Yosida approximation of $f$-divergences with respect to the Wasserstein-$1$ metric. The corresponding variational formulas provide a generalization of a number of recent results, novel special cases of interest, and a relaxation of the hard Lipschitz constraint. Additionally, we prove that the so-called tight variational representation of $f$-divergences can be taken over the quotient space of Lipschitz functions, and give a characterization of functions achieving the supremum in the variational representation. On the practical side, we propose an algorithm to calculate the tight convex conjugate of $f$-divergences compatible with automatic differentiation frameworks. As an application of our results, we propose the Moreau-Yosida $f$-GAN, providing an implementation of the variational formulas for the Kullback-Leibler, reverse Kullback-Leibler, $χ^2$, reverse $χ^2$, squared Hellinger, Jensen-Shannon, Jeffreys, triangular discrimination and total variation divergences as GANs trained on CIFAR-10, leading to competitive results and a simple solution to the problem of uniqueness of the optimal critic.
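To make the distinction between the basic and the tight variational representation concrete, the following is a minimal sketch for the Kullback-Leibler divergence on a three-point space. It is not the paper's algorithm: it involves no Lipschitz constraint and no Wasserstein term, and the function names (`kl_basic`, `kl_tight`) and the toy distributions are assumptions chosen for illustration. It uses the standard facts that $f(x) = x \log x$ has conjugate $f^*(y) = e^{y-1}$ and that the tight form for KL is the Donsker-Varadhan formula.

```python
import numpy as np

def kl(p, q):
    """Exact KL(P || Q) for discrete distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

def kl_basic(g, p, q):
    """Basic variational objective E_P[g] - E_Q[f*(g)], f*(y) = exp(y - 1)."""
    return float(np.sum(p * g) - np.sum(q * np.exp(g - 1.0)))

def kl_tight(g, p, q):
    """Tight (Donsker-Varadhan) objective E_P[g] - log E_Q[exp(g)]."""
    return float(np.sum(p * g) - np.log(np.sum(q * np.exp(g))))

# Hypothetical toy distributions on three points.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

exact = kl(p, q)
g_star = np.log(p / q)  # log density ratio

# The tight objective attains KL at g_star and at any constant shift of it.
tight_at_opt = kl_tight(g_star, p, q)
tight_shifted = kl_tight(g_star + 3.0, p, q)

# The basic objective attains KL only at the single critic 1 + log(p / q).
basic_at_opt = kl_basic(g_star + 1.0, p, q)
basic_shifted = kl_basic(g_star + 4.0, p, q)

print(exact, tight_at_opt, tight_shifted, basic_at_opt, basic_shifted)
```

In this toy setting the tight objective is unchanged when a constant is added to the critic, so its maximizers are determined only up to an additive constant, whereas the basic objective has a single maximizer. This shift invariance is one way to read why the abstract speaks of a quotient space and of the uniqueness of the optimal critic, though the paper's actual statements concern Lipschitz functions and should be taken from the paper itself.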

Code Implementations: 1 repo
