ML · LG · Oct 21, 2019

Kernelized Wasserstein Natural Gradient

arXiv:1910.09652v4 · 23 citations
AI Analysis

This work addresses a computational bottleneck in natural gradient methods. For machine learning practitioners it offers an incremental improvement, validated empirically on specific datasets.

The paper tackles the challenge of computing the natural gradient in optimization problems over probability distributions by proposing a kernel-based framework to approximate it for the Wasserstein metric, resulting in a computationally efficient estimator that improves classification performance on Cifar10 and Cifar100.

Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family, and thus can yield more effective optimization. Unfortunately, computing the natural gradient is challenging as it requires inverting a high-dimensional matrix at each iteration. We propose a general framework to approximate the natural gradient for the Wasserstein metric, by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach leads to an estimator of the gradient direction that can trade off accuracy and computational cost, with theoretical guarantees. We verify its accuracy on simple examples, and empirically show the advantage of using such an estimator in classification tasks on Cifar10 and Cifar100.
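The two properties the abstract relies on, parametrization invariance and the cost of inverting the metric, can be illustrated with a small NumPy sketch. This is a generic illustration under stated assumptions, not the paper's kernelized estimator. It uses the 1D Gaussian family N(mu, sigma^2), where the squared Wasserstein-2 distance is (mu1 - mu2)^2 + (sigma1 - sigma2)^2, so the metric tensor in (mu, sigma) coordinates is the identity. The toy loss, the reparametrization sigma = exp(s), and the helper names (`natural_gradient`, `compare_parametrizations`, `natural_gradient_low_rank`) are hypothetical choices made for this example. The last function swaps in a standard Woodbury-identity solve for a metric assumed to have the form F^T F + eps * I (for instance an empirical Fisher-type estimate), which reduces a d x d solve to an m x m one.

```python
import numpy as np


def natural_gradient(metric, grad, damping=0.0):
    """Solve (metric + damping * I) v = grad without forming an explicit inverse."""
    d = metric.shape[0]
    return np.linalg.solve(metric + damping * np.eye(d), grad)


def loss_grad_mu_sigma(mu, sigma):
    # Hypothetical toy loss L(mu, sigma) = (mu - 1)^2 + (sigma - 2)^2.
    return np.array([2.0 * (mu - 1.0), 2.0 * (sigma - 2.0)])


def wasserstein_metric_mu_sigma():
    # W2^2(N(mu1, s1^2), N(mu2, s2^2)) = (mu1 - mu2)^2 + (s1 - s2)^2,
    # so the Wasserstein metric tensor in (mu, sigma) coordinates is the identity.
    return np.eye(2)


def compare_parametrizations(mu, s):
    """Compare update directions in (mu, sigma) and in (mu, s) with sigma = exp(s)."""
    sigma = np.exp(s)
    J = np.diag([1.0, sigma])              # Jacobian of (mu, sigma) w.r.t. (mu, s)
    g_theta = loss_grad_mu_sigma(mu, sigma)
    g_phi = J.T @ g_theta                  # chain rule for the gradient
    G_theta = wasserstein_metric_mu_sigma()
    G_phi = J.T @ G_theta @ J              # pull-back of the metric to (mu, s)

    nat_theta = natural_gradient(G_theta, g_theta)
    nat_phi_pushed = J @ natural_gradient(G_phi, g_phi)  # mapped back to (mu, sigma)
    euc_phi_pushed = J @ g_phi             # plain gradient in (mu, s), mapped back
    return nat_theta, nat_phi_pushed, euc_phi_pushed


def natural_gradient_low_rank(features, grad, damping):
    """Solve (F^T F + damping * I) v = grad via the Woodbury identity.

    features has shape (m, d) with m << d, so only an m x m system is solved:
    (eps I + F^T F)^{-1} = (1 / eps) [I - F^T (eps I_m + F F^T)^{-1} F].
    """
    m = features.shape[0]
    small = damping * np.eye(m) + features @ features.T
    return (grad - features.T @ np.linalg.solve(small, features @ grad)) / damping


nat, nat_reparam, euc_reparam = compare_parametrizations(0.0, np.log(3.0))
print(nat, nat_reparam, euc_reparam)
```

With mu = 0 and sigma = 3, both natural-gradient directions come out as approximately (-2, 2) in (mu, sigma) coordinates, whereas the plain gradient taken in (mu, log sigma) and mapped back is approximately (-2, 18): only the natural gradient is unaffected by the change of coordinates. The Woodbury variant returns the same direction as a direct d x d solve but costs roughly O(m^2 d + m^3) instead of O(d^3), which gives a flavor of why structured approximations of the metric matter when d is large.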

Code Implementations: 1 repo