ML · LG · May 30, 2018

Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

arXiv:1805.11897v1 · 143 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for accurate and differentiable Wasserstein approximations in machine learning applications, offering theoretical guarantees and practical efficiency.

The paper studies the Sinkhorn approximation of the Wasserstein distance in learning tasks. In practice this approximation is often replaced by a regularized version that is less accurate but easy to differentiate. The authors prove that the original Sinkhorn distance has the same smoothness as the regularized version and give an efficient algorithm for computing its gradient, supported by promising preliminary experiments.

Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization. However, in most situations the Sinkhorn approximation of the Wasserstein distance is replaced by a regularized version that is less accurate but easy to differentiate. In this work we characterize the differential properties of the original Sinkhorn distance, proving that it enjoys the same smoothness as its regularized version and we explicitly provide an efficient algorithm to compute its gradient. We show that this result benefits both theory and applications: on one hand, high order smoothness confers statistical guarantees to learning with Wasserstein approximations. On the other hand, the gradient formula allows us to efficiently solve learning and optimization problems in practice. Promising preliminary experiments complement our analysis.
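To make the distinction in the abstract concrete, the following is a minimal NumPy sketch of the two quantities, not the authors' implementation. It assumes discrete histograms `a` and `b`, a cost matrix `C`, and the Shannon-entropy convention for the regularizer; the function names, the default `eps`, and the fixed iteration count are illustrative choices. The gradient algorithm from the paper is not reproduced here.

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps=0.5, n_iters=500):
    """Approximate entropic transport plan via Sinkhorn matrix scaling.

    a, b : 1-D histograms (positive entries, each summing to 1).
    C    : cost matrix of shape (len(a), len(b)).
    eps  : entropic regularization strength (assumed convention).
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)           # rescale to match row marginals
        v = b / (K.T @ u)         # rescale to match column marginals
    return u[:, None] * K * v[None, :]

def sharp_sinkhorn(a, b, C, eps=0.5, n_iters=500):
    """Original ("sharp") Sinkhorn value: transport cost of the entropic plan."""
    T = sinkhorn_plan(a, b, C, eps, n_iters)
    return float(np.sum(T * C))

def regularized_sinkhorn(a, b, C, eps=0.5, n_iters=500):
    """Regularized value: transport cost minus eps times the plan's entropy."""
    T = sinkhorn_plan(a, b, C, eps, n_iters)
    entropy = -np.sum(T * np.log(T))   # Shannon entropy, assumed convention
    return float(np.sum(T * C) - eps * entropy)
```

Both functions share the same plan; they differ only in whether the entropy term is kept in the reported value. Because the entropy of a probability table is non-negative, the regularized value never exceeds the sharp one, which is one way to see why the sharp version is the closer approximation of the unregularized transport cost.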

Code Implementations: 2 repos
