LGOCAug 21, 2024

Annealed Sinkhorn for Optimal Transport: convergence, regularization path and debiasing

arXiv:2408.11620v16 citationsh-index: 4
Originality Incremental advance
AI Analysis

This addresses a theoretical gap in the convergence of Annealed Sinkhorn for practitioners in large-scale optimal transport, though it is incremental as it builds on existing methods.

The paper tackles the convergence and error trade-offs of Annealed Sinkhorn for optimal transport, showing that a concave annealing schedule asymptotically solves OT under specific conditions and identifying a relaxation error, with a proposed debiased algorithm achieving the full speed-accuracy Pareto front in experiments.

Sinkhorn's algorithm is a method of choice to solve large-scale optimal transport (OT) problems. In this context, it involves an inverse temperature parameter $β$ that determines the speed-accuracy trade-off. To improve this trade-off, practitioners often use a variant of this algorithm, Annealed Sinkhorn, that uses an nondecreasing sequence $(β_t)_{t\in \mathbb{N}}$ where $t$ is the iteration count. However, besides for the schedule $β_t=Θ(\log t)$ which is impractically slow, it is not known whether this variant is guaranteed to actually solve OT. Our first contribution answers this question: we show that a concave annealing schedule asymptotically solves OT if and only if $β_t\to+\infty$ and $β_t-β_{t-1}\to 0$. The proof is based on an equivalence with Online Mirror Descent and further suggests that the iterates of Annealed Sinkhorn follow the solutions of a sequence of relaxed, entropic OT problems, the regularization path. An analysis of this path reveals that, in addition to the well-known "entropic" error in $Θ(β^{-1}_t)$, the annealing procedure induces a "relaxation" error in $Θ(β_{t}-β_{t-1})$. The best error trade-off is achieved with the schedule $β_t = Θ(\sqrt{t})$ which, albeit slow, is a universal limitation of this method. Going beyond this limitation, we propose a simple modification of Annealed Sinkhorn that reduces the relaxation error, and therefore enables faster annealing schedules. In toy experiments, we observe the effectiveness of our Debiased Annealed Sinkhorn's algorithm: a single run of this algorithm spans the whole speed-accuracy Pareto front of the standard Sinkhorn's algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes