Linear Time Sinkhorn Divergences using Positive Features
This work addresses computational bottlenecks in optimal transport for data scientists, offering a practical speed-up for applications like generative modeling, though it is incremental as it builds on existing Sinkhorn and low-rank approximation methods.
The paper tackles the high computational cost of Sinkhorn divergences, which scale quadratically with distribution size, by proposing a method using positive feature maps to reduce complexity to linear time, achieving Sinkhorn iterations that scale as O(nr) with r << n, and demonstrates this by training a faster variant of OT-GAN.
Although Sinkhorn divergences are now routinely used in data sciences to compare probability distributions, the computational effort required to compute them remains expensive, growing in general quadratically in the size $n$ of the support of these distributions. Indeed, solving optimal transport (OT) with an entropic regularization requires computing a $n\times n$ kernel matrix (the neg-exponential of a $n\times n$ pairwise ground cost matrix) that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y)=-\log\dotp{\varphi(x)}{\varphi(y)}$ where $\varphi$ is a map from the ground space onto the positive orthant $\RR^r_+$, with $r\ll n$. This choice yields, equivalently, a kernel $k(x,y)=\dotp{\varphi(x)}{\varphi(y)}$, and ensures that the cost of Sinkhorn iterations scales as $O(nr)$. We show that usual cost functions can be approximated using this form. Additionaly, we take advantage of the fact that our approach yields approximation that remain fully differentiable with respect to input distributions, as opposed to previously proposed adaptive low-rank approximations of the kernel matrix, to train a faster variant of OT-GAN \cite{salimans2018improving}.