Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent
This provides efficient algorithms for averaging distributions in machine learning, with applications in areas like data fusion and generative modeling, though it is incremental in improving theoretical guarantees.
The paper tackles the problem of computing barycenters of Gaussian distributions using optimal transport metrics, proving that Riemannian gradient descent achieves dimension-free convergence rates, in contrast to previous exponential dependencies.
We study first-order optimization algorithms for computing the barycenter of Gaussian distributions with respect to the optimal transport metric. Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. This stands in stark contrast to the best-known theoretical results for Riemannian GD, which depend exponentially on the dimension. In this work, we prove new geodesic convexity results which provide stronger control of the iterates, yielding a dimension-free convergence rate. Our techniques also enable the analysis of two related notions of averaging, the entropically-regularized barycenter and the geometric median, providing the first convergence guarantees for Riemannian GD for these problems.