LGNAOCOct 9, 2020

Adaptive and Momentum Methods on Manifolds Through Trivializations

arXiv:2010.04617v18 citations
Originality Incremental advance
AI Analysis

This work addresses a foundational challenge in optimization on manifolds for researchers and practitioners in machine learning and related fields, though it is incremental as it builds on existing methods by adapting them to a broader context.

The authors tackled the problem of generalizing adaptive and momentum optimization methods to arbitrary manifolds, where direct generalization is hindered by invariance and efficiency issues due to curvature, by introducing a framework that uses trivializations to cover almost all of the manifold, resulting in algorithms that close the numerical gap and achieve efficiency with just 5 matrix multiplications on GPUs.

Adaptive methods do not have a direct generalization to manifolds as the adaptive term is not invariant. Momentum methods on manifolds suffer from efficiency problems stemming from the curvature of the manifold. We introduce a framework to generalize adaptive and momentum methods to arbitrary manifolds by noting that for every differentiable manifold, there exists a radially convex open set that covers almost all the manifold. Being radially convex, this set is diffeomorphic to $\mathbb{R}^n$. This gives a natural generalization of any adaptive and momentum-based algorithm to a set that covers almost all the manifold in an arbitrary manifolds. We also show how to extend these methods to the context of gradient descent methods with a retraction. For its implementation, we bring an approximation to the exponential of matrices that needs just of 5 matrix multiplications, making it particularly efficient on GPUs. In practice, we see that this family of algorithms closes the numerical gap created by an incorrect use of momentum and adaptive methods on manifolds. At the same time, we see that the most efficient algorithm of this family is given by simply pulling back the problem to the tangent space at the initial point via the exponential map.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes