Covariant Gradient Descent
This work addresses the need for more robust and general optimization algorithms in machine learning. Its contribution appears incremental, since it builds on and extends existing methods.
The authors address the problem of keeping gradient descent consistent across different coordinate systems and curved spaces. They introduce a covariant formulation that generalizes existing methods such as RMSProp, Adam, and AdaBelief and suggests how those methods can be improved.
We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. The optimization dynamics is defined using a covariant force vector and a covariant metric tensor, both computed from the first and second statistical moments of the gradients. These moments are estimated through time-averaging with an exponential weight function, which preserves linear computational complexity. We show that commonly used optimization methods such as RMSProp, Adam and AdaBelief correspond to special limits of the covariant gradient descent (CGD) and demonstrate how these methods can be further generalized and improved.
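As a rough illustration of the special limits named in the abstract, the Python sketch below implements only the familiar diagonal case. It uses the standard published update rules of RMSProp, Adam and AdaBelief, not the full CGD update, which the abstract does not spell out. The names `ema`, `new_state` and `diagonal_step` are hypothetical, and reading the first moment as the "force" and the second moment as a diagonal "metric" is an interpretive assumption made for illustration.

```python
import math


def ema(old, new, beta):
    """Exponentially weighted running average; O(1) work per parameter."""
    return beta * old + (1.0 - beta) * new


def new_state(n):
    """Zero-initialised moment estimates for n parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def diagonal_step(theta, grad, state, mode="adam", lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of a diagonal-metric optimizer (assumed special case).

    mode="rmsprop":   raw gradient as force, EMA of g^2 as metric
    mode="adam":      EMA of g as force, EMA of g^2 as metric
    mode="adabelief": EMA of g as force, EMA of (g - m)^2 as metric
                      (the published AdaBelief also adds eps inside this
                      average; that detail is omitted here)
    """
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (x, g) in enumerate(zip(theta, grad)):
        # First moment: time-averaged gradient.
        m = state["m"][i] = ema(state["m"][i], g, beta1)
        # Second moment: raw (g^2) or centred ((g - m)^2).
        second = (g - m) ** 2 if mode == "adabelief" else g * g
        v = state["v"][i] = ema(state["v"][i], second, beta2)
        if mode == "rmsprop":
            force, scale = g, v
        else:
            # Bias correction for the zero initialisation of m and v.
            force = m / (1.0 - beta1 ** t)
            scale = v / (1.0 - beta2 ** t)
        new_theta.append(x - lr * force / (math.sqrt(scale) + eps))
    return new_theta


if __name__ == "__main__":
    # Toy quadratic f(x, y) = 0.5 * (x^2 + 10 y^2), gradient (x, 10 y).
    theta, state = [1.0, -2.0], new_state(2)
    for _ in range(200):
        theta = diagonal_step(theta, [theta[0], 10.0 * theta[1]], state, lr=0.05)
    print(theta)
```

Because each parameter keeps only two running averages, the cost per step stays linear in the number of parameters, which matches the complexity claim in the abstract. A non-diagonal metric tensor would couple the parameters and is not covered by this sketch.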