LG AI CL CV NEDec 7, 2025

Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics

Nikhil Verma, Joonas Linnosmaa, Espinosa-Leal Leonardo, Napat Vajragupta

arXiv:2512.06737v1h-index: 15

Originality Highly original

AI Analysis

This work addresses optimization challenges in machine learning by offering a new optimizer that improves convergence and generalization, potentially benefiting researchers and practitioners in training deep neural networks.

The paper tackles the problem of optimizing non-convex functions and deep learning models by introducing ArcGD, a reformulation of gradient descent with phase-aware step dynamics, which outperformed Adam on a challenging Rosenbrock function and achieved the highest average test accuracy of 50.7% on CIFAR-10 across multiple architectures.

The paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. The evaluation is conducted initially on a non-convex benchmark function and subsequently on a real-world ML dataset. The initial comparative study using the Adam optimiser is conducted on a stochastic variant of the highly non-convex and notoriously challenging Rosenbrock function, renowned for its narrow, curved valley, across dimensions ranging from 2D to 1000D and an extreme case of 50,000D. Two configurations were evaluated to eliminate learning-rate bias: (i) both using ArcGD's effective learning rate and (ii) both using Adam's default learning rate. ArcGD consistently outperformed Adam under the first setting and, although slower under the second, achieved super ior final solutions in most cases. In the second evaluation, ArcGD is evaluated against state-of-the-art optimizers (Adam, AdamW, Lion, SGD) on the CIFAR-10 image classification dataset across 8 diverse MLP architectures ranging from 1 to 5 hidden layers. ArcGD achieved the highest average test accuracy (50.7%) at 20,000 iterations, outperforming AdamW (46.6%), Adam (46.8%), SGD (49.6%), and Lion (43.4%), winning or tying on 6 of 8 architectures. Notably, while Adam and AdamW showed strong early convergence at 5,000 iterations, but regressed with extended training, whereas ArcGD continued improving, demonstrating generalization and resistance to overfitting without requiring early stopping tuning. Strong performance on geometric stress tests and standard deep-learning benchmarks indicates broad applicability, highlighting the need for further exploration. Moreover, it is also shown that a variant of ArcGD can be interpreted as a special case of the Lion optimiser, highlighting connections between the inherent mechanisms of such optimisation methods.

View on arXiv PDF

Similar