LG | Jun 10, 2025

NysAct: A Scalable Preconditioned Gradient Descent using Nystrom Approximation

arXiv:2506.08360v14.1 | h-index: 1 | Has Code | BigData

Originality: Incremental advance

AI Analysis

This work addresses optimization efficiency and generalization for machine learning practitioners, offering a practical incremental improvement by balancing computational cost and accuracy.

The paper addresses the trade-off between first-order gradient methods, which are fast but generalize poorly, and second-order methods, which are accurate but costly. It introduces NysAct, a scalable first-order gradient preconditioning method that uses a Nystrom approximation to cut computational and memory costs, and it reports improved test accuracy over both families of methods.

Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NysAct, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NysAct leverages an eigenvalue-shifted Nystrom method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NysAct not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably less computational resources than existing second-order methods. Code is available at https://github.com/hseung88/nysact.
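
To make the idea in the abstract concrete, here is a minimal NumPy sketch of preconditioning a gradient with a Nystrom approximation of an activation covariance matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names, the uniform column sampling, and the use of a damping term `rho` as the eigenvalue shift are all hypothetical choices made here, and the repository linked above should be consulted for the actual method.

```python
import numpy as np


def nystrom_factors(acts, rank, rng):
    """Return Nystrom factors (C_S, W) of the covariance C = acts.T @ acts / n.

    Assumption: columns are sampled uniformly at random; the paper may
    use a different sampling or sketching scheme.
    """
    n, d = acts.shape
    idx = rng.choice(d, size=rank, replace=False)
    # Sampled columns of C, computed without forming the full d x d matrix.
    C_S = acts.T @ acts[:, idx] / n  # shape (d, rank)
    # Intersection block C[idx][:, idx], read off from the sampled columns.
    W = C_S[idx, :]                  # shape (rank, rank)
    return C_S, W


def apply_preconditioner(grad, C_S, W, rho):
    """Compute (rho * I + C_S @ inv(W) @ C_S.T)^{-1} @ grad.

    Uses the Woodbury identity, so only a rank x rank system is solved.
    Assumption: rho > 0 plays the role of the eigenvalue shift, and W is
    invertible.
    """
    core = rho * W + C_S.T @ C_S
    return (grad - C_S @ np.linalg.solve(core, C_S.T @ grad)) / rho


rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 8))  # 64 samples, 8 input features
grad = rng.standard_normal((8, 3))   # gradient with 8 input rows
C_S, W = nystrom_factors(acts, rank=4, rng=rng)
precond_grad = apply_preconditioner(grad, C_S, W, rho=0.1)
```

In this sketch the stored state is a d x rank matrix rather than a d x d one, and the only linear solve is rank x rank, which is where the time and memory savings over a dense second-order preconditioner would come from.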
