LG | AI | Aug 28, 2025

Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models

arXiv:2508.21106v1 | 1 citation | h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses the scalability of full-matrix adaptive optimization for training large generalized linear models, though it is an incremental improvement over existing adaptive optimization methods.

The paper tackles the prohibitive memory and compute cost of full-matrix adaptive gradient methods by proposing AdaGram, which keeps a low-rank approximation of the preconditioner to reduce both costs. In the reported experiments, AdaGram converges faster than or matches diagonal adaptive optimizers when using approximations of rank five or lower.

Adaptive gradient methods like Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits the ability to capture parameter correlations. Full-matrix adaptive methods, approximating the exact Hessian, can model these correlations and may enable faster convergence. At the same time, their computational and memory costs are often prohibitive for large-scale models. To address this limitation, we propose AdaGram, an optimizer that enables efficient full-matrix adaptive gradient updates. To reduce memory and computational overhead, we utilize fast symmetric factorization for computing the preconditioned update direction at each iteration. Additionally, we maintain the low-rank structure of a preconditioner along the optimization trajectory using matrix integrator methods. Numerical experiments on standard machine learning tasks show that AdaGram converges faster or matches the performance of diagonal adaptive optimizers when using rank five and smaller rank approximations. This demonstrates AdaGram's potential as a scalable solution for adaptive optimization in large models.
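
The difference between diagonal and low-rank full-matrix preconditioning can be illustrated with a minimal sketch. This is not the AdaGram algorithm from the paper: it does not implement the fast symmetric factorization or the matrix integrator, and it simply assumes that an orthonormal rank-r basis and its eigenvalues are already available. The function names, the damping parameter `delta`, and the preconditioner form `delta*I + U diag(s) U^T` are illustrative assumptions. The sketch only shows why a low-rank structure lets the preconditioned direction be computed in O(d·r) time without ever forming a d×d matrix.

```python
import math


def diagonal_adagrad_direction(grad, accum_sq, eps=1e-8):
    """Diagonal Adagrad direction: each coordinate is scaled independently,
    so correlations between parameters are ignored."""
    return [g / (math.sqrt(a) + eps) for g, a in zip(grad, accum_sq)]


def low_rank_direction(grad, basis, eigvals, delta=1e-3):
    """Apply (delta*I + U diag(s) U^T)^(-1/2) to grad without building the matrix.

    Assumptions (hypothetical, for illustration only):
      basis   -- list of r mutually orthonormal vectors u_i, each of length d
      eigvals -- list of r nonnegative eigenvalues s_i
      delta   -- positive damping term that keeps the matrix invertible

    Directions inside span(U) are scaled by (delta + s_i)^(-1/2);
    everything orthogonal to span(U) is scaled by delta^(-1/2).
    """
    base = 1.0 / math.sqrt(delta)
    out = [base * g for g in grad]
    for u, s in zip(basis, eigvals):
        coeff = sum(ui * gi for ui, gi in zip(u, grad))  # projection u_i^T g
        scale = 1.0 / math.sqrt(delta + s) - base        # correction along u_i
        out = [o + scale * coeff * ui for o, ui in zip(out, u)]
    return out


if __name__ == "__main__":
    # Rank-1 example in two dimensions: the basis vector couples both parameters.
    u = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
    print(low_rank_direction([1.0, 1.0], [u], [3.0], delta=1.0))
    print(low_rank_direction([1.0, -1.0], [u], [3.0], delta=1.0))
```

In the example, a gradient aligned with the correlated direction `u` is damped, while a gradient orthogonal to it is only rescaled by the damping term. A diagonal preconditioner cannot make this distinction because it treats each coordinate separately.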

