LG | Dec 19, 2023

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

arXiv:2312.12226v214.311 citations | h-index: 3 | ICLR

Originality: Incremental advance

AI Analysis

This work addresses the efficient training of larger-scale models, a problem relevant to machine learning researchers and practitioners. It is an incremental advance, as it builds on existing second-order optimization methods such as K-FAC and Shampoo.

The study tackles the challenge of scaling second-order optimization for deep neural networks by identifying a specific parameterization that keeps feature learning stable as network width increases. The authors report higher generalization performance and hyperparameter transfer across models with different widths.

Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models. In this study, with a view to training at even larger scales, we identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner even when the network width increases significantly. Inspired by the maximal update parameterization, we consider a one-step update of the gradient and reveal the appropriate scales of hyperparameters, including random initialization, learning rates, and damping terms. Our approach covers two major second-order optimization algorithms, K-FAC and Shampoo, and we demonstrate that our parameterization achieves higher generalization performance in feature learning. In particular, it enables us to transfer hyperparameters across models with different widths.
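
To make the roles of the learning rate and damping terms concrete, here is a minimal sketch of one damped K-FAC step for a single linear layer, written with NumPy. It follows the standard Kronecker-factored form of the update and is not taken from the paper; the function name `kfac_step` and its arguments are hypothetical. The width-dependent scalings that the paper derives are not stated in this summary, so `lr`, `damping_a`, and `damping_b` are left as plain arguments.

```python
import numpy as np

def kfac_step(W, x, g, lr, damping_a, damping_b):
    """One damped K-FAC step for a linear layer s = x @ W.T (illustrative only).

    W: (d_out, d_in) weights
    x: (n, d_in) layer inputs
    g: (n, d_out) per-sample gradients of the loss w.r.t. the outputs s
    """
    n = x.shape[0]
    grad = g.T @ x / n                       # (d_out, d_in) mean gradient w.r.t. W
    A = x.T @ x / n                          # (d_in, d_in) input second moment
    B = g.T @ g / n                          # (d_out, d_out) output-gradient second moment
    A_damped = A + damping_a * np.eye(A.shape[0])
    B_damped = B + damping_b * np.eye(B.shape[0])
    # Compute B_damped^{-1} @ grad @ A_damped^{-1} with linear solves
    # instead of forming explicit inverses.
    left = np.linalg.solve(B_damped, grad)
    precond = np.linalg.solve(A_damped, left.T).T  # valid because A_damped is symmetric
    return W - lr * precond
```

In this sketch the damping terms keep the two factors invertible, and the learning rate sets the step size after preconditioning. These are the quantities whose scale with width the abstract refers to, alongside the initialization of `W`.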
