LG NA OC | Nov 9, 2025

DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Nikolay Yudin, Ekaterina Grishina, Andrey Veprikov, Alexandr Beznosikov, Maxim Rakhuba

arXiv:2511.06477v1 | h-index: 6

Originality: Incremental advance

AI Analysis

The paper addresses a practical problem for machine learning practitioners, the resource-intensive construction of preconditioners, though the advance appears incremental because it builds on existing Kronecker-factorized approaches.

The paper tackles the challenge of efficiently constructing accurate Kronecker-factorized approximations of the Fisher information matrix for gradient preconditioning in optimizers. It introduces DyKAF, which uses projector-splitting integrators, and reports improved performance in large language model pre-training and fine-tuning.

Recently, optimizers that explicitly treat weights as matrices, rather than flattened vectors, have demonstrated their effectiveness. This perspective naturally leads to structured approximations of the Fisher matrix as preconditioners, where the matrix view induces a Kronecker-factorized form that enables memory-efficient representation. However, constructing such approximations both efficiently and accurately remains an open challenge, since obtaining the optimal factorization is resource-intensive and practical methods therefore rely on heuristic design choices. In this work, we introduce a novel approach that leverages projector-splitting integrators to construct effective preconditioners. Our optimizer, DyKAF (Dynamical Kronecker Approximation of the Fisher Matrix), consistently improves the Fisher matrix approximation quality. Experiments on large language model pre-training and fine-tuning demonstrate that DyKAF outperforms existing optimizers across a range of evaluation metrics.
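To illustrate why the Kronecker-factorized form mentioned in the abstract is memory-efficient, here is a minimal NumPy sketch. It is not the DyKAF algorithm and does not use projector-splitting integrators; it only shows the generic identity that applying the inverse of kron(L, R) to a flattened gradient equals solving with the two small factors on the matrix-shaped gradient. The names `L`, `R`, `G`, and `kron_precondition` are hypothetical, and the factors are random symmetric positive definite matrices chosen purely for illustration.

```python
import numpy as np


def kron_precondition(L, R, G):
    """Apply the inverse of kron(L, R) to G without forming the Kronecker product.

    Assumes L (m x m) and R (n x n) are symmetric positive definite and
    G is an m x n gradient matrix flattened in row-major order.
    """
    X = np.linalg.solve(L, G)          # X = L^{-1} G
    return np.linalg.solve(R, X.T).T   # X R^{-1}, using the symmetry of R


rng = np.random.default_rng(0)
m, n = 4, 3
G = rng.standard_normal((m, n))

# Illustrative SPD factors (random, not estimated from real gradients).
A = rng.standard_normal((m, m))
L = A @ A.T + np.eye(m)
B = rng.standard_normal((n, n))
R = B @ B.T + np.eye(n)

# Factored application: only the m x m and n x n factors are touched.
fast = kron_precondition(L, R, G)

# Dense reference: build the full (m*n) x (m*n) matrix and solve.
dense = np.linalg.solve(np.kron(L, R), G.ravel()).reshape(m, n)

max_err = float(np.max(np.abs(fast - dense)))

# Storage comparison: entries in the two factors vs. the dense matrix.
factor_entries = m * m + n * n
dense_entries = (m * n) ** 2
```

For an m x n weight matrix, the two factors hold m² + n² entries while the dense preconditioner holds (mn)², which is why structured approximations of this kind scale to large layers. How the factors are estimated is the part that differs between methods and is not shown here.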
