LG | ML | Feb 8, 2024

Trade-Offs of Diagonal Fisher Information Matrix Estimators

arXiv:2402.05379v3 | 8 citations | h-index: 13 | NIPS
Originality: Synthesis-oriented
AI Analysis

This work addresses a computational bottleneck for machine learning practitioners who use Fisher information to analyze networks. It is incremental because it focuses on trade-offs between existing estimators.

The paper tackles the problem of efficiently estimating the diagonal entries of the Fisher information matrix in neural networks, which are expensive to compute exactly. It analyzes two popular random estimators and derives bounds on their variances, finding that the variance depends on the non-linearity with respect to different parameter groups and should not be ignored.

The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks. It elucidates insightful theories and useful tools to understand and optimize neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds on the variances and instantiate them in neural networks for regression and classification. We navigate trade-offs for both estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information.
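To make the idea of two random estimators concrete, here is a minimal sketch. It assumes the two estimators are the standard Monte Carlo forms (averaging squared score entries, and averaging negated diagonal Hessian entries of the log-likelihood), which the abstract does not spell out. The toy model f(x; a, b) = a * tanh(b * x) with unit-variance Gaussian noise, and the function name diag_fim_estimates, are hypothetical and chosen only so that the derivatives can be written by hand.

```python
import numpy as np


def diag_fim_estimates(a, b, x, n_samples, rng):
    """Monte Carlo estimates of the diagonal Fisher information for the
    hypothetical toy model y ~ N(f(x), 1) with f(x; a, b) = a * tanh(b * x).

    Returns (squared-score estimate, negative-Hessian estimate, exact diagonal),
    each an array ordered as (a, b).
    """
    t = np.tanh(b * x)
    f = a * t

    # First derivatives and diagonal second derivatives of f w.r.t. (a, b).
    grad_f = np.array([t, a * x * (1.0 - t**2)])
    hess_diag_f = np.array([0.0, -2.0 * a * x**2 * t * (1.0 - t**2)])

    # Outputs are sampled from the model's own predictive distribution.
    y = f + rng.standard_normal(n_samples)
    resid = y - f  # shape (n_samples,)

    # Assumed estimator 1: mean of squared score entries,
    # where d/d(theta_i) log p(y | x) = (y - f) * df/d(theta_i).
    score = resid[:, None] * grad_f[None, :]
    est_sq_score = np.mean(score**2, axis=0)

    # Assumed estimator 2: mean of negated diagonal Hessian entries of log p,
    # where -d2/d(theta_i)2 log p = (df/d(theta_i))**2 - (y - f) * d2f/d(theta_i)2.
    neg_hess = grad_f[None, :] ** 2 - resid[:, None] * hess_diag_f[None, :]
    est_neg_hess = np.mean(neg_hess, axis=0)

    # For unit-variance Gaussian regression the exact diagonal is (df/d(theta_i))**2.
    exact = grad_f**2
    return est_sq_score, est_neg_hess, exact


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sq, nh, exact = diag_fim_estimates(a=1.5, b=0.7, x=1.2, n_samples=10_000, rng=rng)
    print("squared score    :", sq)
    print("negative Hessian :", nh)
    print("exact diagonal   :", exact)
```

In this toy setting the model is linear in a, so the Hessian-based estimate for a carries no sampling noise, while the estimate for b (where the model is non-linear) does. This is only an illustration of how non-linearity in a parameter can affect estimator variance; it is not the paper's experimental setup.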

