LG · Feb 28, 2025

Information-Theoretic Perspectives on Optimizers

arXiv:2502.20763v1 · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses the complex interplay between optimizers and architectures in neural networks, offering incremental insights through information-theoretic analysis.

The authors tackle the problem of understanding why certain optimizers perform better on specific neural network architectures. They introduce an information-theoretic metric called the entropy gap and find that it, alongside the traditional sharpness metric, affects both optimization dynamics and generalization. They then apply information-theoretic tools to analyze and improve the Lion optimizer.
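The summary contrasts the entropy gap with "sharpness" without defining it. A common definition, assumed here because the page does not say which one the authors use, is the largest eigenvalue of the loss Hessian at the current weights. The sketch below estimates it by power iteration on finite-difference Hessian-vector products; the helper names `hessian_vector_product` and `sharpness` are hypothetical, and the entropy gap itself is not sketched because its definition is not given on this page.

```python
import math
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def hessian_vector_product(grad_fn: GradFn, w: Vector, v: Vector, eps: float = 1e-4) -> Vector:
    """Approximate H(w) @ v with a central difference of gradients."""
    plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def sharpness(grad_fn: GradFn, w: Vector, iters: int = 100) -> float:
    """Estimate the dominant Hessian eigenvalue at w by power iteration.

    Power iteration finds the eigenvalue of largest magnitude; this equals the
    largest eigenvalue when the Hessian is positive semidefinite (e.g. near a minimum).
    """
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting direction
    eigenvalue = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, w, v)
        eigenvalue = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient, since |v| = 1
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]  # renormalize for the next iteration
    return eigenvalue


# Toy quadratic loss L(w) = 0.5 * w^T A w, so grad L(w) = A w and the Hessian is A.
A = [[4.0, 1.0], [1.0, 3.0]]
grad = lambda w: [sum(a * x for a, x in zip(row, w)) for row in A]
print(sharpness(grad, [0.5, -0.2]))  # close to (7 + sqrt(5)) / 2, about 4.618
```

In real networks the gradient function would come from automatic differentiation rather than a closed form; the power-iteration structure stays the same.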

The interplay of optimizers and architectures in neural networks is complicated, and it is hard to understand why some optimizers work better on specific architectures. In this paper, we find that the traditionally used sharpness metric does not fully explain this intricate interplay, and we introduce an information-theoretic metric called the entropy gap to aid the analysis. We find that both sharpness and the entropy gap affect performance, including optimization dynamics and generalization. We further use information-theoretic tools to understand a recently proposed optimizer called Lion and find ways to improve it.
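The abstract refers to Lion without describing its update. For orientation, here is a minimal sketch of the standard Lion step as published by Chen et al. (2023, "Symbolic Discovery of Optimization Algorithms"). It is not the improved variant proposed in this paper, whose details are not given on this page. The function name `lion_step`, the plain-list parameter representation, and the default hyperparameters are illustrative assumptions.

```python
from typing import List, Tuple


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """One standard Lion update on a flat list of parameters.

    Returns (new_params, new_momentum).
    """
    new_params, new_momentum = [], []
    for theta, g, m in zip(params, grads, momentum):
        # Interpolate the stored momentum and the current gradient, keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        update = (c > 0) - (c < 0)  # sign(c) in {-1, 0, 1}
        # Decoupled weight decay, applied as in AdamW.
        theta = theta - lr * (update + weight_decay * theta)
        # The momentum itself is tracked with a second coefficient, beta2.
        m = beta2 * m + (1.0 - beta2) * g
        new_params.append(theta)
        new_momentum.append(m)
    return new_params, new_momentum


# Usage: one step from zero momentum with a large learning rate for visibility.
p, m = lion_step([1.0, -2.0, 0.5], [0.3, -0.1, 0.0], [0.0, 0.0, 0.0], lr=0.1)
print(p)  # approximately [0.9, -1.9, 0.5]
```

Because of the sign operation, every coordinate with a nonzero interpolated direction moves by exactly `lr` (before weight decay), regardless of the gradient's magnitude; this uniform step size is a key way Lion differs from Adam-style optimizers.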
