LG, Feb 26

Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime

arXiv:2602.23219v1, 1 citation, h-index: 8
Originality: Incremental advance
AI Analysis

This work provides a theoretical and empirical understanding of when a classical information criterion can measure generalization for deep neural networks, which is an incremental step for researchers working on generalization theory and hyperparameter optimization.

This study investigates Takeuchi's information criterion (TIC) as a generalization measure for deep neural networks (DNNs), finding that TIC effectively explains generalization gaps when DNNs operate close to the neural tangent kernel (NTK) regime. The authors trained over 5,000 DNN models across 12 architectures and four datasets, demonstrating a good correlation between estimated TIC values and generalization gaps under NTK-like conditions, and showing that this correlation disappears outside the NTK regime.

Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult due to their complex nature. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs. Importantly, the developed theory indicates the applicability of TIC near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models with 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine the relationship between the generalization gap and the TIC estimates. We applied several TIC approximation methods with feasible computational costs and assessed the accuracy trade-off. Our experimental results indicate that the estimated TIC values correlate well with the generalization gap under conditions close to the NTK regime. However, we show both theoretically and empirically that outside the NTK regime such correlation disappears. Finally, we demonstrate that TIC provides better trial pruning ability than existing methods for hyperparameter optimization.
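As a rough illustration of the quantity the abstract refers to, the TIC penalty is commonly written as tr(H⁻¹C)/n, where H is the Hessian of the mean training loss at the fitted parameters, C is the covariance of the per-sample gradients, and n is the number of training samples. The sketch below evaluates that expression exactly on a small linear regression problem. It is not the authors' estimator: the paper applies cheaper approximations for DNNs, which are not reproduced here, and the function names, the `damping` parameter, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def tic_penalty(per_sample_grads, hessian, damping=0.0):
    """Return tr(H^{-1} C) / n for per-sample gradients of shape (n, d).

    `damping` is a hypothetical stabiliser added to the Hessian diagonal;
    it is an assumption of this sketch, not something taken from the paper.
    """
    n, d = per_sample_grads.shape
    # Covariance of the per-sample gradients (the mean is ~0 at a minimum).
    centered = per_sample_grads - per_sample_grads.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n
    h = hessian + damping * np.eye(d)
    # Solve H Z = C instead of forming the explicit inverse of H.
    return float(np.trace(np.linalg.solve(h, cov)) / n)


def linear_regression_tic(n=2000, d=5, noise_std=1.0, seed=0):
    """TIC penalty for least squares with loss 0.5 * (y - x @ w) ** 2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + noise_std * rng.normal(size=n)

    # Fit by ordinary least squares.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ w_hat

    # Per-sample gradient of the loss w.r.t. w is -(residual) * x.
    grads = -residuals[:, None] * X
    # Hessian of the mean loss is X^T X / n.
    hessian = X.T @ X / n
    return tic_penalty(grads, hessian)


if __name__ == "__main__":
    print("TIC penalty:", linear_regression_tic())
```

For this well-specified model the penalty should land near noise_std² · d / n, which is the classical parameter-count correction. For a DNN the exact Hessian and its inverse are generally too expensive to form, which is why the abstract mentions approximation methods with feasible computational cost.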

