Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks

arXiv:2603.12785

Predicted impact: top 98% in LG · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses a theoretical bottleneck in understanding the Bayesian asymptotic behavior of neural networks, providing incremental progress by extending existing methods to singular points and broader activation functions.

The paper evaluates the learning coefficient (real log canonical threshold) of three-layer neural networks at singular points. It derives an upper-bound formula that applies to general analytic activation functions, including swish and polynomial functions. When the input dimension is one, the bound coincides with the already known learning coefficient, which partially resolves the discrepancy between earlier upper bounds and known values.

Three-layer neural networks are known to form singular learning models, and their Bayesian asymptotic behavior is governed by the learning coefficient, or real log canonical threshold. Although this quantity has been clarified for regular models and for some special singular models, broadly applicable methods for evaluating it in neural networks remain limited. Recently, a formula for the local learning coefficient of semiregular models was proposed, yielding an upper bound on the learning coefficient. However, this formula applies only to nonsingular points in the set of realization parameters and cannot be used at singular points. In particular, for three-layer neural networks, the resulting upper bound has been shown to differ substantially from learning coefficient values already known in some cases. In this paper, we derive an upper-bound formula for the local learning coefficient at singular points in three-layer neural networks. This formula can be interpreted as a counting rule under budget constraints and demand-supply constraints, and is applicable to general analytic activation functions. In particular, it covers the swish function and polynomial functions, extending previous results to a wider class of activation functions. We further show that, when the input dimension is one, the upper bound obtained here coincides with the already known learning coefficient, thereby partially resolving the discrepancy above. Our result also provides a systematic perspective on how the weight parameters of three-layer neural networks affect the learning coefficient.
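
For orientation, the quantities named in the abstract can be written out in the standard notation of singular learning theory. This is a sketch of the usual textbook definitions, not the paper's upper-bound formula. The symbols K, \varphi, \lambda, m, d and the toy model f(x; a, b) = a b x are assumptions introduced here purely for illustration.

```latex
% Zeta function of a model with parameter w, prior \varphi(w), and
% Kullback-Leibler divergence K(w) from the true distribution:
\zeta(z) = \int K(w)^{z}\, \varphi(w)\, dw , \qquad \operatorname{Re} z > 0 .
% Its meromorphic continuation has poles on the negative real axis.
% The largest pole is z = -\lambda (learning coefficient), with order m.
% The Bayesian free energy then expands as
F_n = n S_n + \lambda \log n - (m-1) \log\log n + O_p(1) ,
% and for a regular model with d parameters, \lambda = d/2 and m = 1.

% Toy example (illustrative assumption): f(x; a, b) = a b x with true
% function 0, so K(a, b) \propto a^2 b^2. With a uniform prior on [0,1]^2,
\zeta(z) = \int_0^1 \!\! \int_0^1 a^{2z} b^{2z}\, da\, db
         = \frac{1}{(2z+1)^{2}} ,
% which has a pole at z = -1/2 of order 2, hence
\lambda = \tfrac{1}{2}, \qquad m = 2 ,
% smaller than the regular-model value d/2 = 1.

% Local version: restrict the integral to a neighborhood of a point w_0
% with K(w_0) = 0. Near a nonsingular point (a_0, 0), a_0 \neq 0, of the
% zero set, K \approx a_0^2 b^2, which gives \lambda = 1/2 and m = 1.
% Near the singular point (0, 0), the computation above gives
% \lambda = 1/2 and m = 2.
```

In this toy case the singular and nonsingular points of the zero set share the same \lambda and differ only in the multiplicity m. Both already fall below the regular-model value d/2, which is why point-by-point analysis of the realization set matters.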
