ML · LG · NC · Feb 5, 2022

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks

arXiv:2202.02649v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work studies the inductive biases of deep neural networks, a question relevant to researchers aiming to improve the efficiency and robustness of learning algorithms. Its advance is incremental, since it builds on existing GLN frameworks.

The authors derive the infinite-time training limit of gated linear networks (GLNs) and generalize the result to networks described by homogeneous polynomials. They then apply the theoretical predictions to GLNs trained on MNIST to show how architectural constraints and implicit bias affect performance.

Understanding the asymptotic behavior of gradient-descent training of deep neural networks is essential for revealing inductive biases and improving network performance. We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks, gated linear networks (GLNs), and generalize these results to gated networks described by general homogeneous polynomials. We study the implications of our results, focusing first on two-layer GLNs. We then apply our theoretical predictions to GLNs trained on MNIST and show how architectural constraints and the implicit bias of gradient descent affect performance. Finally, we show that our theory captures a substantial portion of the inductive bias of ReLU networks. By making the inductive bias explicit, our framework is poised to inform the development of more efficient, biologically plausible, and robust learning algorithms.
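To make the phrase "gated networks described by general homogeneous polynomials" concrete, the following is a minimal sketch of a two-layer gated linear network and a check that its output is a degree-2 homogeneous polynomial in its trainable parameters. The abstract does not specify the form of the gates; the half-space gates with fixed gating vectors `U`, the function names `gates` and `gln_forward`, and all dimensions are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def gates(X, U):
    """Fixed (untrained) binary gates: g_k(x) = 1 if u_k . x > 0, else 0.

    The half-space form of the gate is an assumption for this sketch.
    """
    return (X @ U.T > 0).astype(float)

def gln_forward(X, W, a, U):
    """Two-layer gated linear network: f(x) = sum_k a_k * g_k(x) * (w_k . x)."""
    return (gates(X, U) * (X @ W.T)) @ a

rng = np.random.default_rng(0)
n, d, k = 5, 3, 4                 # samples, input dimension, hidden units
X = rng.normal(size=(n, d))
U = rng.normal(size=(k, d))       # gating vectors, held fixed during training
W = rng.normal(size=(k, d))       # trainable first-layer weights
a = rng.normal(size=k)            # trainable second-layer weights

out = gln_forward(X, W, a, U)

# Scaling every trainable parameter by c scales the output by c**2,
# because the output is a degree-2 homogeneous polynomial in (W, a).
c = 2.0
scaled = gln_forward(X, c * W, c * a, U)
print(np.allclose(scaled, c**2 * out))
```

Because the gates depend only on the input and on the fixed `U`, the network is nonlinear in `x` but polynomial in `W` and `a`. In this sketch, that property is what the homogeneity check verifies.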

Code Implementations: 1 repo
