LG · OC · ML · Oct 29, 2023

Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data

arXiv:2310.18935v1 · 24 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This paper offers researchers theoretical insight into generalization in deep learning. It is an incremental advance in that it extends known implicit-bias results to non-smooth networks.

The paper tackles the open question of implicit bias in non-smooth neural networks trained by gradient descent. For two-layer ReLU and leaky ReLU networks on nearly-orthogonal data, it shows that gradient descent finds networks whose stable rank converges to 1 (leaky ReLU) or is bounded by a constant (ReLU), and that asymptotically all training points attain the same normalized margin.

The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well. While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks. Therefore, implicit bias in non-smooth neural networks trained by gradient descent remains an open question. In this paper, we aim to answer this question by studying the implicit bias of gradient descent for training two-layer fully connected (leaky) ReLU neural networks. We show that when the training data are nearly-orthogonal, for the leaky ReLU activation function, gradient descent will find a network with a stable rank that converges to $1$, whereas for the ReLU activation function, gradient descent will find a neural network with a stable rank that is upper bounded by a constant. Additionally, we show that gradient descent will find a neural network such that all the training data points have the same normalized margin asymptotically. Experiments on both synthetic and real data back up our theoretical findings.
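The two quantities the abstract tracks can be made concrete with a small pure-Python sketch plus a toy gradient descent run. This is not the authors' code or experimental setup; every name below is hypothetical, and the following are assumptions: the network is f(W, x) = Σ_j a_j σ(⟨w_j, x⟩) with fixed second-layer weights a_j = ±1/m and only W trained; the loss is logistic; σ is leaky ReLU with slope 0.1 (slope 0.0 gives ReLU); and the inputs are Gaussian with n = 4 points in d = 200 dimensions, which makes them nearly orthogonal. Stable rank is computed as ‖W‖_F² / ‖W‖_2², and because f is 1-homogeneous in W under these assumptions, the normalized margin is taken to be y_i f(W, x_i) / ‖W‖_F.

```python
import math
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def leaky_relu(z, alpha):
    return z if z > 0 else alpha * z


def leaky_relu_grad(z, alpha):
    # Subgradient at z == 0 is taken to be alpha (a convention); alpha=0 gives ReLU.
    return 1.0 if z > 0 else alpha


def forward(W, a, x, alpha):
    """f(W, x) = sum_j a_j * sigma(<w_j, x>); the second-layer weights a stay fixed."""
    return sum(a_j * leaky_relu(dot(w_j, x), alpha) for a_j, w_j in zip(a, W))


def logistic_loss(W, a, X, y, alpha):
    """Average of log(1 + exp(-y_i f(W, x_i))) over the training set."""
    return sum(math.log1p(math.exp(-y_i * forward(W, a, x_i, alpha)))
               for x_i, y_i in zip(X, y)) / len(X)


def stable_rank(W, iters=300, seed=0):
    """||W||_F^2 / ||W||_2^2 for a nonzero W; ||W||_2^2 via power iteration on W^T W."""
    rng = random.Random(seed)
    fro_sq = sum(w * w for row in W for w in row)
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    for _ in range(iters):
        Wv = [dot(row, v) for row in W]
        v = [sum(row[k] * c for row, c in zip(W, Wv)) for k in range(len(v))]
        norm = math.sqrt(dot(v, v))
        v = [vk / norm for vk in v]
    Wv = [dot(row, v) for row in W]  # ||W v||^2 -> largest eigenvalue of W^T W
    return fro_sq / dot(Wv, Wv)


def normalized_margins(W, a, X, y, alpha):
    """y_i f(W, x_i) / ||W||_F; f is 1-homogeneous in W because a is fixed."""
    fro = math.sqrt(sum(w * w for row in W for w in row))
    return [y_i * forward(W, a, x_i, alpha) / fro for x_i, y_i in zip(X, y)]


def nearly_orthogonal_data(n=4, d=200, seed=1):
    """x_i ~ N(0, I/d): for d >> n the inputs are nearly orthogonal with norm near 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(n)]
    y = [1 if i < n // 2 else -1 for i in range(n)]
    return X, y


def train_gd(X, y, m=6, alpha=0.1, lr=3.0, steps=200, init_std=1e-3, seed=0):
    """Full-batch GD on the logistic loss, updating only the first layer W."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    a = [1.0 / m if j < m // 2 else -1.0 / m for j in range(m)]
    W = [[rng.gauss(0.0, init_std) for _ in range(d)] for _ in range(m)]
    W0 = [row[:] for row in W]
    for _ in range(steps):
        grads = [[0.0] * d for _ in range(m)]
        for x_i, y_i in zip(X, y):
            pre = [dot(w_j, x_i) for w_j in W]
            f_i = sum(a_j * leaky_relu(z, alpha) for a_j, z in zip(a, pre))
            # d/df of log(1 + exp(-y f)) is -y / (1 + exp(y f)); average over n points
            coef = -y_i / (1.0 + math.exp(y_i * f_i)) / n
            for j in range(m):
                c = coef * a[j] * leaky_relu_grad(pre[j], alpha)
                grads[j] = [g + c * xk for g, xk in zip(grads[j], x_i)]
        W = [[w - lr * g for w, g in zip(W[j], grads[j])] for j in range(m)]
    return W0, W, a


if __name__ == "__main__":
    X, y = nearly_orthogonal_data()
    W0, W, a = train_gd(X, y)
    print("stable rank at init:  ", stable_rank(W0))
    print("stable rank after GD: ", stable_rank(W))
    print("normalized margins:   ", normalized_margins(W, a, X, y, alpha=0.1))
```

In a run like this, the stable rank would be expected to fall well below its random-initialization value as the gradient updates, which lie in the span of the inputs, come to dominate the small initialization. The paper's statements concern the limit of training under its own assumptions, so a short toy run like this is not expected to reach exactly 1 or perfectly equal margins.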
