LG | AI | ML | Feb 11, 2022

Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks

arXiv:2202.05510v2 | 1 citation
Originality: Incremental advance
AI Analysis

This paper provides incremental theoretical insight into the generalization behavior of ReLU networks, relevant to machine learning researchers.

The paper tackles the problem of characterizing implicit bias in single-neuron ReLU networks trained with gradient descent, discovering an implicit bias in terms of support vectors and proving global convergence for the 2D case.

Understanding the implicit bias of gradient descent for the generalization capability of ReLU networks has been an important topic in machine learning research. Unfortunately, even for a single ReLU neuron trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of model parameters (Vardi & Shamir, 2021). In order to close the gap toward understanding the intriguing generalization behavior of ReLU networks, here we examine the gradient flow dynamics in the parameter space when training single-neuron ReLU networks. Specifically, we discover an implicit bias in terms of support vectors, which plays a key role in why and how ReLU networks generalize well. Moreover, we analyze gradient flows with respect to the magnitude of the norm of initialization, and show that the norm of the learned weight strictly increases through the gradient flow. Lastly, we prove the global convergence of a single ReLU neuron for the $d = 2$ case.
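To make the setting in the abstract concrete, here is a minimal sketch of a single ReLU neuron trained with the square loss in $d = 2$, using small-step gradient descent as a rough stand-in for gradient flow. It is not the paper's code: the teacher weight, the Gaussian inputs, the small initialization, the step size, and all function names are assumptions chosen for illustration. The script records the weight norm and the loss at each step so that the norm trajectory described in the abstract can be inspected numerically.

```python
import math
import random


def relu(z):
    """ReLU activation for a scalar pre-activation."""
    return z if z > 0.0 else 0.0


def make_data(n=50, teacher=(1.0, 0.5), seed=0):
    """Assumed setup: Gaussian inputs in 2D, labels from a teacher ReLU neuron."""
    rng = random.Random(seed)
    xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    ys = [relu(teacher[0] * x0 + teacher[1] * x1) for x0, x1 in xs]
    return xs, ys


def loss_and_grad(w, xs, ys):
    """Square loss L(w) = 1/(2n) * sum (relu(w.x) - y)^2 and its (sub)gradient."""
    n = len(xs)
    loss, g0, g1 = 0.0, 0.0, 0.0
    for (x0, x1), y in zip(xs, ys):
        pre = w[0] * x0 + w[1] * x1
        err = relu(pre) - y
        loss += 0.5 * err * err / n
        if pre > 0.0:  # only active samples contribute to the gradient
            g0 += err * x0 / n
            g1 += err * x1 / n
    return loss, (g0, g1)


def train_single_relu(w0=(0.01, 0.0), lr=0.05, steps=2000):
    """Euler discretization of gradient flow, started from a small-norm w0."""
    xs, ys = make_data()
    w = w0
    norms, losses = [], []
    for _ in range(steps):
        loss, (g0, g1) = loss_and_grad(w, xs, ys)
        norms.append(math.hypot(w[0], w[1]))
        losses.append(loss)
        w = (w[0] - lr * g0, w[1] - lr * g1)
    return w, norms, losses


if __name__ == "__main__":
    w, norms, losses = train_single_relu()
    print("final weight:", w)
    print("initial / final norm:", norms[0], norms[-1])
    print("initial / final loss:", losses[0], losses[-1])
```

Varying `w0` is the natural experiment here, since the abstract analyzes the flow with respect to the magnitude of the initialization norm. A discrete step size only approximates the continuous-time flow, so any behavior seen in this sketch is suggestive rather than a check of the paper's theorems.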

