LG, Feb 25, 2022

An initial alignment between neural network and target is needed for gradient descent to learn

arXiv:2202.12846v2, 16 citations
Originality: Highly original
AI Analysis

The paper addresses a foundational theoretical problem in machine learning by showing that architecture design must incorporate knowledge of the target. The contribution is incremental, but it clarifies a key bottleneck.

The paper asks when gradient descent can learn a target function and proves that, without sufficient initial alignment between the network and the target, gradient descent cannot learn the target in polynomial time. The result gives a theoretical lower bound and answers an open problem from prior work.

This paper introduces the notion of "Initial Alignment" (INAL) between a neural network at initialization and a target function. It is proved that if a network and a Boolean target function do not have a noticeable INAL, then noisy gradient descent on a fully connected network with normalized i.i.d. initialization will not learn in polynomial time. Thus a certain amount of knowledge about the target (measured by the INAL) is needed in the architecture design. This also provides an answer to an open problem posed in [AS20]. The results are based on deriving lower bounds for descent algorithms on symmetric neural networks without explicit knowledge of the target function beyond its INAL.
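To make the idea of alignment at initialization concrete, here is a minimal numerical sketch. It does not reproduce the paper's INAL definition, which the abstract does not state. It assumes a simplified proxy: the squared correlation between a Boolean target and a single randomly initialized ReLU neuron, averaged over initializations. The function names (`hypercube`, `alignment_proxy`, `full_parity`, `first_coordinate`), the ReLU activation, and the Gaussian scaling with variance 1/n are all illustrative assumptions.

```python
import itertools
import numpy as np

def hypercube(n):
    """All 2**n points of {-1, +1}^n as an array of shape (2**n, n)."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

def alignment_proxy(target, n, num_inits=2000, seed=0):
    """Average over random neurons of E_x[target(x) * relu(w.x + b)]**2.

    Assumption: one ReLU neuron with w ~ N(0, I/n) and b ~ N(0, 1/n).
    This is an illustrative proxy, not the paper's exact INAL.
    """
    rng = np.random.default_rng(seed)
    X = hypercube(n)                      # enumerate inputs: exact E_x
    y = target(X)                         # target values, shape (2**n,)
    W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(num_inits, n))
    b = rng.normal(0.0, 1.0 / np.sqrt(n), size=(num_inits, 1))
    acts = np.maximum(W @ X.T + b, 0.0)   # neuron outputs, (num_inits, 2**n)
    corr = acts @ y / len(y)              # E_x[f(x) * neuron(x)] per init
    return float(np.mean(corr ** 2))

def full_parity(X):
    """Product of all coordinates: a high-degree Boolean target."""
    return np.prod(X, axis=1)

def first_coordinate(X):
    """A degree-1 Boolean target that a linear neuron correlates with."""
    return X[:, 0]

if __name__ == "__main__":
    n = 8
    print("first coordinate:", alignment_proxy(first_coordinate, n))
    print("full parity:     ", alignment_proxy(full_parity, n))
```

Under these assumptions the proxy should come out far smaller for the full parity than for a single coordinate, which matches the intuition that a target with negligible alignment to the initialization gives gradient descent little signal to start from.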

