LG · ML · May 26, 2022

Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures

Apple
arXiv:2205.13647v2 · 18 citations · h-index: 113
Originality: Incremental advance
AI Analysis

The paper provides theoretical insight into the implicit bias of gradient descent on reasoning tasks, but the advance is incremental: it builds on prior work and focuses on specific benchmarks.

The paper tackles the problem of learning logical functions with gradient descent on neural networks, showing that generalization error can be lower-bounded by noise-stability and characterized by Boolean influence under distribution shift, with experimental support on models like MLPs and Transformers.

This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that in order to learn logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise-stability of the target function, supporting a conjecture made in [ZRKB21]. It is then shown that in the distribution shift setting, when the data withholding corresponds to freezing a single feature (referred to as canonical holdout), the generalization error of gradient descent admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown on linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures and for learning logical functions such as PVR functions, GD tends to have an implicit bias towards low-degree representations, which in turn gives the Boolean influence for the generalization error under quadratic loss.
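The two Boolean measures named in the abstract can be computed by brute force on small inputs. The sketch below is illustrative only: the function names (`influence`, `noise_stability`, `holdout_error`) and the toy `pointer` target are assumptions made for this example, not code or benchmarks from the paper. Inputs are encoded as ±1. The `holdout_error` helper models canonical holdout with a predictor that simply ignores the frozen coordinate, a stand-in for the low-degree bias hypothesis rather than a trained network. With ±1 labels and an unnormalized squared loss, its error on the unseen half equals 4 times the Boolean influence; the paper's normalization may differ.

```python
from itertools import product


def influence(f, n, k):
    """Boolean influence of coordinate k: Pr_x[f(x) != f(x with bit k flipped)]."""
    flips = 0
    for x in product((-1, 1), repeat=n):
        y = list(x)
        y[k] = -y[k]
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / 2 ** n


def noise_stability(f, n, rho):
    """Exact Stab_rho(f) = E[f(x) f(y)], where y is a rho-correlated copy of x."""
    points = list(product((-1, 1), repeat=n))
    total = 0.0
    for x in points:
        for y in points:
            d = sum(a != b for a, b in zip(x, y))
            # Each coordinate of y differs from x independently with prob. (1 - rho) / 2.
            p = ((1 - rho) / 2) ** d * ((1 + rho) / 2) ** (n - d)
            total += f(x) * f(y) * p
    return total / 2 ** n


def holdout_error(f, n, k):
    """Mean squared error on the unseen half (x_k = -1) of a predictor that
    ignores x_k, i.e. predicts f(x with x_k set to the training value +1)."""
    err, count = 0.0, 0
    for x in product((-1, 1), repeat=n):
        if x[k] != -1:
            continue
        seen = list(x)
        seen[k] = 1
        err += (f(x) - f(tuple(seen))) ** 2
        count += 1
    return err / count


# Hypothetical toy targets (assumptions for illustration, not the PVR benchmark).
def pointer(x):
    # x[0] acts as a one-bit pointer selecting x[1] or x[2].
    return x[1] if x[0] == 1 else x[2]


def parity(x):
    out = 1
    for b in x:
        out *= b
    return out


def dictator(x):
    return x[0]


if __name__ == "__main__":
    for name, f in [("pointer", pointer), ("parity", parity), ("dictator", dictator)]:
        print(name, influence(f, 3, 0), noise_stability(f, 3, 0.5), holdout_error(f, 3, 0))
```

In this toy setting, freezing a coordinate with high influence (for example, any bit of `parity`) yields a large error on the held-out half, while freezing a coordinate the target does not depend on yields zero error.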

Code Implementations: 1 repo
