LGDSNEOCMLFeb 4, 2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?

arXiv:1902.01028v259 citations
AI Analysis

This addresses a foundational gap in PAC learning for RNNs, offering theoretical guarantees that could impact sequential data analysis, though it is incremental in providing specific bounds.

The paper tackles the problem of whether stochastic gradient descent (SGD) can learn recurrent neural networks (RNNs) with provable generalization, showing that vanilla SGD can efficiently learn a notable concept class, including functions where each output token is generated from earlier inputs using a smooth two-layer neural network, with time and sample complexity scaling polynomially in input length.

Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. Yet, in the foundational PAC learning language, what concept class can it learn? Moreover, how can the same recurrent unit simultaneously learn functions from different input tokens to different output tokens, without affecting each other? Existing generalization bounds for RNN scale exponentially with the input length, significantly limiting their practical implications. In this paper, we show using the vanilla stochastic gradient descent (SGD), RNN can actually learn some notable concept class efficiently, meaning that both time and sample complexity scale polynomially in the input length (or almost polynomially, depending on the concept). This concept class at least includes functions where each output token is generated from inputs of earlier tokens using a smooth two-layer neural network.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes