LG · ML · Jun 2, 2020

Surprisal-Triggered Conditional Computation with Neural Networks

arXiv:2006.01659v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency for speech recognition systems, though it is incremental as it builds on existing autoregressive models.

The paper tackles the problem of inefficient computation in neural networks by using surprisal from an autoregressive model to allocate more computation to difficult inputs, achieving performance matching a baseline with 15% fewer FLOPs on speech recognition tasks.

Autoregressive neural network models have been used successfully for sequence generation, feature extraction, and hypothesis scoring. This paper presents yet another use for these models: allocating more computation to more difficult inputs. In our model, an autoregressive model is used both to extract features and to predict observations in a stream of input observations. The surprisal of the input, measured as the negative log-likelihood of the current observation according to the autoregressive model, is used as a measure of input difficulty. This in turn determines whether a small, fast network, or a big, slow network, is used. Experiments on two speech recognition tasks show that our model can match the performance of a baseline in which the big network is always used with 15% fewer FLOPs.
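The routing rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the autoregressive predictor here is a hypothetical stand-in that predicts each observation as a unit-variance Gaussian centred on the previous one, and the function names, the `threshold` value, and the relative costs `small_cost` and `big_cost` are assumptions chosen only for illustration.

```python
import math


def gaussian_surprisal(x, mean, std=1.0):
    """Negative log-likelihood (in nats) of x under N(mean, std^2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)


def route_by_surprisal(observations, threshold, small_cost=1.0, big_cost=10.0):
    """Choose the small or big network per step from the surprisal.

    Returns the per-step choices and the total cost relative to a
    baseline that always uses the big network.
    """
    choices = []
    prev = observations[0]
    for x in observations[1:]:
        # Toy autoregressive predictor (assumption): the next value is
        # predicted to equal the previous one.
        surprisal = gaussian_surprisal(x, mean=prev)
        # High surprisal marks a difficult input, so use the big network.
        choices.append("big" if surprisal > threshold else "small")
        prev = x
    total = sum(big_cost if c == "big" else small_cost for c in choices)
    return choices, total / (big_cost * len(choices))


# Hypothetical input stream with one abrupt jump (0.2 -> 3.0).
stream = [0.0, 0.1, 0.2, 3.0, 3.1, 3.0]
choices, relative_cost = route_by_surprisal(stream, threshold=2.0)
print(choices, relative_cost)
```

In this toy stream only the abrupt jump exceeds the threshold, so the big network runs on one step out of five. The costs are illustrative ratios, not the FLOP counts reported in the paper.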

Code Implementations: 1 repo
