LG, AI, CL, ML · Sep 16, 2019

Global Autoregressive Models for Data-Efficient Sequence Learning

arXiv:1909.07063v21010 citations
Originality: Incremental advance
AI Analysis

This paper addresses data efficiency in sequence learning, particularly for language modeling, but appears incremental because it builds on existing autoregressive methods.

The paper tackles the problem of poor performance of standard autoregressive seq2seq models under small-data conditions by introducing Global Autoregressive Models (GAMs) that combine autoregressive and log-linear components, resulting in a strong perplexity reduction in language modeling experiments.

Standard autoregressive seq2seq models are easily trained by max-likelihood, but tend to show poor results under small-data conditions. We introduce a class of seq2seq models, GAMs (Global Autoregressive Models), which combine an autoregressive component with a log-linear component, allowing the use of global a priori features to compensate for lack of data. We train these models in two steps. In the first step, we obtain an unnormalized GAM that maximizes the likelihood of the data, but is improper for fast inference or evaluation. In the second step, we use this GAM to train (by distillation) a second autoregressive model that approximates the normalized distribution associated with the GAM, and can be used for fast inference and evaluation. Our experiments focus on language modelling under synthetic conditions and show a strong perplexity reduction of using the second autoregressive model over the standard one.
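To make the first step of the abstract concrete, here is a minimal Python sketch of an unnormalized score that multiplies an autoregressive probability by a log-linear factor over a global feature. Everything in it is an illustrative assumption rather than the authors' setup: the binary vocabulary, the uniform autoregressive component, the parity feature, and the names `ar_prob`, `features`, `unnormalized_score`, and `normalized_distribution` are all hypothetical. The multiplicative form itself is one plausible reading of "combine an autoregressive component with a log-linear component".

```python
import itertools
import math

# Toy setup (assumption): binary sequences of a fixed, small length.
VOCAB = ("0", "1")
LENGTH = 4


def ar_prob(seq):
    """Toy autoregressive component: every symbol is uniform given its prefix."""
    p = 1.0
    for _ in seq:
        p *= 1.0 / len(VOCAB)
    return p


def features(seq):
    """Hypothetical global feature: 1.0 if the sequence has an even number of '1's."""
    return [1.0 if seq.count("1") % 2 == 0 else 0.0]


def unnormalized_score(seq, lam):
    """Autoregressive probability times exp of the linear feature score."""
    log_linear = sum(l * f for l, f in zip(lam, features(seq)))
    return ar_prob(seq) * math.exp(log_linear)


def normalized_distribution(lam):
    """Normalize by brute-force enumeration, which is only feasible in a toy space."""
    seqs = ["".join(s) for s in itertools.product(VOCAB, repeat=LENGTH)]
    scores = {s: unnormalized_score(s, lam) for s in seqs}
    z = sum(scores.values())  # partition function
    return {s: v / z for s, v in scores.items()}


dist = normalized_distribution([2.0])
```

The partition function is computed here by enumerating all 16 sequences. In a realistic sequence space that sum is intractable, which is consistent with the abstract's remark that the unnormalized model is unsuited to fast inference and with the second, distillation step that this sketch does not cover.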

Code Implementations: 1 repo