LG · CL · ML · Oct 22, 2020

Limitations of Autoregressive Models and Their Alternatives

arXiv:2010.11939v3 · 748 citations
Originality: Incremental advance
AI Analysis

This result exposes a core theoretical limitation for NLP practitioners who rely on autoregressive models: scaling up model size, data, and training compute cannot remove it on its own.

The paper shows that standard autoregressive language models are fundamentally limited in modeling distributions whose next-symbol probabilities are hard to compute, no matter how extensively they are trained. It points to alternatives, such as energy-based and latent-variable models, that can escape these limitations.

Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
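To make the scoring-versus-conditioning gap concrete, here is a minimal brute-force sketch, assuming a toy setup that is not the paper's construction: strings are 4-bit assignments to a small hypothetical CNF formula (CLAUSES), and an energy-based-style weight gives 1 to satisfying assignments and 0 otherwise. Scoring one string only requires checking the clauses, which is cheap. The next-symbol probability that an autoregressive model must output, however, sums weights over every completion of the prefix, which takes exponential time in this brute-force form. Exact sampling from the normalized distribution runs into the same sums. All names here (CLAUSES, energy_weight, next_symbol_prob, and so on) are hypothetical and chosen for illustration.

```python
import itertools
import random

# Hypothetical toy formula in CNF over N = 4 variables (illustrative only).
# Literal +i means "variable i is 1", literal -i means "variable i is 0".
CLAUSES = [(1, -2, 3), (-1, 2, 4), (2, -3, -4), (-1, -2, -4)]
N = 4


def satisfies(bits, clauses=CLAUSES):
    """Polynomial-time check that the assignment `bits` satisfies every clause."""
    return all(
        any((bits[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def energy_weight(bits):
    """Unnormalized EBM-style weight: 1 for satisfying strings, 0 otherwise.
    Scoring a single string is cheap: one pass over the clauses."""
    return 1.0 if satisfies(bits) else 0.0


def partition_function(n=N):
    """Normalizer Z: sums the weight over all 2**n strings (exponential in n)."""
    return sum(energy_weight(list(x)) for x in itertools.product((0, 1), repeat=n))


def next_symbol_prob(prefix, n=N):
    """p(next bit = 1 | prefix) under w(x) / Z, computed by brute force.
    Enumerates every completion of the prefix, so the cost grows as 2**n."""
    remaining = n - len(prefix) - 1
    mass = {0: 0.0, 1: 0.0}
    for b in (0, 1):
        for tail in itertools.product((0, 1), repeat=remaining):
            mass[b] += energy_weight(list(prefix) + [b] + list(tail))
    total = mass[0] + mass[1]
    if total == 0:
        return None  # the prefix has no satisfying completion
    return mass[1] / total


def autoregressive_prob(bits):
    """Chain rule p(x) = prod_t p(x_t | x_<t), using the brute-force conditionals."""
    p = 1.0
    for t, b in enumerate(bits):
        q = next_symbol_prob(bits[:t])
        if q is None:
            return 0.0
        p *= q if b == 1 else 1.0 - q
    return p


def ancestral_sample(rng, n=N):
    """Exact left-to-right sampling; every step needs a brute-force conditional."""
    bits = []
    for _ in range(n):
        q = next_symbol_prob(bits, n)
        bits.append(1 if rng.random() < q else 0)
    return bits


if __name__ == "__main__":
    rng = random.Random(0)
    print("Z =", partition_function())
    print("p(x1 = 1) =", next_symbol_prob([]))
    print("sample:", ancestral_sample(rng))
```

For this toy formula, 8 of the 16 assignments satisfy it, so the chain-rule product for any satisfying string equals 1/8. The design choice is deliberate: with only 4 variables, brute force is instant, but the same loops would be infeasible for long strings. This mirrors, informally, why a model restricted to polynomial-time next-symbol computation cannot in general match such distributions, while an energy-based model can still score strings cheaply.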

