CL · AI · LG · May 20, 2023

Autoregressive Modeling with Lookahead Attention

arXiv:2305.12272v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work proposes a potential enhancement to autoregressive modeling for NLP and symbolic tasks. The advance is incremental, and the evidence that the model actually uses the lookahead information is mixed.

The authors aim to improve autoregressive models by incorporating hypothetical future continuations into next-token prediction. On tasks such as morphological inflection and Boolean satisfiability, the resulting model outperforms standard Transformers of comparable size.

To predict the next token, autoregressive models ordinarily examine the past. Could they also benefit from examining hypothetical futures? We consider a novel Transformer-based autoregressive architecture that estimates the next-token distribution by extrapolating multiple continuations of the past, according to some proposal distribution, and attending to these extended strings. This architecture draws insights from classical AI systems such as board game players: when making a local decision, a policy may benefit from exploring possible future trajectories and analyzing them. On multiple tasks including morphological inflection and Boolean satisfiability, our lookahead model is able to outperform the ordinary Transformer model of comparable size. However, on some tasks, it appears to be benefiting from the extra computation without actually using the lookahead information. We discuss possible variant architectures as well as future speedups.
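
The general idea in the abstract (sample several continuations from a proposal distribution, then combine them into a next-token estimate) can be illustrated with a toy sketch. This is not the paper's architecture: the uniform proposal, the `score_fn` argument, and the softmax-weighted vote that stands in for attention are all assumptions made for illustration, and every name below is hypothetical.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # toy vocabulary (assumption)


def proposal_sample(prefix, rng):
    """Hypothetical proposal q(next | prefix): uniform over VOCAB."""
    return rng.choice(VOCAB)


def rollout(prefix, horizon, rng):
    """Extend the prefix by `horizon` tokens sampled from the proposal."""
    extended = list(prefix)
    for _ in range(horizon):
        extended.append(proposal_sample(extended, rng))
    return extended


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lookahead_next_token_dist(prefix, score_fn, num_rollouts=8, horizon=3, seed=0):
    """Estimate a next-token distribution from sampled continuations.

    `score_fn(extended)` is a stand-in for a learned relevance score of an
    extended string; a real model would attend over the rollouts instead.
    """
    rng = random.Random(seed)
    rollouts = [rollout(prefix, horizon, rng) for _ in range(num_rollouts)]
    weights = softmax([score_fn(r) for r in rollouts])
    dist = {tok: 0.0 for tok in VOCAB}
    for extended, weight in zip(rollouts, weights):
        # Each rollout votes for its own first lookahead token.
        dist[extended[len(prefix)]] += weight
    return dist


if __name__ == "__main__":
    # Toy score: prefer continuations that contain many "c" tokens.
    dist = lookahead_next_token_dist(["a", "b"], lambda r: float(r.count("c")))
    print(dist)
```

In this sketch the number of rollouts and the horizon control how much extra computation is spent per prediction, which is the trade-off the abstract alludes to when it mentions benefiting from extra computation.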

