CL
Apr 15, 2022

LaMemo: Language Modeling with Look-Ahead Memory

Tsinghua
arXiv:2204.07341v2629 citations
h-index: 74
AI Analysis

This paper addresses inefficient long-context modeling in language models. It is an incremental improvement over prior recurrence-memory methods.

The paper tackles the challenge of scaling Transformers to long texts in language modeling. It proposes LaMemo, a look-ahead memory that adds bi-directional attention to recurrence memory, and reports better results on language modeling benchmarks than baselines equipped with other types of memory.

Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode context in a uni-directional way. This prevents the memory from dynamically interacting with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information in the history. LaMemo combines bi-directional attention and segment recurrence with an additional computation overhead that is only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over baselines equipped with different types of memory.
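The look-ahead idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's exact formulation: the function name `look_ahead_update`, the single-head dot-product attention without learned projections, and the fixed interpolation weight `alpha` are all assumptions made for illustration. It only shows the general shape of the update, in which each memory state attends to the tokens of the current segment (its right-side context) and the result is blended with the old memory state.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def look_ahead_update(memory, segment, alpha=0.5):
    """Hypothetical look-ahead memory update (illustrative only).

    memory:  (M, d) hidden states cached from earlier segments.
    segment: (N, d) hidden states of the current segment, which lie
             to the right of every memory position.
    alpha:   assumed fixed interpolation weight on the old memory;
             the source does not specify how the weight is chosen.
    """
    d = memory.shape[-1]
    # Each memory state attends to the right-side (current) tokens.
    scores = memory @ segment.T / np.sqrt(d)   # (M, N)
    weights = softmax(scores, axis=-1)         # rows sum to 1
    look_ahead = weights @ segment             # (M, d)
    # Interpolate with the old memory to retain long-term history.
    return alpha * memory + (1.0 - alpha) * look_ahead

rng = np.random.default_rng(0)
memory = rng.normal(size=(4, 8))    # M = 4 memory slots
segment = rng.normal(size=(3, 8))   # N = 3 current tokens
updated = look_ahead_update(memory, segment)
print(updated.shape)
```

In this sketch the extra work is one (M, N) attention matrix, so for a fixed segment length the added cost grows linearly with the memory length M, which is consistent with the overhead claim in the abstract.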

Code Implementations: 1 repo
