LG ML | Nov 13, 2019

Compressive Transformers for Long-Range Sequence Modelling

arXiv:1911.05507v1 | 865 citations
Originality: Incremental advance
AI Analysis

The paper addresses efficient long-range sequence learning for applications such as language modelling and speech processing. The advance is incremental in that it builds on existing attentive models.

The paper tackles long-range sequence modeling by introducing the Compressive Transformer, which compresses past memories, achieving state-of-the-art results with 17.1 ppl on WikiText-103 and 0.97 bpc on Enwik8.
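The two headline numbers use different units, so a short sketch of how they relate to average cross-entropy may help. It uses the standard definitions: perplexity (ppl) is the exponential of the mean negative log-likelihood per token in nats, and bits per character (bpc) is the mean negative log-likelihood per character in bits. The function names are hypothetical and are not taken from the paper's code.

```python
import math

def perplexity(mean_nll_nats):
    """Perplexity from mean negative log-likelihood per token (in nats)."""
    return math.exp(mean_nll_nats)

def bits_per_char(mean_nll_nats):
    """Bits per character from mean negative log-likelihood per character (in nats)."""
    return mean_nll_nats / math.log(2)

# Reading the reported figures back as average losses:
wikitext_loss_nats = math.log(17.1)      # per-token loss implied by 17.1 ppl
enwik8_loss_nats = 0.97 * math.log(2)    # per-character loss implied by 0.97 bpc

print(perplexity(wikitext_loss_nats), bits_per_char(enwik8_loss_nats))
```

The two figures are not directly comparable: WikiText-103 is scored per word-level token, whereas Enwik8 is scored per character.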

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
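The idea of compressing past memories can be illustrated with a minimal sketch. The abstract does not state the compression function or the memory sizes, so the code below assumes mean pooling at a fixed rate, a first-in-first-out memory, and a first-in-first-out compressed memory. All names (`mean_pool`, `update_memories`, `mem_size`, `cmem_size`, `rate`) are hypothetical, and this is an illustration of the general mechanism rather than the authors' implementation.

```python
def mean_pool(states, rate):
    """Average consecutive groups of `rate` vectors into one vector each.

    A ragged tail shorter than `rate` is dropped (a simplifying assumption).
    """
    pooled = []
    for start in range(0, len(states) - rate + 1, rate):
        group = states[start:start + rate]
        pooled.append([sum(column) / rate for column in zip(*group)])
    return pooled

def update_memories(memory, compressed, new_states, mem_size, cmem_size, rate):
    """Append new states to memory and compress whatever overflows.

    Each argument holding states is a list of equal-length float vectors,
    ordered oldest first. `cmem_size` is assumed to be positive.
    """
    memory = memory + new_states
    overflow = max(len(memory) - mem_size, 0)
    evicted, memory = memory[:overflow], memory[overflow:]
    # Instead of discarding evicted states, keep a coarser summary of them.
    compressed = (compressed + mean_pool(evicted, rate))[-cmem_size:]
    return memory, compressed

# Twelve time steps of 2-dimensional states, fed in at once.
states = [[float(t), float(t)] for t in range(12)]
memory, compressed = update_memories([], [], states, mem_size=6, cmem_size=4, rate=3)
print(len(memory), compressed)
```

In this sketch the six oldest states are summarised as two vectors, so attention over `compressed + memory` would cover twelve time steps using eight slots.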

Code Implementations: 6 repos