CL | AI | LG | Aug 1, 2022

Efficient Long-Text Understanding with Short-Text Models

DeepMind
arXiv:2208.00748v3 | 262 citations | h-index: 59
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck of long-text understanding for NLP practitioners, offering a practical alternative that avoids expensive pretraining.

The paper tackles the problem of applying transformer-based language models to long sequences by proposing SLED, a method that reuses existing short-text models through chunking and fusion-in-decoder. The results show that SLED is competitive with specialized models up to 50x larger on the SCROLLS benchmark.

Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles and long documents, due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
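
The chunk-then-fuse idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `chunk_spans` and `encode_long_input`, the `encode_chunk` callable, and the rule of dropping the overlapping prefix of every chunk after the first are all assumptions made for illustration. The sketch only shows how overlapping windows can be encoded independently and concatenated into one sequence of per-token states that a decoder could then attend over.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_spans(n_tokens: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return [start, end) spans; consecutive spans share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans


def encode_long_input(
    tokens: Sequence[int],
    encode_chunk: Callable[[Sequence[int]], list],
    chunk_size: int,
    overlap: int,
) -> list:
    """Encode each chunk independently and concatenate the per-token states.

    `encode_chunk` is a hypothetical stand-in for a short-text encoder that
    returns one state per input token. To avoid duplicates, the states of the
    overlapping prefix are dropped for every chunk except the first (a
    simplification chosen for this sketch).
    """
    fused = []
    for i, (start, end) in enumerate(chunk_spans(len(tokens), chunk_size, overlap)):
        states = encode_chunk(tokens[start:end])
        skip = 0 if i == 0 else overlap
        fused.extend(states[skip:])
    return fused


if __name__ == "__main__":
    toy_tokens = list(range(9))
    # Toy "encoder": maps each token id to a tuple (id, chunk length).
    toy_encoder = lambda chunk: [(tok, len(chunk)) for tok in chunk]
    print(chunk_spans(len(toy_tokens), chunk_size=4, overlap=2))
    print(encode_long_input(toy_tokens, toy_encoder, chunk_size=4, overlap=2))
```

In this sketch each encoder call sees at most `chunk_size` tokens, so the cost of a single call does not grow with the document length, while the concatenated output still holds exactly one state per input token.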

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
