CL · LG · Jan 31, 2024

LOCOST: State-Space Models for Long Document Abstractive Summarization

arXiv:2401.17919v3 · 106 citations · h-index: 11 · EACL
Originality: Highly original
AI Analysis

This paper addresses the challenge of processing long documents efficiently for summarization, offering a low-complexity alternative to transformers with significant memory savings.

The paper tackles long document abstractive summarization by proposing LOCOST, a state-space model architecture that reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% of memory during training and up to 87% during inference, and that handles inputs exceeding 600K tokens at inference time.

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
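The $O(L \log L)$ complexity quoted above is typical of state-space layers that are applied as a long convolution computed with the FFT. The following is a minimal sketch of that general idea for a scalar, single-channel state-space model, not LOCOST's actual implementation: the function names (`ssm_kernel`, `fft_causal_conv`, `recurrent_scan`) and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ssm_kernel(a: float, b: float, c: float, length: int) -> np.ndarray:
    """Convolution kernel k[t] = c * a**t * b of a scalar linear state-space model."""
    return c * (a ** np.arange(length)) * b


def fft_causal_conv(u: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Causal convolution y[t] = sum_{s<=t} k[s] * u[t-s], computed in O(L log L)."""
    length = u.shape[0]
    n = 2 * length  # zero-pad so the circular FFT convolution does not wrap around
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[:length]


def recurrent_scan(u: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    """Reference: the same model unrolled step by step, x[t] = a*x[t-1] + b*u[t], y[t] = c*x[t]."""
    x = 0.0
    y = np.empty(u.shape[0], dtype=np.float64)
    for t, u_t in enumerate(u):
        x = a * x + b * u_t
        y[t] = c * x
    return y


# Illustrative values only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
u = rng.standard_normal(1024)
a, b, c = 0.9, 0.5, 2.0

y_fft = fft_causal_conv(u, ssm_kernel(a, b, c, u.shape[0]))
y_ref = recurrent_scan(u, a, b, c)
```

Both routes produce the same output sequence; the FFT route processes the whole sequence at once, which is where the $O(L \log L)$ cost in sequence length $L$ comes from, in contrast to the quadratic cost of full self-attention.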

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.