CL · May 2, 2023

Unlimiformer: Long-Range Transformers with Unlimited Length Input

arXiv:2305.01625v3 · 169 citations · Has Code
Originality: Highly original
AI Analysis

This paper addresses the challenge of handling unlimited-length inputs in long-document and book-summarization tasks, offering a practical approach for researchers and practitioners who work with very long texts.

The authors address the bounded input length of transformers by proposing Unlimiformer, a method that offloads cross-attention computation to a k-nearest-neighbor (kNN) index. This lets the model process inputs as long as 500k tokens without truncation, and it extends pretrained models such as BART and Longformer without additional learned weights.

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index, while the returned kNN distances are the attention dot-product scores. This kNN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code. We make our code and models publicly available at https://github.com/abertsch72/unlimiformer .
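
The retrieval idea in the abstract, where each query attends only to its top-k keys and the retrieved scores are the attention dot products, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it handles a single head and a single query, uses exact brute-force search in place of a sub-linear kNN index, and omits the usual attention scaling factor. The names `topk_cross_attention`, `full_cross_attention`, `keys`, `values`, `query`, and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def full_cross_attention(query, keys, values):
    """Standard attention for one query: attends to every key."""
    scores = keys @ query              # shape (n,): dot product with each key
    return softmax(scores) @ values    # shape (d,)


def topk_cross_attention(query, keys, values, k):
    """Attention restricted to the k keys with the largest dot products.

    Assumption: exact brute-force search stands in for the kNN index.
    A real index would return these scores without scanning every key.
    """
    scores = keys @ query
    k = min(k, scores.shape[0])
    top = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    weights = softmax(scores[top])           # softmax over retrieved keys only
    return weights @ values[top], top


# Toy data: 10,000 encoded input tokens with hidden size 16.
rng = np.random.default_rng(0)
n, d = 10_000, 16
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d))
query = rng.normal(size=d)

approx, retrieved = topk_cross_attention(query, keys, values, k=32)
exact = full_cross_attention(query, keys, values)
```

With `k` equal to the number of keys, the restricted version reduces to full attention. With a small `k`, the output is an approximation whose quality depends on how much softmax mass falls on the retrieved keys.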

Code Implementations: 2 repos