ReadOnce Transformers: Reusable Representations of Text for Transformers
This work addresses efficiency issues for NLP practitioners who run multiple tasks over long documents, though the contribution is incremental because it builds on existing transformer models.
The paper addresses the inefficiency of re-processing the same document for every example or task that shares it. It introduces ReadOnce Transformers, which build reusable, compressed representations of text and yield a 2x-5x speedup in training and evaluation compared to standard text-to-text models.
We present ReadOnce Transformers, an approach to convert a transformer-based model into one that can build an information-capturing, task-independent, and compressed representation of text. The resulting representation is reusable across different examples and tasks, so a document shared across many examples or tasks needs to be "read once" only. This leads to faster training and evaluation of models. Additionally, we extend standard text-to-text transformer models to Representation+Text-to-text models, and evaluate on multiple downstream tasks: multi-hop QA, abstractive QA, and long-document summarization. Our one-time computed representation results in a 2x-5x speedup compared to standard text-to-text models, while the compression also allows existing language models to handle longer documents without the need for designing new pre-trained models.
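The read-once idea can be illustrated with a small caching sketch. This is not the paper's implementation: `toy_encode`, `compress`, and `ReadOnceCache` are hypothetical names, the encoder is a hash-based stand-in for a transformer, and mean pooling over fixed windows is an assumed compression scheme chosen only to keep the example self-contained. The point it shows is that a document shared by several examples is encoded a single time and its compressed representation is reused.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def toy_encode(text: str, dim: int = 4) -> List[Vector]:
    """Hypothetical stand-in for a transformer encoder: one vector per token."""
    vectors = []
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vectors.append([b / 255.0 for b in digest[:dim]])
    return vectors


def compress(vectors: List[Vector], window: int) -> List[Vector]:
    """Assumed compression: mean-pool each window of consecutive token vectors."""
    pooled = []
    for start in range(0, len(vectors), window):
        chunk = vectors[start:start + window]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


class ReadOnceCache:
    """Encodes each distinct document once and reuses the compressed result."""

    def __init__(self, encoder: Callable[[str], List[Vector]], window: int = 4):
        self.encoder = encoder
        self.window = window
        self.encoder_calls = 0  # counts how often a document is actually read
        self._store: Dict[str, List[Vector]] = {}

    def represent(self, document: str) -> List[Vector]:
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.encoder_calls += 1
            self._store[key] = compress(self.encoder(document), self.window)
        return self._store[key]


# One document shared by three question-answering examples.
document = "the quick brown fox jumps over the lazy dog again and again"
questions = ["What jumps?", "What is lazy?", "How often?"]

cache = ReadOnceCache(toy_encode, window=4)
# Each example pairs the reusable representation with its own text input,
# loosely mirroring the Representation+Text-to-text setup described above.
examples = [(cache.represent(document), question) for question in questions]

print(cache.encoder_calls)   # number of times the document was encoded
print(len(examples[0][0]))   # number of vectors after compression
```

In this sketch the 12-token document is pooled with a window of 4, so the downstream model would see 3 vectors instead of 12. A shorter representation of this kind is what would let a fixed-length model accept a longer document, under the assumptions stated above.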