CL · Oct 11, 2024

Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures

arXiv:2410.08971v1 · 1 citation · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in sparse transformer architectures for NLP tasks like summarization, offering an incremental improvement.

The paper tackles the challenge of encoding long-range context in sparse transformers. It proposes selectively increasing global attention on keywords found by keyword detection, and reports improved abstractive summarization performance in zero-shot, few-shot, and fine-tuned settings on some of the benchmark datasets tested.
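To make the long-range bottleneck concrete, here is a minimal sketch of a Longformer-style attention pattern. It assumes a symmetric sliding window of `window` tokens on each side plus a set of global positions; the function names are hypothetical, and treating attention layers as a reachability graph is a simplification used only for illustration. Without global tokens, information from the end of a sequence needs many layers to reach the beginning; a single global token shortens that path to two layers.

```python
def longformer_style_mask(seq_len, window, global_idx):
    """Return mask[i][j] == True if query position i may attend to key position j.

    Sketch of a Longformer-style pattern (assumed parametrization): each token
    sees `window` neighbours on each side, and every position in `global_idx`
    attends to, and is attended by, the whole sequence.
    """
    mask = [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]
    for g in global_idx:
        for k in range(seq_len):
            mask[g][k] = True  # the global token attends everywhere
            mask[k][g] = True  # every token attends to the global token
    return mask


def reachable_in_two_layers(mask, i, j):
    """True if information at position j can reach position i within two attention layers."""
    return any(mask[i][k] and mask[k][j] for k in range(len(mask)))


local_only = longformer_style_mask(16, window=2, global_idx=[])
with_global = longformer_style_mask(16, window=2, global_idx=[0])

print(local_only[1][15], with_global[1][15])        # False False
print(reachable_in_two_layers(local_only, 1, 15))   # False
print(reachable_in_two_layers(with_global, 1, 15))  # True
```

Neither mask lets token 1 attend directly to token 15, but with one global token the two are connected through it after two layers, whereas the purely local pattern would need several more layers.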

In this paper, we propose an extension to Longformer Encoder-Decoder (LED), a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle to encode long-range context, such as connections between topics discussed at the beginning and the end of a document. We propose a method to selectively increase global attention and demonstrate it on abstractive summarization tasks using several benchmark datasets. By prefixing the transcript with additional keywords and applying global attention to these keywords, we demonstrate improvement in zero-shot, few-shot, and fine-tuned cases on some of these benchmark datasets.
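The prefixing step can be sketched as follows. This is not the authors' implementation: the paper's keyword detector is not described here, so a plain frequency count over a hypothetical stop-word list stands in for it, whitespace words stand in for the model's subword tokens, and the `<sep>` separator token is an assumption. The output is the prefixed token sequence plus a 0/1 mask marking which positions receive global attention (the start token and each keyword).

```python
import re
from collections import Counter

# Hypothetical stop-word list for this sketch only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "that", "on", "for", "it", "at", "with"}


def detect_keywords(text, k):
    """Stand-in keyword detector: the k most frequent non-stop-words.

    Counter.most_common keeps first-encountered order for tied counts.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def build_prefixed_input(text, k, bos="<s>", sep="<sep>"):
    """Prefix the transcript with keywords and mark global-attention positions.

    Returns (tokens, global_mask) where global_mask[i] == 1 means token i gets
    global attention and 0 means local (sliding-window) attention only.
    Whitespace tokens stand in for the model's real subword tokens.
    """
    keywords = detect_keywords(text, k)
    body = text.split()
    tokens = [bos] + keywords + [sep] + body
    # Global attention on the start token (a common LED convention) and on
    # every prefixed keyword; the separator and transcript stay local.
    global_mask = [1] * (1 + len(keywords)) + [0] * (1 + len(body))
    return tokens, global_mask


transcript = ("The budget was discussed early. Later the team returned to "
              "the budget and the deadline. The deadline moved.")
tokens, global_mask = build_prefixed_input(transcript, k=2)
print(tokens[:4])        # ['<s>', 'budget', 'deadline', '<sep>']
print(sum(global_mask))  # 3
```

In a real pipeline the mask would have to be aligned with the subword `input_ids` before being passed to a model; Hugging Face `transformers` LED models accept such a tensor through their `global_attention_mask` argument. Because every transcript token can attend to the keyword prefix and vice versa, topics that recur at the start and end of a document share a short attention path through the keywords.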
