CL · Oct 24, 2023

TRAMS: Training-free Memory Selection for Long-range Language Modeling

arXiv:2310.15494v3 · 133 citations · h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses inefficiencies in how language models handle long-range dependencies. It appears to be an incremental advance, since it builds on existing architectures such as Transformer-XL.

The paper tackles the problem of ineffective memories in long-range language modeling. It proposes TRAMS, a plug-and-play strategy that uses a simple metric to select which memory tokens take part in attention, keeping those likely to receive high attention scores from the current queries. This improves results on WikiText-103 and enwik8 without extra training or parameters.

The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Although several transformer architectures have been designed specifically to handle long-range dependencies, existing methods such as Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, TRAining-free Memory Selection (TRAMS), that selects the tokens participating in the attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and to ignore the others. We tested our approach on a word-level benchmark (WikiText-103) and a character-level benchmark (enwik8), and the results show an improvement without additional training or additional parameters.
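The core idea of keeping only the memory tokens likely to receive high attention from the current queries can be illustrated with a small sketch. Because the future queries are unknown when memory is selected, this sketch assumes a query-independent proxy: the L2 norm of each cached key. The motivation is that q·k = |q||k|cos θ, so a larger key norm permits a larger score. This metric and the names `select_memory` and `attend` are hypothetical illustrations, not the paper's confirmed method. Pure Python is used to keep the example dependency-free.

```python
import math
from typing import List, Sequence


def l2_norm(vec: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def select_memory(memory_keys: List[List[float]], k: int) -> List[int]:
    """Hypothetical selection step: keep the k memory tokens whose keys have the
    largest L2 norm (an ASSUMED proxy for 'likely high attention score').
    Returns the kept indices in their original (temporal) order."""
    if k >= len(memory_keys):
        return list(range(len(memory_keys)))
    ranked = sorted(range(len(memory_keys)),
                    key=lambda i: l2_norm(memory_keys[i]),
                    reverse=True)
    return sorted(ranked[:k])


def attend(query: Sequence[float],
           keys: List[List[float]],
           values: List[List[float]]) -> List[float]:
    """Standard scaled dot-product attention for a single query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * kk for q, kk in zip(query, key)) / scale for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


if __name__ == "__main__":
    memory_keys = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, -1.5]]
    memory_values = [[1.0], [2.0], [3.0], [4.0]]
    keep = select_memory(memory_keys, k=2)
    print(keep)  # [1, 3]
    query = [1.0, 0.5]
    # Attention is computed only over the selected memory tokens.
    out = attend(query,
                 [memory_keys[i] for i in keep],
                 [memory_values[i] for i in keep])
    print(out)
```

In a Transformer-XL-style model, a selection step like this would run over the cached memory before attention is computed, so each query attends to fewer tokens without any new parameters or training.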

Code Implementations: 1 repo
