CL · LG · May 9, 2024

HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing

arXiv:2405.06067v3 · 21 citations · Has Code · NAACL
Originality: Highly original
AI Analysis

This paper addresses the challenge of efficient long-context language processing for AI applications, offering a novel memory architecture that reduces parameter count and inference memory.

The paper tackles the problem of memory constraints in Transformer-based large language models by proposing the Hierarchical Memory Transformer (HMT), which imitates human memorization to improve long-context processing. Compared with long-context LLMs, it achieves comparable or superior generation quality with 2 to 57 times fewer parameters and 2.5 to 116 times less inference memory.

Transformer-based large language models (LLMs) have been widely used in language processing applications. However, due to the memory constraints of the devices, most of them restrict the context window. Even though recurrent models in previous works can memorize past tokens to enable unlimited context and maintain effectiveness, they have "flat" memory architectures. Such architectures have limitations in selecting and filtering information. Since humans are good at learning and self-adjustment, we believe that imitating brain memory hierarchy is beneficial for model memorization. Thus, we propose the Hierarchical Memory Transformer (HMT), a novel framework that facilitates a model's long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluating on general language modeling, question-answering, and summarization tasks, we show that HMT consistently improves the long-context processing ability of existing models. Furthermore, HMT achieves a comparable or superior generation quality to long-context LLMs with $2 \sim 57\times$ fewer parameters and $2.5 \sim 116\times$ less inference memory, significantly outperforming previous memory-augmented models. Code on GitHub: https://github.com/OswaldHe/HMT-pytorch.
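
The three mechanisms named in the abstract (preserving tokens from earlier segments, passing memory embeddings along the sequence, and recalling relevant history) can be illustrated with a toy sketch. This is not the authors' implementation; see the linked repository for that. Every name here (`process_long_input`, `recall`, `segment_len`, `carry_over`, `cache_size`) is hypothetical, the backbone model is replaced by a simple averaging stand-in, and the reading of "preserving tokens" as carrying over the last few token embeddings of the previous segment is an assumption.

```python
import math
from collections import deque


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def recall(query, memories):
    """Softmax-weighted average of cached memories (scaled dot-product scores).

    Returns None when there is nothing to recall yet.
    """
    if not memories:
        return None
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, mem)) / scale for mem in memories]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * mem[i] for w, mem in zip(weights, memories)) / total
        for i in range(len(query))
    ]


def process_long_input(embeddings, segment_len=4, carry_over=2, cache_size=8):
    """Toy segment-level recurrence with a bounded memory cache.

    embeddings: list of token embeddings (lists of floats).
    Returns the memory embeddings still held in the cache at the end.
    """
    cache = deque(maxlen=cache_size)  # bounded history of memory embeddings
    prev_tail = []                    # tokens preserved from the previous segment
    for start in range(0, len(embeddings), segment_len):
        segment = embeddings[start:start + segment_len]
        # Stand-in for summarizing the segment into a query vector.
        query = mean(segment)
        recalled = recall(query, list(cache))
        # Stand-in for the backbone model: average everything it would "see".
        context = prev_tail + segment
        if recalled is not None:
            context = [recalled] + context
        cache.append(mean(context))   # pass a new memory embedding forward
        prev_tail = segment[-carry_over:]
    return list(cache)


if __name__ == "__main__":
    tokens = [[float(i), 1.0] for i in range(10)]
    for memory in process_long_input(tokens):
        print(memory)
```

Because the cache is bounded and each step only touches one segment plus a fixed number of memory vectors, the working set in this sketch stays constant regardless of input length, which is the general idea behind segment-level recurrence as opposed to attending over the full context.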

Code Implementations: 1 repo
