CL · AI · Aug 30, 2024

MemLong: Memory-Augmented Retrieval for Long Text Modeling

arXiv:2408.16967v1 · 15 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of handling long texts for LLM users. It offers a significant improvement in context length and performance, though it is an incremental advance that builds on existing retrieval-augmented approaches.

The paper tackles long-context modeling in LLMs by introducing MemLong, a memory-augmented retrieval method. According to the abstract, it extends context length from 4k to 80k on a single 3090 GPU and outperforms other state-of-the-art LLMs on multiple long-context language modeling benchmarks.

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention mechanisms and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context language modeling by utilizing an external retriever for historical information retrieval. MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model and introduces a fine-grained, controllable retrieval attention mechanism that leverages semantic-level relevant chunks. Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs. More importantly, MemLong can extend the context length on a single 3090 GPU from 4k up to 80k. Our code is available at https://github.com/Bui1dMySea/MemLong
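To make the idea of retrieving "semantic-level relevant chunks" from past context more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `ChunkMemory` class, the `embed` and `cosine` helpers, and the bag-of-words embedding are all hypothetical stand-ins chosen so the example runs with only the standard library. A real system would presumably use a dense retriever and store per-chunk key-value states, and this sketch does not model the retrieval attention mechanism at all.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a dense retriever)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ChunkMemory:
    """Hypothetical external memory: stores past text as fixed-size chunks."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size
        self.chunks = []  # list of (embedding, chunk_text) pairs

    def add(self, text):
        """Split text into chunks of `chunk_size` words and store each one."""
        words = text.split()
        for i in range(0, len(words), self.chunk_size):
            chunk = " ".join(words[i:i + self.chunk_size])
            self.chunks.append((embed(chunk), chunk))

    def retrieve(self, query, top_k=2):
        """Return the `top_k` stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self.chunks, key=lambda item: cosine(q, item[0]), reverse=True
        )
        return [text for _, text in ranked[:top_k]]


memory = ChunkMemory(chunk_size=4)
memory.add("the cat sat on the mat while the dog slept by the door")
print(memory.retrieve("where did the dog sleep", top_k=1))
```

The point of the sketch is the separation of concerns: history lives outside the model's context window, and only the few chunks most relevant to the current input are brought back, so the cost of looking up history does not depend on attending over the whole past sequence.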

Code Implementations: 1 repo