CL · AI · Feb 5, 2024

UniMem: Towards a Unified View of Long-Context Large Language Models

Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi

Tencent

arXiv:2402.03009v2 · 4.84 citations · h-index: 41 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the isolated development of long-context methods for LLMs and offers a unified analysis to guide researchers, although its contribution is incremental because it integrates existing techniques.

The paper tackles the lack of systematic integration among long-context processing methods for large language models. It introduces UniMem, a unified framework that reformulates existing approaches as memory augmentation, and proposes UniMix, which outperforms the baselines with significantly lower perplexity.
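Both summaries rest on perplexity. For reference, here is a minimal Python sketch of the standard definition: the exponential of the mean per-token negative log-likelihood, where lower is better. The function name `perplexity` and its input format are our own choices, and the exact evaluation setup used in the paper (datasets, tokenizer, context lengths) is not described on this page.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood).

    token_log_probs: natural-log probabilities the model assigned to each
    ground-truth token of the evaluated text (hypothetical input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every token probability 0.25 has perplexity of about 4.
print(perplexity([math.log(0.25)] * 8))
# Also about 4: perplexity is the inverse geometric mean of token probabilities.
print(perplexity([math.log(0.5), math.log(0.125)]))
```

Because the mean is taken in log space, a few very confident predictions cannot fully offset tokens the model found surprising.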

Long-context processing is a critical ability that constrains the applicability of large language models (LLMs). Although various methods aim to enhance the long-context processing ability of LLMs, they have been developed in isolation, without systematic analysis or integration of their strengths, which hinders further progress. In this paper, we introduce UniMem, a Unified framework that reformulates existing long-context methods from the view of Memory augmentation of LLMs. UniMem is characterized by four core dimensions (Memory Management, Memory Writing, Memory Reading, and Memory Injection), which enable researchers to explore long-context methods systematically. We reformulate 16 existing methods based on UniMem and recast four representative methods (Transformer-XL, Memorizing Transformer, RMT, and Longformer) into equivalent UniMem forms to reveal their design principles and strengths. Based on these analyses, we propose UniMix, an innovative approach that integrates the strengths of these algorithms. Experimental results show that UniMix achieves superior performance in handling long contexts, with significantly lower perplexity than the baselines.
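The four dimensions can be made concrete with a toy sketch. This is not the authors' code or API. `MemoryConfig`, `Memory`, `dot`, and every field and option name are hypothetical, vectors are plain tuples rather than real hidden states, and the mapping of dimensions to operations is our assumption: capacity and eviction for Management, what gets stored for Writing, how entries are selected for Reading, and which layers receive them for Injection. The two presets are loose approximations of well-known behavior, not the paper's exact reformulations. Transformer-XL attends to cached recent-segment states at every layer, while Memorizing Transformer retrieves stored entries by kNN at a single layer.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, ...]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass(frozen=True)
class MemoryConfig:
    # Hypothetical knobs grouped by UniMem dimension (names are illustrative).
    capacity: int                   # Management: max number of stored entries
    write: str                      # Writing: "raw" keeps every state, "mean" compresses a segment to one entry
    read: str                       # Reading: "recent" (position-based) or "knn" (similarity-based)
    top_k: int                      # Reading: number of entries returned per query
    inject_layers: Tuple[int, ...]  # Injection: layers that receive retrieved memory
    overflow: str = "fifo"          # Management: eviction policy when capacity is exceeded


class Memory:
    def __init__(self, cfg: MemoryConfig) -> None:
        if cfg.capacity < 1 or cfg.top_k < 1:
            raise ValueError("capacity and top_k must be positive")
        self.cfg = cfg
        self.entries: List[Vector] = []

    def write(self, segment: List[Vector]) -> None:
        """Memory Writing, followed by Memory Management (eviction)."""
        if self.cfg.write == "raw":
            new = list(segment)
        elif self.cfg.write == "mean":
            dim = len(segment[0])
            new = [tuple(sum(v[i] for v in segment) / len(segment) for i in range(dim))]
        else:
            raise ValueError(f"unknown write mode: {self.cfg.write}")
        self.entries.extend(new)
        if len(self.entries) > self.cfg.capacity:
            if self.cfg.overflow != "fifo":
                raise ValueError(f"unsupported overflow policy: {self.cfg.overflow}")
            # FIFO: drop the oldest entries and keep the newest `capacity` ones
            self.entries = self.entries[-self.cfg.capacity:]

    def read(self, query: Vector) -> List[Vector]:
        """Memory Reading: select which stored entries the model will see."""
        if self.cfg.read == "recent":
            return self.entries[-self.cfg.top_k:]
        if self.cfg.read == "knn":
            # sorted() is stable, so ties keep insertion order
            return sorted(self.entries, key=lambda m: -dot(m, query))[: self.cfg.top_k]
        raise ValueError(f"unknown read mode: {self.cfg.read}")

    def inject(self, layer: int, query: Vector) -> List[Vector]:
        """Memory Injection: only the configured layers receive memory."""
        return self.read(query) if layer in self.cfg.inject_layers else []


# Loose, hypothetical presets for a 4-layer model with 2-token segments.
XL_LIKE = MemoryConfig(capacity=2, write="raw", read="recent", top_k=2, inject_layers=(0, 1, 2, 3))
MEMORIZING_LIKE = MemoryConfig(capacity=100, write="raw", read="knn", top_k=1, inject_layers=(2,))

segments = [[(1.0, 0.0), (0.0, 1.0)], [(1.0, 1.0), (0.5, 0.5)], [(0.0, 2.0), (2.0, 0.0)]]
xl, knn = Memory(XL_LIKE), Memory(MEMORIZING_LIKE)
for seg in segments:
    xl.write(seg)
    knn.write(seg)

q = (1.0, 0.0)
print(xl.inject(0, q))   # [(0.0, 2.0), (2.0, 0.0)]: only the latest segment survives
print(knn.inject(0, q))  # []: layer 0 is not an injection layer
print(knn.inject(2, q))  # [(2.0, 0.0)]: best dot-product match over the whole history
```

Expressing both methods as settings of the same four knobs is the point of the exercise. Once they share one interface, mixing their choices (for example, kNN reading with a FIFO-managed cache) becomes a configuration change, which is the kind of combination a method like UniMix explores. The specific combination UniMix uses is not given on this page.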

