CL · Jul 1, 2024

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

arXiv:2407.01527v220.449 citations · h-index: 15 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for comprehensive benchmarking of efficiency-driven methods to enable long-context capabilities in LLMs, which is crucial for tasks like book summarization and code assistance, but it is incremental as it focuses on evaluation rather than introducing new methods.

The authors tackled the problem of evaluating long-context capable approaches for large language models by benchmarking over 10 state-of-the-art methods across seven task categories, revealing previously unknown phenomena and providing insights for future development.

Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs. Multiple schools of efficiency-driven approaches (such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures) have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights, as well as a friendly workbench, for the future development of long context-capable LLMs. The source code is available at https://github.com/henryzhongsc/longctx_bench.
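
To make the "growing size of the KV cache" concrete, the following is a minimal back-of-the-envelope sketch, not code from the paper or its repository. It uses the standard estimate that a decoder-only transformer stores one key and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the model shape in `HYPOTHETICAL_CONFIG` are assumptions chosen for illustration, and the 4-bit figure ignores quantization metadata such as scales and zero points.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2.0, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2.0 corresponds to fp16/bf16; 0.5 approximates 4-bit
    storage (metadata overhead is deliberately ignored here).
    """
    elems = 2 * batch_size * n_layers * n_kv_heads * head_dim * seq_len
    return int(elems * bytes_per_elem)


# Hypothetical model shape used only for illustration (an assumption,
# not a configuration taken from the benchmark).
HYPOTHETICAL_CONFIG = dict(n_layers=32, n_kv_heads=32, head_dim=128)

if __name__ == "__main__":
    gib = 1024 ** 3
    for seq_len in (4_096, 32_768, 131_072):
        fp16 = kv_cache_bytes(seq_len, **HYPOTHETICAL_CONFIG)
        int4 = kv_cache_bytes(seq_len, bytes_per_elem=0.5, **HYPOTHETICAL_CONFIG)
        print(f"{seq_len:>7} tokens: fp16 {fp16 / gib:6.2f} GiB, "
              f"4-bit {int4 / gib:6.2f} GiB")
```

Under these assumptions the cache grows linearly with sequence length, which is why approaches that shrink the per-element size (quantization) or the number of stored tokens (token dropping, prompt compression) target this term directly.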
