ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs
This work addresses the challenge of processing long contexts efficiently in LLMs and offers a scalable solution for tasks like QA, though it appears incremental because it builds on existing compression methods.
The paper tackles the 'lost in the middle' problem in LLMs by proposing ATACompressor, which adaptively compresses long contexts based on task relevance, outperforming existing methods on QA datasets like HotpotQA, MSMARCO, and SQUAD in compression efficiency and task performance.
Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by reducing input size, but existing approaches struggle with balancing information preservation and compression efficiency. We propose Adaptive Task-Aware Compressor (ATACompressor), which dynamically adjusts compression based on the specific requirements of the task. ATACompressor employs a selective encoder that compresses only the task-relevant portions of long contexts, ensuring that essential information is preserved while reducing unnecessary content. Its adaptive allocation controller perceives the length of relevant content and adjusts the compression rate accordingly, optimizing resource utilization. We evaluate ATACompressor on three QA datasets (HotpotQA, MSMARCO, and SQUAD), showing that it outperforms existing methods in terms of both compression efficiency and task performance. Our approach provides a scalable solution for long-context processing in LLMs. Furthermore, we perform a range of ablation studies and analysis experiments to gain deeper insights into the key components of ATACompressor.
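
The two ideas named in the abstract, selecting task-relevant content and sizing the compressed output to the amount of relevant content, can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify how relevance is judged or how the compression rate is computed, so the word-overlap test, the function names (`select_relevant`, `allocate_budget`), and the parameters (`min_overlap`, `ratio`, `min_tokens`, `max_tokens`) are all assumptions made for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokenizer (a stand-in for a real LLM tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_relevant(chunks, query, min_overlap=1):
    """Keep chunks sharing at least `min_overlap` distinct words with the query.

    Assumption: lexical overlap is only a crude proxy for the learned
    relevance judgment a selective encoder would make.
    """
    query_words = set(tokenize(query))
    return [c for c in chunks if len(query_words & set(tokenize(c))) >= min_overlap]


def allocate_budget(relevant_chunks, ratio=0.25, min_tokens=1, max_tokens=64):
    """Return a compressed-token budget that grows with relevant length.

    Assumption: a fixed ratio with clamping; the paper's controller may
    use a different rule.
    """
    relevant_len = sum(len(tokenize(c)) for c in relevant_chunks)
    if relevant_len == 0:
        return 0  # nothing relevant, so nothing to compress
    return max(min_tokens, min(max_tokens, math.ceil(relevant_len * ratio)))


chunks = [
    "Paris is the capital of France.",
    "The Nile flows through Egypt.",
    "France borders Spain and Italy.",
]
query = "What is the capital of France?"

# Require two shared words so that common words like "the" do not match alone.
relevant = select_relevant(chunks, query, min_overlap=2)
budget = allocate_budget(relevant)
print(relevant, budget)
```

In this sketch the budget depends only on the length of the selected chunks, so a query that matches more of the context receives a larger compressed representation, and a query that matches nothing receives none.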