CL | AI | Apr 24, 2025

An Empirical Study on Prompt Compression for Large Language Models

arXiv:2505.00019v1 | 10 citations | h-index: 6 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency issues for users of LLMs by providing an empirical analysis of compression methods, but it is incremental as it compares existing techniques rather than introducing a new paradigm.

The paper addresses the computational cost of lengthy prompts for Large Language Models by studying six prompt compression methods that reduce prompt length while maintaining response quality. Evaluated across 13 datasets, including LongBench, it finds that moderate compression can even improve performance in long contexts.

Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, we study six prompt compression methods for LLMs, aiming to reduce prompt length while maintaining LLM response quality. In this paper, we present a comprehensive analysis covering aspects such as generation performance, model hallucinations, efficacy in multimodal tasks, word omission analysis, and more. We evaluate these methods across 13 datasets, including news, scientific articles, commonsense QA, math QA, long-context QA, and VQA datasets. Our experiments reveal that prompt compression has a greater impact on LLM performance in long contexts than in short ones. In the LongBench evaluation, moderate compression even enhances LLM performance. Our code and data are available at https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression.

Code Implementations: 1 repo
