CL AI Aug 16, 2025

SCOPE: A Generative Approach for LLM Prompt Compression

Tinghui Zhang, Yifan Wang, Daisy Zhe Wang

arXiv:2508.15813v1 4 citations h-index: 3

Originality: Incremental advance

AI Analysis

This addresses efficiency and cost issues for LLM users by improving prompt compression, though it is incremental as it builds on existing methods.

The paper tackles the problem of prompt compression for LLMs by introducing a generative approach that splits prompts into chunks and rewrites them concisely, achieving significantly better compression quality and higher stability than state-of-the-art methods, especially at high compression ratios.

Prompt compression methods improve the efficiency of Large Language Models (LLMs) and reduce cost by shortening the input context. The goal of prompt compression is to shorten the LLM prompt while maintaining high generation quality. However, existing solutions, mainly based on token removal, suffer from information loss and structural incoherence, such as missing grammatical elements in a sentence or incomplete phrases after token removal. These problems limit the final generation quality of the LLM. To overcome these limitations, we present a novel generative prompt compression method. Unlike existing token removal methods, our method centers on a chunking-and-summarization mechanism. Specifically, it splits the prompt into semantically coherent chunks and rewrites each chunk to be more concise. The rewritten chunks are then reassembled into a meaningful prompt. We design several optimization techniques for this mechanism, including optimized semantic chunking, outlier chunk handling, dynamic compression ratio, compression prioritization, and keyword maintaining. These techniques improve the identification and preservation of critical information and the coherence among texts, and provide finer-grained control of the compression ratio. We conduct extensive evaluation on question-answering and summarization tasks, with datasets covering multiple domains. The evaluation shows that our method achieves significantly better compression quality and higher stability than state-of-the-art methods, especially under high compression ratios, which demonstrates its effectiveness and practicality.
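The chunk-then-rewrite pipeline described in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation: every function name here is hypothetical, sentence grouping stands in for semantic chunking, keyword overlap stands in for compression prioritization, and simple word truncation stands in for the generative rewriter (which in practice would presumably be a language model passed in as `rewrite`).

```python
import re
from typing import Callable, List


def split_into_chunks(prompt: str, sentences_per_chunk: int = 2) -> List[str]:
    # Assumption: group consecutive sentences as a naive stand-in for
    # semantic chunking.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def relevance(chunk: str, keywords: List[str]) -> int:
    # Count how many of the given keywords occur in the chunk (case-insensitive).
    words = set(re.findall(r"\w+", chunk.lower()))
    return sum(1 for k in keywords if k.lower() in words)


def truncate_rewriter(chunk: str, budget: int) -> str:
    # Placeholder for a generative rewriter: keep only the first `budget` words.
    return " ".join(chunk.split()[:budget])


def compress_prompt(
    prompt: str,
    keywords: List[str],
    target_ratio: float,
    rewrite: Callable[[str, int], str] = truncate_rewriter,
) -> str:
    chunks = split_into_chunks(prompt)
    if not chunks:
        return ""
    lengths = [len(c.split()) for c in chunks]
    budget_total = max(1, int(sum(lengths) * target_ratio))
    # Per-chunk budget: chunks that mention more keywords get a larger share
    # of the overall word budget (a crude form of a dynamic compression ratio).
    weights = [(1 + relevance(c, keywords)) * n for c, n in zip(chunks, lengths)]
    weight_sum = sum(weights)
    compressed = []
    for chunk, n, w in zip(chunks, lengths, weights):
        budget = max(1, min(n, int(budget_total * w / weight_sum)))
        compressed.append(rewrite(chunk, budget))
    # Reassemble the rewritten chunks in their original order.
    return " ".join(compressed)
```

Calling `compress_prompt(text, ["invoice", "payment"], 0.5)` would allot most of the word budget to chunks mentioning those keywords. Swapping `truncate_rewriter` for a function that calls a summarization model would bring the sketch closer to the generative approach the abstract describes.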
