CL · Oct 16, 2024

Prompt Compression for Large Language Models: A Survey

Cambridge
arXiv:2410.12388v2 · 68 citations · h-index: 19 · NAACL
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey that summarizes and organizes existing research on prompt compression for LLMs, aimed at researchers and practitioners seeking to reduce computational overhead.

This survey tackles the problem of high memory usage and inference costs from long prompts in large language models by reviewing existing prompt compression techniques, categorizing them into hard and soft prompt methods and analyzing their mechanisms and adaptations.

Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which increases memory usage and inference costs. To mitigate these challenges, multiple efficiency methods have been proposed, and prompt compression in particular has attracted significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and a new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current prompt compression methods are analyzed, and several future directions are outlined, such as optimizing the compression encoder, combining hard and soft prompt methods, and leveraging insights from multimodality.
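To make the idea of a hard prompt method concrete, here is a rough illustration that drops the least informative tokens from a prompt while keeping the survivors in their original order, loosely in the spirit of self-information filtering such as Selective Context. It is a minimal sketch under stated assumptions, not code from the survey or its linked repositories. A whitespace tokenizer and a unigram frequency estimate computed from the prompt itself stand in for an LLM's token probabilities. The names `self_information`, `compress_prompt`, and `keep_ratio` are hypothetical.

```python
import math
from collections import Counter


def self_information(tokens):
    """Toy stand-in for an LLM (assumption): estimate -log2 p(token)
    from unigram counts within the prompt itself."""
    counts = Counter(tokens)
    total = len(tokens)
    return [-math.log2(counts[t] / total) for t in tokens]


def compress_prompt(prompt, keep_ratio=0.5):
    """Hard prompt compression sketch: keep the highest-information
    tokens, preserving their original order in the prompt."""
    tokens = prompt.split()  # whitespace tokenizer (assumption)
    if not tokens:
        return ""
    scores = self_information(tokens)
    k = max(1, int(len(tokens) * keep_ratio))
    # Rank by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:k])  # restore original token order
    return " ".join(tokens[i] for i in keep)


prompt = "the cat sat on the mat and the dog sat on the rug"
print(compress_prompt(prompt, keep_ratio=0.5))  # cat sat mat and dog rug
```

Because the output is still ordinary text, a compressed prompt like this can be sent to any LLM unchanged. Soft prompt methods instead encode the prompt into learned continuous vectors, which need a trained encoder and are beyond a short sketch.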

Code Implementations: 2 repos
