AI · Oct 20, 2025

CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows

Joong Ho Choi, Jiayang Zhao, Jeel Shah, Ritvika Sonawane, Vedant Singh, Avani Appalla, Will Flanagan, Filipe Condessa

arXiv:2510.18043v17.82 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This addresses cost reduction for users of LLM workflows, but it is incremental as it builds on existing compression techniques.

The paper tackles the high run-time costs of LLMs in agentic workflows by introducing CompactPrompt, a pipeline that compresses prompts and data, reducing token usage and inference cost by up to 60% on benchmarks like TAT-QA and FinQA while preserving output quality with less than a 5% accuracy drop.

Large Language Models (LLMs) deliver powerful reasoning and generation capabilities but incur substantial run-time costs when operating in agentic workflows that chain together lengthy prompts and process rich data streams. We introduce CompactPrompt, an end-to-end pipeline that merges hard prompt compression with lightweight file-level data compression. CompactPrompt first prunes low-information tokens from prompts using self-information scoring and dependency-based phrase grouping. In parallel, it applies n-gram abbreviation to recurrent textual patterns in attached documents and uniform quantization to numerical columns, yielding compact yet semantically faithful representations. Integrated into standard LLM agents, CompactPrompt reduces total token usage and inference cost by up to 60% on benchmark datasets such as TAT-QA and FinQA while preserving output quality (less than a 5% accuracy drop for Claude-3.5-Sonnet and GPT-4.1-Mini). CompactPrompt also helps visualize real-time compression decisions and quantify cost-performance trade-offs, laying the groundwork for leaner generative AI pipelines.
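The three compression steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `@k` abbreviation codes, and the default parameters are hypothetical, and self-information is estimated here from unigram frequencies inside the prompt itself, whereas a real pipeline would more plausibly score tokens with a language model. Dependency-based phrase grouping is not shown.

```python
import math
from collections import Counter


def self_information_prune(tokens, keep_ratio=0.5):
    """Keep the highest-surprisal tokens, preserving their original order.

    Assumption: p(t) is the token's relative frequency within `tokens`.
    """
    counts = Counter(tokens)
    total = len(tokens)
    # Self-information: I(t) = -log2 p(t); rare tokens score higher.
    scores = [-math.log2(counts[t] / total) for t in tokens]
    n_keep = max(1, round(total * keep_ratio))
    # Rank positions by descending score, breaking ties by position.
    ranked = sorted(range(total), key=lambda i: (-scores[i], i))
    return [tokens[i] for i in sorted(ranked[:n_keep])]


def abbreviate_ngrams(text, n=2, min_count=2):
    """Replace word n-grams seen at least `min_count` times with short codes.

    Assumption: the text contains no literal "@<number>" words.
    Returns the abbreviated text and a legend mapping code -> phrase.
    """
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    frequent = [g for g, c in grams.most_common() if c >= min_count]
    codes = {g: f"@{k}" for k, g in enumerate(frequent)}
    out, i = [], 0
    while i < len(words):
        gram = tuple(words[i:i + n])
        if len(gram) == n and gram in codes:
            out.append(codes[gram])  # greedy left-to-right replacement
            i += n
        else:
            out.append(words[i])
            i += 1
    legend = {code: " ".join(g) for g, code in codes.items()}
    return " ".join(out), legend


def expand_ngrams(text, legend):
    """Invert abbreviate_ngrams using its legend."""
    return " ".join(legend.get(w, w) for w in text.split())


def quantize_uniform(values, bits=8):
    """Map floats to integer codes on an evenly spaced grid over [min, max]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize_uniform(codes, lo, step):
    """Recover approximate values; the error per value is at most step / 2."""
    return [lo + c * step for c in codes]
```

In this sketch the legend and the `(lo, step)` pair would have to be sent along with the compressed data so the model or a post-processing step can interpret the codes. Whether that overhead pays off depends on how often each pattern repeats, which is the kind of cost-performance trade-off the abstract refers to.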
