CR AIMar 30

Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing

arXiv:2603.2897270.81 citationsh-index: 9

AI Analysis

For organizations using LLMs, this work provides a method to simultaneously reduce costs and eliminate privacy leakage to third-party providers.

The paper introduces a local Privacy Guard using an on-premise SLM for abstractive summarization and automatic prompt optimization, achieving 45% OpEx reduction, 100% redaction of personal secrets, and 85% preference for compressed responses over raw baselines.

The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting mechanism further bounds working memory, limiting the emergent leakage surface. We validate the framework through a 2x2 benchmark (Lazy vs. Expert users; Personal vs. Institutional secrets) on a 1,000-sample dataset, achieving a 45% blended OpEx reduction, 100% redaction success on personal secrets, and -- via LLM-as-a-Judge evaluation -- an 85% preference rate for APO-compressed responses over raw baselines. Our results demonstrate that Token Parsimony and Zero Leakage are mathematically dual projections of the same contextual compression operator.

View on arXiv PDF

Similar