CR, LG · Nov 18, 2024

Preempting Text Sanitization Utility in Resource-Constrained Privacy-Preserving LLM Interactions

arXiv:2411.11521v3 · 2 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses privacy and cost issues for users and providers of pay-per-use LLM services, though it is an incremental advance that builds on existing sanitization methods.

The paper tackles the problem of predicting the utility of differentially private sanitized prompts before they are sent to an LLM, in order to avoid wasting resources. It shows that the proposed middleware architecture prevents such waste for up to 20% of prompts in summarization and translation tasks.

Interactions with online Large Language Models raise privacy issues, as providers can gather sensitive information about users and their companies from the prompts. While textual prompts can be sanitized using Differential Privacy (DP), we show that it is difficult to anticipate the performance of an LLM on such a sanitized prompt. Poor performance has clear monetary consequences for LLM services that charge on a pay-per-use model, and it also wastes a large amount of computing resources. To address this, we propose a middleware architecture that leverages a Small Language Model to predict the utility of a given sanitized prompt before it is sent to the LLM. We experimented on a summarization task and a translation task to show that our architecture helps prevent such resource waste for up to 20% of the prompts. During our study, we also reproduced experiments from one of the most cited papers on text sanitization using DP and show that a potentially performance-driven implementation choice dramatically changes the output while not being explicitly acknowledged in that paper.
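The gating idea in the abstract (sanitize the prompt, estimate its utility with a small model, and only then decide whether to pay for the LLM call) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not the authors' implementation: `gated_query`, `GateResult`, the `threshold` parameter, and the toy helpers are assumptions made for illustration. In particular, `toy_sanitize` is a simple masking rule, not a differentially private mechanism, and `toy_predict_utility` is a token-count heuristic standing in for a Small Language Model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    sent: bool                 # whether the sanitized prompt was forwarded to the LLM
    predicted_utility: float   # score produced by the utility predictor
    response: Optional[str]    # LLM output, or None if the call was skipped


def gated_query(
    prompt: str,
    sanitize: Callable[[str], str],
    predict_utility: Callable[[str], float],
    call_llm: Callable[[str], str],
    threshold: float = 0.5,    # hypothetical cut-off, not a value from the paper
) -> GateResult:
    """Sanitize locally, predict utility, and only call the LLM if it looks worthwhile."""
    sanitized = sanitize(prompt)
    score = predict_utility(sanitized)
    if score < threshold:
        # Predicted utility too low: skip the paid LLM call entirely.
        return GateResult(sent=False, predicted_utility=score, response=None)
    return GateResult(sent=True, predicted_utility=score, response=call_llm(sanitized))


def toy_sanitize(prompt: str) -> str:
    """Toy stand-in (NOT differentially private): mask every capitalized word."""
    return " ".join("[MASK]" if w[:1].isupper() else w for w in prompt.split())


def toy_predict_utility(sanitized: str) -> float:
    """Toy stand-in for the small model: fraction of tokens that survived masking."""
    tokens = sanitized.split()
    if not tokens:
        return 0.0
    kept = sum(1 for t in tokens if t != "[MASK]")
    return kept / len(tokens)


if __name__ == "__main__":
    def fake_llm(text: str) -> str:
        # Placeholder for a remote, pay-per-use LLM call.
        return f"LLM output for: {text}"

    for p in ["Alice met Bob", "the report covers sales"]:
        print(p, "->", gated_query(p, toy_sanitize, toy_predict_utility, fake_llm))
```

Passing the sanitizer, predictor, and LLM client as callables keeps the gate independent of any particular sanitization mechanism or model, so a real DP sanitizer and a real small model could be substituted without changing the control flow.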

Code Implementations: 1 repo
