CL, CR | Mar 5, 2025

Token-Level Privacy in Large Language Models

arXiv:2503.03652v1 | 2.7 | h-index: 25

Originality: Highly original

AI Analysis

This work addresses privacy concerns for users of remote language model services in high-risk applications and proposes a novel method rather than an incremental improvement.

The paper tackles the privacy risks of transmitting private information to external language model services. It introduces dχ-stencil, a token-level privacy-preserving mechanism that integrates contextual and semantic information under the dχ-privacy framework, achieves 2ε-dχ-privacy, and shows comparable or better utility-privacy trade-offs in evaluations.
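For reference, the guarantee above is stated in terms of dχ-privacy (metric differential privacy). The standard definition is given below; the specific metric d used by dχ-stencil, and how the factor of 2 arises, are not stated in this summary. A randomized mechanism M satisfies ε-dχ-privacy with respect to a metric d on inputs if, for all inputs x, x' and every measurable set S of outputs:

```latex
\Pr[M(x) \in S] \;\le\; e^{\varepsilon\, d(x, x')}\, \Pr[M(x') \in S]
```

A 2ε-dχ-privacy guarantee is the same inequality with ε replaced by 2ε, i.e. the bound $e^{2\varepsilon\, d(x, x')}$. Unlike standard differential privacy, the allowed distinguishability grows with the distance d(x, x'): nearby inputs (for example, semantically similar tokens) are hard to tell apart, while distant ones may be easier.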

The use of language models as remote services requires transmitting private information to external providers, raising significant privacy concerns. This process not only risks exposing sensitive data to untrusted service providers but also leaves it vulnerable to interception by eavesdroppers. Existing privacy-preserving methods for natural language processing (NLP) interactions primarily rely on semantic similarity, overlooking the role of contextual information. In this work, we introduce dχ-stencil, a novel token-level privacy-preserving mechanism that integrates contextual and semantic information while ensuring strong privacy guarantees under the dχ-privacy framework, achieving 2ε-dχ-privacy. By incorporating both semantic and contextual nuances, dχ-stencil achieves a robust balance between privacy and utility. We evaluate dχ-stencil using state-of-the-art language models and diverse datasets, achieving a comparable or even better trade-off between utility and privacy than existing methods. This work highlights the potential of dχ-stencil to set a new standard for privacy-preserving NLP in modern, high-risk applications.
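To make the "semantic similarity" baseline concrete, the sketch below shows a common style of token-level dχ-privacy mechanism: add noise with density proportional to exp(-ε‖z‖) to a token's embedding, then release the nearest vocabulary word. This is the semantic-only approach the abstract contrasts against, not dχ-stencil itself, which additionally uses context. The toy 3-D embeddings and the names `TOY_EMBEDDINGS`, `sample_dchi_noise`, and `perturb_token` are hypothetical, chosen only for illustration.

```python
import math
import random

# Hypothetical toy vocabulary with hand-picked 3-D embeddings (illustration only).
# A real system would use pretrained embeddings with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "doctor": [0.9, 0.1, 0.0],
    "nurse": [0.8, 0.2, 0.1],
    "hospital": [0.7, 0.0, 0.3],
    "banana": [0.0, 0.9, 0.4],
    "apple": [0.1, 0.8, 0.5],
}


def sample_dchi_noise(dim, eps, rng):
    """Sample z in R^dim with density proportional to exp(-eps * ||z||_2).

    Direction: uniform on the unit sphere (a normalized Gaussian vector).
    Radius: Gamma(shape=dim, scale=1/eps), the matching radial distribution.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    radius = rng.gammavariate(dim, 1.0 / eps)
    return [radius * d / norm for d in direction]


def perturb_token(token, embeddings, eps, rng):
    """Replace `token` with the vocabulary word nearest to its noisy embedding.

    Uses embedding (semantic) distance only; no surrounding context is used.
    """
    vec = embeddings[token]
    noise = sample_dchi_noise(len(vec), eps, rng)
    noisy = [v + z for v, z in zip(vec, noise)]
    # Nearest-neighbour lookup is post-processing, so it keeps the
    # eps-d_chi guarantee (w.r.t. Euclidean distance between embeddings).
    return min(
        embeddings,
        key=lambda w: sum((a - b) ** 2 for a, b in zip(embeddings[w], noisy)),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["doctor", "apple"]
    for eps in (1.0, 10.0, 100.0):
        private = [perturb_token(t, TOY_EMBEDDINGS, eps, rng) for t in sentence]
        print(eps, private)
```

Smaller ε adds more noise, so tokens are more often swapped for other words; larger ε preserves the input at the cost of weaker privacy. Because the substitute is chosen purely by embedding distance, a replacement can be semantically close yet wrong for the sentence, which is the gap that a context-aware mechanism aims to close.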
