CR CL | Oct 22, 2024

PAPILLON: Privacy Preservation from Internet-based and Local Language Model Ensembles

Li Siyan, Vethavikashini Chithrra Raghuram, Omar Khattab, Julia Hirschberg, Zhou Yu

Georgia Tech, Stanford

arXiv:2410.17127v328.550 citations | h-index: 9 | Has Code | NAACL

Originality: Incremental advance

AI Analysis

This work addresses privacy concerns for LLM users by enabling safer interactions without fully sacrificing quality. It is an incremental advance, since it leaves a gap in generation quality relative to proprietary models.

The paper tackles the problem of user privacy leakage when interacting with proprietary LLMs by proposing a multi-stage pipeline that chains API-based and local models, achieving high response quality for 85.5% of queries while limiting privacy leakage to 7.5%.

Users can divulge sensitive information to proprietary LLM providers, raising significant privacy concerns. While open-source models, hosted locally on the user's machine, alleviate some concerns, models that users can host locally are often less capable than proprietary frontier models. Toward preserving user privacy while retaining the best quality, we propose Privacy-Conscious Delegation, a novel task for chaining API-based and local models. We utilize recent public collections of user-LLM interactions to construct a natural benchmark called PUPA, which contains personally identifiable information (PII). To study potential approaches, we devise PAPILLON, a multi-stage LLM pipeline that uses prompt optimization to address a simpler version of our task. Our best pipeline maintains high response quality for 85.5% of user queries while restricting privacy leakage to only 7.5%. A large gap to the generation quality of proprietary LLMs remains, which we leave for future work. Our data and code are available at https://github.com/siyan-sylvia-li/PAPILLON.
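To make the idea of chaining a local model with an API-based model concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the three-stage split (local rewrite, remote call, local aggregation), the prompt wording, and the names `LLM`, `delegate`, and `leakage_rate` are all hypothetical, and the substring-based leakage measure is a simplification that may differ from the metric used in the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption): prompt text in, response text out.
LLM = Callable[[str], str]


def delegate(query: str, local_llm: LLM, remote_llm: LLM) -> Tuple[str, str]:
    """Answer `query` so that only a locally rewritten prompt leaves the machine.

    Returns (outbound_prompt, final_answer).
    """
    # Stage 1 (local, trusted): rewrite the request without private details.
    outbound_prompt = local_llm(
        "Rewrite this request so it contains no personal details:\n" + query
    )
    # Stage 2 (remote, untrusted): only the rewritten prompt is sent out.
    remote_answer = remote_llm(outbound_prompt)
    # Stage 3 (local, trusted): combine the remote draft with the original request.
    final_answer = local_llm(
        "Original request:\n" + query
        + "\nDraft answer:\n" + remote_answer
        + "\nWrite the final answer."
    )
    return outbound_prompt, final_answer


def leakage_rate(pii: List[str], outbound_prompt: str) -> float:
    """Fraction of known PII strings that appear verbatim in the outbound prompt.

    Simplified, case-insensitive substring check (an assumption).
    """
    if not pii:
        return 0.0
    text = outbound_prompt.lower()
    leaked = sum(1 for item in pii if item.lower() in text)
    return leaked / len(pii)
```

In this sketch, `local_llm` and `remote_llm` can be any callables, so the flow can be exercised with stub functions before wiring in real models. Averaging `leakage_rate` over a set of queries with annotated PII would give a rough aggregate leakage figure of the kind the abstract reports.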
