CL · CR · LG · Jun 20, 2024

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

arXiv:2406.14322v3 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in applications where the amount of sensitive data contributed varies from user to user, offering more uniform protection across users. The advance is incremental: it adapts existing DP methods to a new privacy unit.

The paper tackles the problem of uneven privacy guarantees in differentially private fine-tuning of large language models by shifting the privacy unit from the individual example to the user. It finds that user-level DP mechanisms such as Group Privacy and User-wise DP-SGD can achieve competitive utility under concrete privacy budgets (e.g., ε=8) on natural language generation tasks.
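To make the Group Privacy idea concrete, the sketch below converts a user-level budget into the example-level budget that implies it. It uses the generic textbook group-privacy bound, under which an (ε, δ)-DP mechanism is (kε, k·e^((k−1)ε)·δ)-DP for datasets that differ in k records. The function names, the cap `k` on examples per user, and the δ value are assumptions made for illustration; the paper's own accounting may be tighter.

```python
import math


def user_level_budget(eps, delta, k):
    """User-level (eps, delta) implied by example-level DP when each user
    contributes at most k examples (generic group-privacy bound)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return k * eps, k * math.exp((k - 1) * eps) * delta


def example_level_budget(user_eps, user_delta, k):
    """Example-level (eps, delta) that is sufficient for the requested
    user-level budget under the same bound (the inverse of the map above)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    eps = user_eps / k
    delta = user_delta / (k * math.exp((k - 1) * eps))
    return eps, delta


# Hypothetical setting: user-level eps = 8, delta = 1e-6, at most 10 examples per user.
eps, delta = example_level_budget(8.0, 1e-6, k=10)
print(f"train with example-level eps={eps:.3f}, delta={delta:.3e}")
```

The example-level ε shrinks linearly in `k`, which is why capping the number of examples kept per user (a data selection choice) matters so much for this mechanism.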

Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff.
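As a rough illustration of the second mechanism, the sketch below shows one update step in which the clipping unit is the user rather than the example. It assumes each sampled user supplies a single gradient (for instance, the average over that user's sampled examples), which is one plausible reading of "User-wise DP-SGD" and not the paper's implementation. All names and parameters are hypothetical, and privacy accounting is omitted.

```python
import math
import random


def clip_to_norm(vec, clip_norm):
    """Scale vec down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def userwise_dp_sgd_step(params, user_grads, clip_norm, noise_multiplier, lr, rng):
    """One noisy gradient step with per-user clipping.

    user_grads holds one gradient vector per sampled user (assumption: the
    average over that user's sampled examples). Clipping each vector bounds
    any single user's influence on the sum by clip_norm, and the Gaussian
    noise is calibrated to that bound.
    """
    clipped = [clip_to_norm(g, clip_norm) for g in user_grads]
    sigma = noise_multiplier * clip_norm
    num_users = len(clipped)
    new_params = []
    for j, p in enumerate(params):
        noisy_sum = sum(g[j] for g in clipped) + rng.gauss(0.0, sigma)
        new_params.append(p - lr * noisy_sum / num_users)
    return new_params


# Toy usage: two users, two parameters.
rng = random.Random(0)
params = userwise_dp_sgd_step(
    params=[0.0, 0.0],
    user_grads=[[3.0, 4.0], [0.0, 1.0]],
    clip_norm=1.0,
    noise_multiplier=1.0,
    lr=0.1,
    rng=rng,
)
```

Because the noise scale depends only on `clip_norm`, a user with many examples contributes no more to the update than a user with few, which is the source of the uniform per-user guarantee.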

