CL, CR | Nov 7, 2023

PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models

arXiv:2311.04044v3 | 44 citations | h-index: 14
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of inconsistent privacy evaluations, which affects both researchers and practitioners, though its contribution is incremental because it builds on existing privacy attack methods.

The paper tackles the challenge of fairly comparing privacy-preserving language models (PPLMs) by introducing PrivLM-Bench, a benchmark that empirically quantifies privacy leakage through multi-faceted objectives and attack evaluations, showing results on three GLUE datasets.

The rapid development of language models (LMs) brings unprecedented accessibility and usage for both models and users. On the one hand, powerful LMs achieve state-of-the-art performance on numerous downstream NLP tasks. On the other hand, more and more attention is paid to unrestricted model access, which may bring malicious privacy risks such as data leakage. To address these issues, many recent works propose privacy-preserving language models (PPLMs) with differential privacy (DP). Unfortunately, different DP implementations make it challenging to compare existing PPLMs fairly. In this paper, we present PrivLM-Bench, a multi-perspective privacy evaluation benchmark to empirically and intuitively quantify the privacy leakage of LMs. Instead of only reporting DP parameters, PrivLM-Bench sheds light on the neglected inference data privacy during actual usage. PrivLM-Bench first clearly defines multi-faceted privacy objectives. Then, PrivLM-Bench constructs a unified pipeline to perform private fine-tuning. Lastly, PrivLM-Bench performs existing privacy attacks on LMs with pre-defined privacy objectives and reports the attack outcomes as the empirical evaluation results. The empirical attack results are used to fairly and intuitively evaluate the privacy leakage of various PPLMs. We conduct extensive experiments on three GLUE datasets for mainstream LMs.
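The abstract describes scoring privacy leakage by running existing attacks rather than by reporting DP parameters alone, but it does not say which attacks or metrics are used. As an illustration only, here is a minimal sketch of one well-known attack family, loss-threshold membership inference, scored by AUC. The function name `membership_auc` and the example loss values are hypothetical and are not taken from PrivLM-Bench.

```python
from typing import Sequence


def membership_auc(member_losses: Sequence[float],
                   nonmember_losses: Sequence[float]) -> float:
    """AUC of a loss-threshold membership inference attack.

    The attacker guesses that low-loss examples were in the fine-tuning set.
    The AUC equals the probability that a randomly chosen member has a lower
    loss than a randomly chosen non-member, with ties counted as one half.
    0.5 means the attack is no better than chance; 1.0 means full leakage.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")

    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0   # member correctly ranked as more "memorized"
            elif m == n:
                wins += 0.5   # tie: attacker has to guess
    return wins / (len(member_losses) * len(nonmember_losses))


# Hypothetical per-example losses from a fine-tuned model (illustrative only).
train_losses = [0.1, 0.2, 0.3]      # examples seen during fine-tuning
heldout_losses = [0.25, 0.5, 0.6]   # examples never seen
auc = membership_auc(train_losses, heldout_losses)
```

Under this kind of metric, a model fine-tuned with effective privacy protection would be expected to push the AUC toward 0.5, which gives a common empirical scale for comparing models trained with different DP implementations.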

Code Implementations: 1 repo
