CR · AI · CL · IR · Feb 25, 2025

PII-Bench: Evaluating Query-Aware Privacy Protection Systems

arXiv:2502.18545v1 · 5 citations · h-index: 3
AI Analysis

This paper addresses privacy concerns for users of LLMs by providing a benchmark for evaluating and improving PII protection systems, though the contribution is incremental because it builds on existing privacy protection efforts.

The paper tackles privacy protection in Large Language Models by proposing a query-unrelated PII masking strategy and introducing PII-Bench, a comprehensive evaluation framework with 2,842 test samples across 55 PII categories. Its evaluation shows that current models, including state-of-the-art LLMs, struggle to determine PII query relevance, especially in complex multi-subject scenarios.

The widespread adoption of Large Language Models (LLMs) has raised significant privacy concerns regarding the exposure of personally identifiable information (PII) in user prompts. To address this challenge, we propose a query-unrelated PII masking strategy and introduce PII-Bench, the first comprehensive evaluation framework for assessing privacy protection systems. PII-Bench comprises 2,842 test samples across 55 fine-grained PII categories, featuring diverse scenarios from single-subject descriptions to complex multi-party interactions. Each sample is carefully crafted with a user query, context description, and standard answer indicating query-relevant PII. Our empirical evaluation reveals that while current models perform adequately in basic PII detection, they show significant limitations in determining PII query relevance. Even state-of-the-art LLMs struggle with this task, particularly in handling complex multi-subject scenarios, indicating substantial room for improvement in achieving intelligent PII masking.
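
As a rough illustration of what query-unrelated PII masking could look like, the Python sketch below takes a context, a list of already-detected PII spans, and the set of PII strings judged relevant to the query, then masks everything else. It is not the paper's implementation: the `PIISpan` type, the `span_of` and `mask_query_unrelated` helpers, the `[CATEGORY]` placeholder format, and the category names are all hypothetical, and both detection and the relevance judgment (the step the abstract reports as hard for current models) are assumed to be supplied from elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIISpan:
    """A detected PII mention (hypothetical structure, not from the paper)."""
    start: int
    end: int
    category: str


def span_of(context: str, text: str, category: str) -> PIISpan:
    """Build a span for the first occurrence of `text` in `context`."""
    start = context.index(text)
    return PIISpan(start, start + len(text), category)


def mask_query_unrelated(context: str, spans: list[PIISpan], relevant: set[str]) -> str:
    """Replace each PII span whose text is not query-relevant with a placeholder."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        pieces.append(context[cursor:span.start])
        text = context[span.start:span.end]
        # Keep query-relevant PII verbatim; mask the rest by category.
        pieces.append(text if text in relevant else f"[{span.category}]")
        cursor = span.end
    pieces.append(context[cursor:])
    return "".join(pieces)


# Illustrative sample: a query, a context, and the query-relevant PII.
query = "What number should Bob Li call?"
context = "Alice Chen (alice@example.com) asked Bob Li to call 555-0100."
spans = [
    span_of(context, "Alice Chen", "NAME"),
    span_of(context, "alice@example.com", "EMAIL"),
    span_of(context, "Bob Li", "NAME"),
    span_of(context, "555-0100", "PHONE"),
]
relevant = {"Bob Li", "555-0100"}

masked = mask_query_unrelated(context, spans, relevant)
print(masked)
```

In this toy sample the name and email address of the person who is not needed to answer the query are replaced by placeholders, while the name and phone number the query depends on pass through unchanged.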

