CR · CL · LG · May 12

Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

arXiv:2605.12264 · 76.9
Predicted impact: top 14% in CR · last 90 days · Originality: Incremental advance
AI Analysis

For privacy researchers and LLM practitioners, this work highlights the risk of PII leakage from finetuned models, though it is an initial study with limited scope.

This paper studies PII reconstruction from supervised finetuned LLMs for the first time, constructing multi-turn medical and legal Q&A datasets with PII. The proposed COVA decoding algorithm outperforms existing extraction methods, showing that partial attacker knowledge can significantly improve reconstruction success.

Supervised Finetuning (SFT) has become one of the primary methods for adapting a large language model (LLM) with extensive pre-trained knowledge to domain-specific, instruction-following tasks. SFT datasets, composed of instruction-response pairs, often include user-provided information that may contain sensitive data such as personally identifiable information (PII), raising privacy concerns. This paper studies the problem of PII reconstruction from SFT models for the first time. We construct multi-turn, user-centric Q&A datasets in sensitive domains, specifically medical and legal settings, that incorporate PII to enable realistic evaluation of leakage. Using these datasets, we evaluate the extent to which an adversary, with varying levels of knowledge about the fine-tuning dataset, can infer sensitive information about individuals whose data was used during SFT. In the reconstruction setting, we propose COVA, a novel decoding algorithm to reconstruct PII under prefix-based attacks, consistently outperforming existing extraction methods. Our results show that even partial attacker knowledge can significantly improve reconstruction success, while leakage varies substantially across PII types.

