Can We Infer Confidential Properties of Training Data from LLMs?
This work addresses a security and privacy problem facing users of LLMs in sensitive domains such as healthcare, finance, and law. Its contribution is incremental in that it extends prior work on property inference attacks to LLMs.
The paper asks whether confidential properties of training data can be inferred from large language models (LLMs), and finds that its proposed attacks successfully reveal such vulnerabilities across multiple models.
Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties -- such as patient demographics or disease prevalence -- that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.
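To make the idea of a generation-based property inference attack concrete, the following is a minimal sketch, not the paper's implementation. It assumes the attacker has already sampled texts from the fine-tuned model and estimates a dataset-level property (here, the share of samples mentioning a female patient) as the fraction of samples a labeler flags. The keyword labeler `mentions_female`, the helper `estimate_property_ratio`, and the hard-coded `generated` samples are all hypothetical; the abstract does not specify the prompts, the labeling procedure, or the estimator actually used.

```python
import re
from typing import Callable, Iterable

# Hypothetical keyword labeler (an assumption for illustration only):
# flags a text as referring to a female patient.
FEMALE_TERMS = re.compile(r"\b(she|her|woman|female)\b", re.IGNORECASE)


def mentions_female(text: str) -> bool:
    """Return True if the text contains one of the female-indicating terms."""
    return bool(FEMALE_TERMS.search(text))


def estimate_property_ratio(
    samples: Iterable[str], has_property: Callable[[str], bool]
) -> float:
    """Estimate a dataset-level property as the fraction of samples flagged."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one generated sample")
    hits = sum(1 for s in samples if has_property(s))
    return hits / len(samples)


# Stand-in for texts sampled from a fine-tuned model; in a real attack these
# would come from prompting the target LLM, which is not shown here.
generated = [
    "My wife is 34 and she has had a cough for two weeks.",
    "I am a 50-year-old man with chest pain.",
    "Her blood pressure has been high since the pregnancy.",
    "My son has a fever and a rash.",
]

ratio = estimate_property_ratio(generated, mentions_female)
print(f"Estimated share of samples mentioning a female patient: {ratio:.2f}")
```

A keyword labeler is used only to keep the sketch self-contained; a real attack would likely need a more reliable labeler and many more samples for the estimate to be meaningful.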