CR · AI · Nov 24, 2024

Data Lineage Inference: Uncovering Privacy Vulnerabilities of Dataset Pruning

arXiv:2411.15796v1 · 5 citations · h-index: 7
Originality: Highly original
AI Analysis

This work addresses privacy risks for machine learning practitioners using dataset pruning, introducing a new data-centric inference paradigm that is foundational rather than incremental.

The paper tackles data privacy vulnerabilities in dataset pruning. It shows that an attacker can detect whether a sample belonged to the redundant set even though that data is used only before model training, and that adversaries can identify the redundant set accurately with limited prior knowledge.

In this work, we systematically explore the data privacy issues of dataset pruning in machine learning systems. Our findings reveal, for the first time, that even if data in the redundant set is used only before model training, its pruning-phase membership status can still be detected through attacks. Since pruning is a fully upstream process that happens before model training, traditional privacy inference methods based on model outputs are unsuitable. To address this, we introduce a new task called Data-Centric Membership Inference and propose the first data-centric privacy inference paradigm, named Data Lineage Inference (DaLI). Under this paradigm, we propose four threshold-based attacks: WhoDis, CumDis, ArraDis and SpiDis. We show that even without access to downstream models, adversaries can accurately identify the redundant set with only limited prior knowledge. Furthermore, we find that different pruning methods involve varying levels of privacy leakage, and that even the same pruning method can present different privacy risks at different pruning fractions. We conduct an in-depth analysis of these phenomena and introduce a metric called the Brimming score to offer guidance for selecting pruning methods with privacy protection in mind.
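To make the phrase "threshold-based attack" concrete, here is a minimal generic sketch in Python. It is not WhoDis, CumDis, ArraDis or SpiDis, whose definitions the abstract does not give. It assumes the adversary can compute some per-sample score (hypothetical here) for which larger values suggest membership in the redundant set, and that the "limited prior knowledge" is a handful of samples whose status is already known. The names `calibrate_threshold` and `infer_redundant` are illustrative only.

```python
from typing import List, Sequence


def calibrate_threshold(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Pick the threshold with the best accuracy on a small labeled set.

    labels[i] is True when sample i is known to be in the redundant set.
    Assumption (not from the paper): higher score means more likely redundant.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")

    ordered = sorted(set(scores))
    # Candidate thresholds: below every score, between neighbours, above every score.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    def accuracy(threshold: float) -> float:
        hits = sum((s > threshold) == y for s, y in zip(scores, labels))
        return hits / len(scores)

    # max() returns the first best candidate when several tie.
    return max(candidates, key=accuracy)


def infer_redundant(scores: Sequence[float], threshold: float) -> List[bool]:
    """Predict redundant-set membership for unseen samples."""
    return [s > threshold for s in scores]


# Toy usage with made-up scores for four samples of known status.
known_scores = [0.1, 0.2, 0.8, 0.9]
known_labels = [False, False, True, True]
threshold = calibrate_threshold(known_scores, known_labels)
predictions = infer_redundant([0.05, 0.95, 0.6], threshold)
```

The sketch only shows the decision rule. In this kind of attack the score itself does most of the work, and the abstract does not say how the four proposed attacks compute theirs.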
