LG · AI · May 28

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

Andrada Gobeaja, Ionut Hodoroaga, Elena Burceanu, Marius Leordeanu

arXiv:2605.30462 · 66.9 · h-index: 4

AI Analysis

This work addresses the problem of dataset-level membership inference for model developers and auditors, providing a new white-box approach to determine training data provenance.

The paper proposes a method to test whether a dataset was part of a model's training mixture by analyzing the model's learned semantic correlation structure, captured in what the authors call Semantic Correlation Descriptors (SCDs). On average, their SCD-based membership score outperforms the black-box and white-box baselines it is compared against, with the largest relative gain exceeding 60% in ROC-AUC when dataset groups have distinct semantic particularities.

Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset, but not causal for the underlying task, can be internalized during training. We use this insight to study dataset-level membership inference, moving beyond existing methods that rely on behavioral or distributional evidence such as confidence scores, losses, margins, generated samples, or query responses. We introduce a white-box semantic fingerprinting approach based on semantic correlation descriptors (SCDs), which capture the semantic correlation structure learned by a model and make it comparable across dataset mixtures. In a controlled leave-one-dataset-out diagnostic, SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs. We then propose a practical SCD-based membership score that tests whether a target dataset is part of a model's training mixture using only the model's SCD and the target dataset's standalone SCD, without requiring leave-one-dataset-out models. Across three diverse experimental settings, with dataset groups for natural language inference, emotion classification, and medical text classification, we test both the advantages and limitations of SCD-based membership inference with different degrees of semantic separation and keyword support between dataset splits. On average, the classifier based on this score achieves the highest performance and the lowest standard deviation, outperforming the black-box baselines RMIA, Attack-P, and LiRA, as well as the white-box SIF baseline. These results show that dataset membership can be traced through internal semantic correlations, with the largest relative gain exceeding 60% in ROC-AUC when dataset groups expose distinct semantic particularities.
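To make the evaluation metric concrete, here is a minimal sketch of how a membership score could be assessed with ROC-AUC and how a relative gain over a baseline is computed. It does not implement SCDs or any of the named baselines. The helper names `roc_auc` and `relative_gain` and all score values are hypothetical and chosen only for illustration, and it assumes "relative gain" means the AUC difference divided by the baseline AUC, which the abstract does not spell out.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a randomly chosen member (label 1)
    gets a higher score than a randomly chosen non-member (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_gain(new_auc, baseline_auc):
    """Relative improvement of new_auc over baseline_auc (assumed definition)."""
    return (new_auc - baseline_auc) / baseline_auc


# Toy data, NOT results from the paper: 1 = dataset was in the training mixture.
labels = [1, 1, 1, 0, 0, 0]
method_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]    # hypothetical membership scores
baseline_scores = [0.6, 0.2, 0.5, 0.7, 0.4, 0.3]  # hypothetical baseline scores

method_auc = roc_auc(method_scores, labels)
baseline_auc = roc_auc(baseline_scores, labels)
print(f"method AUC:    {method_auc:.3f}")
print(f"baseline AUC:  {baseline_auc:.3f}")
print(f"relative gain: {relative_gain(method_auc, baseline_auc):.1%}")
```

Under this definition, a gain exceeding 60% would correspond, for example, to moving from an AUC of 0.5 to above 0.8.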

