LG ML Sep 6, 2025

The Measure of Deception: An Analysis of Data Forging in Machine Unlearning

arXiv:2509.05865v14.1 h-index: 24

Originality Highly original

AI Analysis

This paper addresses the challenge of ensuring privacy and integrity in machine learning models, a concern for practitioners and regulators, though it is incremental in that it adds theoretical analysis to existing unlearning frameworks.

The paper tackles the problem of verifying machine unlearning by analyzing adversarial data forging, where data is crafted to mimic gradients and create false unlearning claims. It shows that the measure of forging sets scales with tolerance ε, specifically as ε^{(d-r)/2} under mild assumptions, and proves that the likelihood of random forging is vanishingly small, indicating detection is possible.

Motivated by privacy regulations and the need to mitigate the effects of harmful data, machine unlearning seeks to modify trained models so that they effectively "forget" designated data. A key challenge in verifying unlearning is forging: adversarially crafting data that mimics the gradient of a target point, thereby creating the appearance of unlearning without actually removing information. To capture this phenomenon, we consider the collection of data points whose gradients approximate a target gradient within tolerance $ε$, which we call an $ε$-forging set, and develop a framework for its analysis. For linear regression and one-layer neural networks, we show that the Lebesgue measure of this set is small: it scales on the order of $ε$ and, when $ε$ is small enough, on the order of $ε^d$. More generally, under mild regularity assumptions, we prove that the forging set measure decays as $ε^{(d-r)/2}$, where $d$ is the data dimension and $r<d$ is the nullity of a variation matrix defined by the model gradients. Extensions to batch SGD and almost-everywhere smooth loss functions yield the same asymptotic scaling. In addition, we establish probability bounds showing that, under non-degenerate data distributions, the likelihood of randomly sampling a forging point is vanishingly small. These results provide evidence that adversarial forging is fundamentally limited and that false unlearning claims can, in principle, be detected.
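To make the $ε$-forging set concrete, the following is a minimal sketch for the linear regression case and is not the paper's own construction. It assumes a squared loss $\frac{1}{2}(θ^\top x - y)^2$, a Euclidean norm on gradients, and a sampling box $[-1, 1]^{d+1}$ over $(x, y)$. The function names, the parameter vector `theta`, and the target point are all hypothetical choices made for illustration. The sketch estimates, by Monte Carlo, the fraction of the box whose gradients fall within $ε$ of the target gradient.

```python
import random

def grad(theta, x, y):
    """Gradient of the squared loss 0.5 * (theta.x - y)^2 with respect to theta."""
    residual = sum(t * xi for t, xi in zip(theta, x)) - y
    return [residual * xi for xi in x]

def in_forging_set(theta, target_grad, x, y, eps):
    """True if the gradient at (x, y) lies within Euclidean distance eps of target_grad."""
    g = grad(theta, x, y)
    dist_sq = sum((a - b) ** 2 for a, b in zip(g, target_grad))
    return dist_sq <= eps ** 2

def forging_fraction(theta, target, eps, n_samples=20_000, seed=0):
    """Monte Carlo estimate of the fraction of the box [-1, 1]^(d+1) that forges the target.

    The box and the uniform sampling are assumptions of this sketch.
    """
    rng = random.Random(seed)  # fixed seed: every eps sees the same sample points
    d = len(theta)
    target_grad = grad(theta, target[0], target[1])
    hits = 0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = rng.uniform(-1.0, 1.0)
        hits += in_forging_set(theta, target_grad, x, y, eps)
    return hits / n_samples

# Hypothetical model parameters and target data point (x, y).
theta = [0.5, -0.3, 0.8]
target = ([0.2, -0.4, 0.6], 0.1)

fractions = {eps: forging_fraction(theta, target, eps) for eps in (0.4, 0.2, 0.1, 0.05)}
for eps, frac in fractions.items():
    print(f"eps={eps:<5} estimated fraction={frac:.5f}")
```

Because every tolerance reuses the same random sample, the estimated fractions are nested and can only shrink as $ε$ decreases. How quickly they shrink in this toy setup is an empirical observation and should not be read as a check of the paper's asymptotic rates.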
