CR, LG | Dec 3, 2023

Mendata: A Framework to Purify Manipulated Training Data

Zonghao Huang, Neil Gong, Michael K. Reiter

arXiv:2312.01281v1 | 2.3 | h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses a security risk for machine learning practitioners by providing a method to mitigate data manipulation attacks, though it appears incremental because it builds on existing purification concepts.

The paper tackles the problem of training models on manipulated data by proposing Mendata, a framework that purifies data by perturbing inputs to match a clean reference distribution, and demonstrates its effectiveness in defeating state-of-the-art data poisoning and tracing techniques.

Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit. Data purification aims to remove such manipulations prior to training the model. We propose Mendata, a novel framework to purify manipulated training data. Starting from a small reference dataset in which a large majority of the inputs are clean, Mendata perturbs the training inputs so that they retain their utility but are distributed similarly (as measured by Wasserstein distance) to the reference data, thereby eliminating hidden properties from the learned model. A key challenge is how to find such perturbations, which we address by formulating a min-max optimization problem and developing a two-step method to iteratively solve it. We demonstrate the effectiveness of Mendata by applying it to defeat state-of-the-art data poisoning and data tracing techniques.
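To make the core idea in the abstract concrete, here is a toy sketch and not the authors' method. It assumes scalar inputs, for which the empirical Wasserstein distance has a closed form (sort both samples and average the gaps), so no min-max optimization is needed. The names `wasserstein_1d` and `purify_1d` and the per-input `budget` (a crude stand-in for "retaining utility") are hypothetical choices made for illustration only.

```python
import random


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    # In one dimension the optimal transport plan matches sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def purify_1d(train, reference, budget, steps=50, lr=0.2):
    """Nudge training points toward the reference distribution.

    Each point moves toward the reference value of the same rank, but is
    never allowed to drift more than `budget` from its original value
    (an assumed proxy for preserving the input's utility).
    """
    original = list(train)
    current = list(train)
    ref_sorted = sorted(reference)
    for _ in range(steps):
        # Rank the current points to pair them with sorted reference values.
        order = sorted(range(len(current)), key=current.__getitem__)
        for rank, i in enumerate(order):
            target = ref_sorted[rank]
            moved = current[i] + lr * (target - current[i])
            lo, hi = original[i] - budget, original[i] + budget
            current[i] = min(max(moved, lo), hi)  # project into the budget
    return current


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
# Simulated manipulation: every fifth training input is shifted by +3.
train = [rng.gauss(0.0, 1.0) + (3.0 if i % 5 == 0 else 0.0) for i in range(200)]

purified = purify_1d(train, reference, budget=1.0)
before = wasserstein_1d(train, reference)
after = wasserstein_1d(purified, reference)
print(f"W1 before: {before:.3f}, after: {after:.3f}")
```

For high-dimensional inputs such as images no such closed form is available, which is presumably why the abstract describes a min-max formulation solved iteratively in two steps; the sketch only shows the trade-off between moving toward the reference distribution and keeping each perturbation small.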
