LG · Nov 12, 2025

Abstract Gradient Training: A Unified Certification Framework for Data Poisoning, Unlearning, and Differential Privacy

Philip Sosnin, Matthew Wicker, Josh Collyer, Calvin Tsay

arXiv:2511.09400v1 · 13.04 citations · h-index: 3

Originality: Highly original

AI Analysis

This work addresses the need for formal certification against training data perturbations, an under-explored area of machine learning security and privacy, and offers a foundational approach that could improve model trustworthiness.

The paper tackles the problem of certifying machine learning models against training data perturbations, which arise in data poisoning, machine unlearning, and differential privacy. It introduces Abstract Gradient Training (AGT), a unified framework that computes provable parameter-space bounds for analyzing model behavior under these perturbations.

The impact of inference-time data perturbation (e.g., adversarial attacks) has been extensively studied in machine learning, leading to well-established certification techniques for adversarial robustness. In contrast, certifying models against training data perturbations remains a relatively under-explored area. These perturbations can arise in three critical contexts: adversarial data poisoning, where an adversary manipulates training samples to corrupt model performance; machine unlearning, which requires certifying model behavior under the removal of specific training data; and differential privacy, where guarantees must be given with respect to substituting individual data points. This work introduces Abstract Gradient Training (AGT), a unified framework for certifying robustness of a given model and training procedure to training data perturbations, including bounded perturbations, the removal of data points, and the addition of new samples. By bounding the reachable set of parameters, i.e., establishing provable parameter-space bounds, AGT provides a formal approach to analyzing the behavior of models trained via first-order optimization methods.
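As a toy illustration of what a parameter-space bound can look like, the sketch below propagates an interval through gradient descent for a one-parameter model y = theta * x, assuming every training label may be shifted by at most `eps`. This is not the AGT algorithm, whose details are not given on this page; the function names (`interval_mul`, `train`, `train_bounds`) and the label-perturbation threat model are assumptions made for the example.

```python
def interval_mul(c, lo, hi):
    """Bounds on c * t for a fixed scalar c and any t in [lo, hi]."""
    a, b = c * lo, c * hi
    return (a, b) if a <= b else (b, a)


def train(xs, ys, lr=0.05, steps=20, theta0=0.0):
    """Plain gradient descent on mean squared error for y = theta * x."""
    theta = theta0
    n = len(xs)
    for _ in range(steps):
        grad = sum(x * (theta * x - y) for x, y in zip(xs, ys)) / n
        theta -= lr * grad
    return theta


def train_bounds(xs, ys, eps, lr=0.05, steps=20, theta0=0.0):
    """Interval containing every theta reachable by `train` when each
    label is replaced by some value in [y - eps, y + eps]."""
    lo = hi = theta0
    n = len(xs)
    for _ in range(steps):
        g_lo = g_hi = 0.0
        for x, y in zip(xs, ys):
            # Prediction theta * x for theta in [lo, hi].
            p_lo, p_hi = interval_mul(x, lo, hi)
            # Residual (prediction - label) with the label in [y - eps, y + eps].
            r_lo, r_hi = p_lo - (y + eps), p_hi - (y - eps)
            # Per-sample gradient x * residual.
            t_lo, t_hi = interval_mul(x, r_lo, r_hi)
            g_lo += t_lo / n
            g_hi += t_hi / n
        # theta - lr * g is smallest when g is largest, and vice versa.
        lo, hi = lo - lr * g_hi, hi - lr * g_lo
    return lo, hi
```

Calling `train_bounds(xs, ys, eps)` returns a pair `(lo, hi)`, and any run of `train` on labels perturbed within `eps` should land inside it. Plain interval arithmetic ignores the dependency between the parameter and its gradient, so the interval widens at every step; the sketch is only meant to show the idea of a reachable parameter set, not a tight certificate.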
