LGJun 2, 2025

Stress-Testing ML Pipelines with Adversarial Data Corruption

arXiv:2506.01230v11 citationsh-index: 24Proc VLDB Endow
Originality Highly original
AI Analysis

It addresses the need for rigorous stress-testing of high-stakes ML systems against realistic errors, providing a practical tool for regulators and practitioners.

The paper tackles the problem of realistic data-quality issues degrading ML pipeline reliability by introducing SAVAGE, a framework that models structured corruptions and discovers patterns that severely impact performance, with experiments showing around 5% corruptions cause significant degradation.

Structured data-quality issues, such as missing values correlated with demographics, culturally biased labels, or systemic selection biases, routinely degrade the reliability of machine-learning pipelines. Regulators now increasingly demand evidence that high-stakes systems can withstand these realistic, interdependent errors, yet current robustness evaluations typically use random or overly simplistic corruptions, leaving worst-case scenarios unexplored. We introduce SAVAGE, a causally inspired framework that (i) formally models realistic data-quality issues through dependency graphs and flexible corruption templates, and (ii) systematically discovers corruption patterns that maximally degrade a target performance metric. SAVAGE employs a bi-level optimization approach to efficiently identify vulnerable data subpopulations and fine-tune corruption severity, treating the full ML pipeline, including preprocessing and potentially non-differentiable models, as a black box. Extensive experiments across multiple datasets and ML tasks (data cleaning, fairness-aware learning, uncertainty quantification) demonstrate that even a small fraction (around 5 %) of structured corruptions identified by SAVAGE severely impacts model performance, far exceeding random or manually crafted errors, and invalidating core assumptions of existing techniques. Thus, SAVAGE provides a practical tool for rigorous pipeline stress-testing, a benchmark for evaluating robustness methods, and actionable guidance for designing more resilient data workflows.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes