LG | DB | Dec 18, 2022

AutoSlicer: Scalable Automated Data Slicing for ML Model Analysis

arXiv:2212.09032v1 | 4 citations | h-index: 11
Originality: Incremental advance
AI Analysis

The paper addresses the need for scalable automated data slicing in production ML pipelines, where slicing supports model debugging, model comparison, and fairness diagnosis. Its contribution is an incremental improvement in search efficiency.

The paper tackles the problem of identifying subsets of evaluation data where a machine learning model performs anomalously. It presents AutoSlicer, a scalable system that searches for problematic slices using distributed metric computation and hypothesis testing. In the reported experiments, the search strategy finds most of the anomalous slices while inspecting only a small portion of the search space.

Automated slicing aims to identify subsets of evaluation data where a trained model performs anomalously. This is an important problem for machine learning pipelines in production, since it plays a key role in model debugging and comparison, as well as in the diagnosis of fairness issues. Scalability has become a critical requirement for any automated slicing system because of the large search space of possible slices and the growing scale of data. We present AutoSlicer, a scalable system that searches for problematic slices through distributed metric computation and hypothesis testing. We develop an efficient strategy that reduces the search space through pruning and prioritization. In our experiments, we show that this search strategy finds most of the anomalous slices by inspecting a small portion of the search space.
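
To make the idea of slice search by hypothesis testing concrete, here is a minimal single-machine sketch in Python. It is not AutoSlicer's implementation: the abstract does not specify the test, the pruning rule, or the metric, so everything below is an assumption chosen for illustration. The sketch considers only single-feature slices, uses per-example loss as the metric, prunes slices with fewer than a hypothetical `min_support` rows, and applies a one-sided Welch-style test with a normal approximation. The names `welch_z`, `find_anomalous_slices`, `min_support`, and `alpha`, and the toy data, are all invented here.

```python
import math
from statistics import NormalDist, mean, variance


def welch_z(slice_losses, rest_losses):
    """Welch-style statistic for 'mean loss in the slice > mean loss elsewhere'."""
    diff = mean(slice_losses) - mean(rest_losses)
    se = math.sqrt(
        variance(slice_losses) / len(slice_losses)
        + variance(rest_losses) / len(rest_losses)
    )
    if se == 0.0:
        # Degenerate case: no spread in either group.
        return math.inf if diff > 0 else 0.0
    return diff / se


def find_anomalous_slices(rows, losses, min_support=5, alpha=0.01):
    """Return (feature, value, p_value) tuples, most significant first."""
    # Enumerate candidate slices: one per (feature, value) pair.
    candidates = {}
    for i, row in enumerate(rows):
        for feature, value in row.items():
            candidates.setdefault((feature, value), []).append(i)

    flagged = []
    for (feature, value), idx in candidates.items():
        # Pruning (assumed rule): skip slices too small to test reliably.
        if len(idx) < min_support or len(rows) - len(idx) < min_support:
            continue
        members = set(idx)
        inside = [losses[i] for i in idx]
        outside = [losses[i] for i in range(len(rows)) if i not in members]
        z = welch_z(inside, outside)
        p_value = 1.0 - NormalDist().cdf(z)  # one-sided, normal approximation
        if p_value < alpha:
            flagged.append((feature, value, p_value))
    # Prioritization (assumed rule): report the strongest evidence first.
    return sorted(flagged, key=lambda item: item[2])


# Toy evaluation set: the model does badly on country == "DE".
rows, losses = [], []
for i in range(40):
    country = "DE" if i % 4 == 0 else "US"
    device = "mobile" if i % 2 == 0 else "desktop"
    rows.append({"country": country, "device": device})
    losses.append((1.0 if country == "DE" else 0.1) + 0.01 * (i % 3))

flagged = find_anomalous_slices(rows, losses)
for feature, value, p_value in flagged:
    print(f"{feature}={value}: p={p_value:.2e}")
```

On this toy data the `country=DE` slice ranks first. The `device=mobile` slice is also flagged, because it happens to contain every `DE` row. A production system would additionally need to handle overlapping slices of this kind, multi-feature slices, multiple-testing correction, and distributed metric computation, none of which this sketch attempts.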

