LG · Jan 16, 2024

Estimating Model Performance Under Covariate Shift Without Labels

arXiv:2401.08348v5 · 7 citations
Originality: Incremental advance
AI Analysis

The paper addresses performance degradation in deployed binary classifiers and offers a practical solution when labels are unavailable. It is an incremental advance because it builds on existing proxy methods.

The paper tackles the problem of estimating machine learning model performance under covariate shift without labels. It introduces Probabilistic Adaptive Performance Estimation (PAPE), which outperforms benchmark methods across more than 900 dataset-model combinations built from US census data.

After deployment, machine learning models often suffer performance degradation due to shifts in the data distribution. Assessing post-deployment performance accurately is challenging when labels are missing or delayed. Existing proxy methods, such as data drift detection, do not adequately measure the effect of these shifts on performance. To address this, we introduce Probabilistic Adaptive Performance Estimation (PAPE), a new method for evaluating binary classification models on unlabeled tabular data that accurately estimates model performance under covariate shift. It can be applied to any performance metric defined in terms of confusion-matrix elements. Crucially, PAPE operates independently of the original model, relying only on its predictions and probability estimates, and it makes no assumptions about the nature of the covariate shift, learning directly from the data instead. We tested PAPE on over 900 dataset-model combinations built from US census data, comparing it against several benchmarks using various metrics. Our findings show that PAPE outperforms the other methods, making it a superior choice for estimating the performance of binary classification models.
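
The core idea in the abstract, estimating any confusion-matrix metric from predictions and probability estimates alone, can be illustrated with a minimal Python sketch. It assumes labeled reference data plus per-example importance weights that describe the covariate shift (for example, density ratios from a classifier trained to separate reference data from production data; this is an assumption about how an adaptive method might obtain them). It fits a weighted histogram-binning calibrator, then sums the calibrated probabilities into an expected confusion matrix. Histogram binning and every name here (`fit_weighted_histogram_calibrator`, `expected_confusion_matrix`, `estimated_metrics`) are hypothetical choices for illustration, not PAPE's published implementation.

```python
from bisect import bisect_right


def fit_weighted_histogram_calibrator(scores, labels, weights, n_bins=10):
    """Fit a histogram-binning calibrator on labeled reference data.

    Each reference example carries an importance weight (hypothetically an
    estimated density ratio p_shifted(x) / p_reference(x)), so the fitted
    positive rates approximate those of the shifted distribution.
    """
    edges = [i / n_bins for i in range(1, n_bins)]  # interior bin edges on [0, 1]
    pos = [0.0] * n_bins  # weighted count of positive labels per bin
    tot = [0.0] * n_bins  # total weight per bin
    for s, y, w in zip(scores, labels, weights):
        b = bisect_right(edges, s)
        pos[b] += w * y
        tot[b] += w
    # Empty bins fall back to the bin midpoint (an arbitrary choice for this sketch).
    rates = [pos[b] / tot[b] if tot[b] > 0 else (b + 0.5) / n_bins
             for b in range(n_bins)]

    def calibrate(s):
        # Map a raw model score to the weighted positive rate of its bin.
        return rates[bisect_right(edges, s)]

    return calibrate


def expected_confusion_matrix(probs, preds):
    """Expected TP, FP, FN, TN given calibrated P(y=1) and hard predictions."""
    tp = fp = fn = tn = 0.0
    for p, yhat in zip(probs, preds):
        if yhat == 1:
            tp += p        # predicted positive, truly positive with prob. p
            fp += 1 - p    # predicted positive, truly negative with prob. 1 - p
        else:
            fn += p
            tn += 1 - p
    return tp, fp, fn, tn


def estimated_metrics(probs, preds):
    """Plug expected counts into standard confusion-matrix metric formulas."""
    tp, fp, fn, tn = expected_confusion_matrix(probs, preds)
    nan = float("nan")
    n = len(probs)
    return {
        "accuracy": (tp + tn) / n if n else nan,
        "precision": tp / (tp + fp) if tp + fp > 0 else nan,
        "recall": tp / (tp + fn) if tp + fn > 0 else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn > 0 else nan,
    }


if __name__ == "__main__":
    # Toy reference data; the weights are hypothetical importance weights.
    ref_scores = [0.2, 0.3, 0.7, 0.8]
    ref_labels = [0, 1, 1, 1]
    weights = [3.0, 1.0, 1.0, 1.0]
    calibrate = fit_weighted_histogram_calibrator(ref_scores, ref_labels, weights, n_bins=2)

    # Unlabeled production scores: only predictions and probabilities are used.
    analysis_scores = [0.9, 0.6, 0.4, 0.1]
    preds = [int(s >= 0.5) for s in analysis_scores]
    probs = [calibrate(s) for s in analysis_scores]
    print(estimated_metrics(probs, preds))
```

Plugging expected counts into ratio metrics such as precision or F1 is an approximation, since the expectation of a ratio is not the ratio of expectations. With all weights equal to 1, the calibrator reduces to ordinary histogram binning on the reference data, i.e. no adaptation to the shift.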
