ME, ST, AP, ML · May 10, 2019

Prediction and outlier detection in classification problems

arXiv:1905.04396v3 · 86 citations
Originality: Highly original
AI Analysis

This paper addresses reliable classification and outlier detection in non-stationary environments, a known bottleneck for machine learning practitioners, and proposes a novel method for it.

The paper tackles multi-class classification under distribution shift between training and test data by proposing BCOPS, a method that constructs prediction sets to optimize out-of-sample performance and to flag outliers. The prediction sets carry finite-sample coverage guarantees, and the method is proven to be asymptotically consistent.

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set $C(x)$ as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers $x$, for which the method returns no prediction (corresponding to $C(x)$ equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
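The idea of a prediction set that may be empty can be illustrated with generic class-conditional split conformal prediction. The sketch below is not BCOPS: BCOPS combines supervised-learning algorithms with conformal prediction to target the out-of-sample distribution, whereas here the conformity score (negative squared distance to a class mean), the 50/50 fit/calibration split, and the helper names `conformal_threshold`, `fit_class_sets`, and `predict_set` are hypothetical choices made for illustration. The per-class guarantee it gives, coverage of at least 1 - alpha for each true class, assumes that test points of class k are exchangeable with that class's calibration points.

```python
import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Threshold t such that a new score exchangeable with cal_scores
    satisfies P(score >= t) >= 1 - alpha (higher score = more typical)."""
    n = len(cal_scores)
    k = int(np.floor(alpha * (n + 1)))
    if k < 1:
        # Too few calibration points for this alpha: never exclude a label.
        return -np.inf
    # The k-th smallest calibration score.
    return np.sort(cal_scores)[k - 1]


def fit_class_sets(X, y, alpha, rng):
    """For each class, fit a (hypothetical) score on half of its data and
    calibrate the score's threshold on the other half."""
    model = {}
    for k in np.unique(y):
        Xk = X[y == k]
        idx = rng.permutation(len(Xk))
        half = len(Xk) // 2
        fit, cal = Xk[idx[:half]], Xk[idx[half:]]
        mu = fit.mean(axis=0)
        # Stand-in conformity score: negative squared distance to the class mean.
        cal_scores = -np.sum((cal - mu) ** 2, axis=1)
        model[int(k)] = (mu, conformal_threshold(cal_scores, alpha))
    return model


def predict_set(model, x):
    """Labels whose score clears their class threshold.
    An empty set means no prediction: x is flagged as a possible outlier."""
    return {k for k, (mu, t) in model.items() if -np.sum((x - mu) ** 2) >= t}


# Usage on two synthetic Gaussian classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
model = fit_class_sets(X, y, alpha=0.1, rng=rng)

print(predict_set(model, np.array([0.0, 0.0])))     # near class 0 only
print(predict_set(model, np.array([20.0, -20.0])))  # far from both classes
```

Because each label is tested against its own threshold, a point can receive one label, several labels, or none; the empty set plays the role of the abstract's "no prediction" outcome for outliers.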
