ME ST AP ML

May 10, 2019

Prediction and outlier detection in classification problems

arXiv:1905.04396v3

20.886 citations

Originality: Highly original

AI Analysis

This paper addresses reliable classification and outlier detection when the test distribution differs from the training distribution, proposing a novel method for a known bottleneck faced by machine learning practitioners.

The paper tackles multi-class classification with distribution shift between training and test data by proposing BCOPS, a method that constructs prediction sets to optimize out-of-sample performance and detect outliers, with finite-sample coverage guarantees and proven asymptotic consistency.

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set $C(x)$ as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers $x$, for which the method returns no prediction (corresponding to $C(x)$ equal to the empty set). The proposed method combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given method. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
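The idea of a prediction set $C(x)$ that may be empty can be illustrated with a label-conditional split conformal sketch. This is a simplified illustration under stated assumptions, not BCOPS itself: the centroid-distance conformity score, the helper names `fit_class_means`, `conformity`, and `prediction_sets`, and the synthetic two-class data are all hypothetical, whereas the abstract describes a score built from supervised-learning algorithms to optimize out-of-sample performance. A class $k$ enters $C(x)$ only when its conformal p-value exceeds $\alpha$, so a point that looks unlike every class receives the empty set.

```python
import numpy as np


def fit_class_means(X, y):
    # Hypothetical helper: one centroid per class, used for a toy conformity score.
    return {int(k): X[y == k].mean(axis=0) for k in np.unique(y)}


def conformity(X, center):
    # Higher score = more typical of the class (negative distance to its centroid).
    return -np.linalg.norm(X - center, axis=1)


def prediction_sets(X_fit, y_fit, X_cal, y_cal, X_test, alpha=0.05):
    """Return one set of labels per test point; an empty set flags an outlier."""
    centers = fit_class_means(X_fit, y_fit)
    sets = [set() for _ in range(len(X_test))]
    for k, center in centers.items():
        # Scores of held-out calibration points that truly belong to class k.
        cal = conformity(X_cal[y_cal == k], center)
        test = conformity(X_test, center)
        # Conformal p-value: share of calibration scores no larger than the test score.
        counts = (cal[None, :] <= test[:, None]).sum(axis=1)
        p_values = (counts + 1) / (len(cal) + 1)
        for i in np.flatnonzero(p_values > alpha):
            sets[i].add(k)
    return sets


# Synthetic data (an assumption for illustration): two Gaussian classes.
rng = np.random.default_rng(0)
n = 400
X = np.vstack([rng.normal([0, 0], 1.0, size=(n, 2)),
               rng.normal([6, 0], 1.0, size=(n, 2))])
y = np.repeat([0, 1], n)

# Split once: one half fits the score, the other half calibrates the p-values.
idx = rng.permutation(2 * n)
fit_idx, cal_idx = idx[:n], idx[n:]

# One test point at the class-0 center and one far from both classes.
X_test = np.array([[0.0, 0.0], [50.0, 50.0]])
sets = prediction_sets(X[fit_idx], y[fit_idx], X[cal_idx], y[cal_idx], X_test)
print(sets)
```

In this sketch the first test point should receive the label 0 and the distant point should receive no label. Keeping the calibration half separate from the half used to fit the score is what makes the per-class p-values valid in finite samples when calibration and test points of a class are exchangeable.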
