Predictive Multiplicity in Classification
This work addresses a key challenge in machine learning for practitioners and stakeholders: model selection may lead to inconsistent outcomes even among models with similar performance. Its contribution lies in formalizing and quantifying this effect.
The paper tackles the problem of predictive multiplicity, where competing models perform similarly but assign conflicting predictions, by introducing formal measures and integer programming tools to evaluate its severity in linear classification. The results demonstrate that real-world datasets, such as those used for recidivism prediction, can admit models with wildly conflicting predictions, highlighting the need to measure and report this issue in model development.
Prediction problems often admit competing models that perform almost equally well. This effect challenges key assumptions in machine learning when competing models assign conflicting predictions. In this paper, we define predictive multiplicity as the ability of a prediction problem to admit competing models with conflicting predictions. We introduce formal measures to evaluate the severity of predictive multiplicity and develop integer programming tools to compute them exactly for linear classification problems. We apply our tools to measure predictive multiplicity in recidivism prediction problems. Our results show that real-world datasets may admit competing models that assign wildly conflicting predictions, and motivate the need to measure and report predictive multiplicity in model development.
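The abstract does not spell out the measures, so the sketch below assumes two simple definitions for illustration, both relative to a chosen baseline model and an error tolerance epsilon. A model counts as "competing" if its training error is at most the baseline's error plus epsilon. `ambiguity` is the fraction of points that at least one competing model labels differently from the baseline, and `discrepancy` is the largest fraction of points that any single competing model flips. These names and definitions are assumptions here, not a transcription of the paper. The paper computes its measures exactly with integer programming over linear classifiers; this sketch instead scans a finite, hypothetical pool of candidate models, so under these definitions it can only produce a lower bound on the exact values.

```python
"""Sketch: empirical predictive-multiplicity measures over a finite pool of models.

Assumption: the candidate pool is supplied by the caller. This does NOT search
the whole hypothesis class the way an exact integer-programming method would,
so the values below are lower bounds on the exact quantities.
"""
from typing import List, Sequence

import numpy as np


def error_rate(pred: np.ndarray, y: np.ndarray) -> float:
    """Fraction of misclassified points (0-1 loss)."""
    return float(np.mean(pred != y))


def competing_models(preds: Sequence[np.ndarray], y: np.ndarray,
                     baseline_idx: int, epsilon: float) -> List[int]:
    """Indices of models whose error is at most baseline error + epsilon."""
    base_err = error_rate(preds[baseline_idx], y)
    tol = 1e-12  # guards against floating-point round-off at the boundary
    return [i for i, p in enumerate(preds)
            if error_rate(p, y) <= base_err + epsilon + tol]


def ambiguity(preds: Sequence[np.ndarray], y: np.ndarray,
              baseline_idx: int, epsilon: float) -> float:
    """Fraction of points where at least one competing model disagrees with the baseline."""
    base = preds[baseline_idx]
    conflict = np.zeros(len(y), dtype=bool)
    for i in competing_models(preds, y, baseline_idx, epsilon):
        conflict |= preds[i] != base
    return float(conflict.mean())


def discrepancy(preds: Sequence[np.ndarray], y: np.ndarray,
                baseline_idx: int, epsilon: float) -> float:
    """Largest fraction of points on which a single competing model disagrees with the baseline."""
    base = preds[baseline_idx]
    # The baseline always competes with itself, so this max is over a non-empty list.
    return max(float(np.mean(preds[i] != base))
               for i in competing_models(preds, y, baseline_idx, epsilon))


if __name__ == "__main__":
    # Toy example: four points, a perfect baseline (index 0), and three alternatives.
    y = np.array([1, 1, 0, 0])
    preds = [
        np.array([1, 1, 0, 0]),  # baseline, error 0.00
        np.array([1, 1, 0, 1]),  # error 0.25, flips point 3
        np.array([0, 1, 0, 0]),  # error 0.25, flips point 0
        np.array([0, 0, 1, 1]),  # error 1.00, never competing for small epsilon
    ]
    print(ambiguity(preds, y, 0, 0.25))    # 0.5
    print(discrepancy(preds, y, 0, 0.25))  # 0.25
```

Under these assumed definitions, discrepancy can never exceed ambiguity: each competing model's disagreement set is contained in the union that ambiguity measures. Raising epsilon admits more competing models, so both quantities can only stay the same or grow.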