ML · LG · ME · Sep 5, 2022

Statistical Comparisons of Classifiers by Generalized Stochastic Dominance

arXiv:2209.01857v2 · 22 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This paper gives researchers and practitioners a statistical method for comparing machine learning classifiers more robustly. The advance is incremental in that it builds on existing developments in decision theory.

The paper tackles the problem of comparing classifiers across multiple data sets and quality criteria by introducing a framework based on generalized stochastic dominance, which avoids reliance on aggregates and is operationalized through linear programs and an adapted two-sample randomization test. The approach is illustrated and investigated in a simulation study and on a set of standard benchmark data sets.

Although it is a crucial question for the development of machine learning algorithms, there is still no consensus on how to compare classifiers over multiple data sets with respect to several criteria. Every comparison framework faces (at least) three fundamental challenges: the multiplicity of quality criteria, the multiplicity of data sets, and the randomness of the selection of data sets. In this paper, we add a fresh view to the lively debate by adopting recent developments in decision theory. Based on so-called preference systems, our framework ranks classifiers by a generalized concept of stochastic dominance, which circumvents the cumbersome, and often even self-contradictory, reliance on aggregates. Moreover, we show that generalized stochastic dominance can be operationalized by solving easy-to-handle linear programs and tested statistically with an adapted two-sample observation-randomization test. This yields a powerful framework for the statistical comparison of classifiers over multiple data sets with respect to multiple quality criteria simultaneously. We illustrate and investigate our framework in a simulation study and with a set of standard benchmark data sets.
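To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of a much simpler relative: classical first-order stochastic dominance on a single quality criterion, combined with a plain two-sample permutation test. It is not the paper's method. It uses no preference systems and no linear program, and the one-sided Smirnov statistic is swapped in as a stand-in test statistic. All function names and the accuracy values are hypothetical and chosen purely for illustration.

```python
import random


def ecdf(sample, t):
    """Fraction of observations in `sample` that are <= t."""
    return sum(x <= t for x in sample) / len(sample)


def dominates(a, b):
    """Empirical first-order stochastic dominance of `a` over `b`.

    Higher scores are better, so `a` dominates `b` when its empirical CDF
    never lies above that of `b` (it never puts more mass on low scores).
    """
    grid = sorted(set(a) | set(b))
    return all(ecdf(a, t) <= ecdf(b, t) for t in grid)


def smirnov_stat(a, b):
    """One-sided Smirnov statistic: large when `a` is shifted towards higher scores."""
    grid = sorted(set(a) | set(b))
    return max(ecdf(b, t) - ecdf(a, t) for t in grid)


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Share of random relabelings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = smirnov_stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign observations to the two groups at random
        if smirnov_stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps the p-value above 0


# Hypothetical accuracies of two classifiers on two independent samples of data sets.
acc_a = [0.91, 0.88, 0.93, 0.90, 0.95, 0.89]
acc_b = [0.80, 0.84, 0.79, 0.86, 0.82, 0.81]

print("A dominates B:", dominates(acc_a, acc_b))
print("permutation p-value:", permutation_p_value(acc_a, acc_b))
```

The sketch handles one criterion at a time. The framework described in the abstract instead compares classifiers on several criteria simultaneously, which is where the generalized dominance concept and the linear programs come in.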

