Accuracy Measures for the Comparison of Classifiers
This work addresses a widespread and complex methodological choice that machine learning researchers and practitioners face. Its contribution is incremental: it reviews and critiques existing measures rather than proposing new ones.
The paper tackles the problem of selecting the best classification algorithm by examining the performance measures used to compare classifiers. It concludes that the classic overall success rate or marginal rates are preferable, because many of the other measures are equivalent for this purpose, hard to interpret, or unsuitable.
Selecting the best classification algorithm for a given dataset is a very widespread problem. It is also a complex one, in the sense that it requires making several important methodological choices. Among them, in this work we focus on the measure used to assess classification performance and rank the algorithms. We present the most popular measures and discuss their properties. Despite the numerous measures proposed over the years, many of them turn out to be equivalent in this specific case, to have interpretation problems, or to be unsuitable for our purpose. Consequently, the classic overall success rate or marginal rates should be preferred for this specific task.
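As an illustration of the two families of measures recommended above, the following minimal sketch computes them from a confusion matrix. It rests on several assumptions that are not stated in the text: rows hold the true classes and columns the predicted classes, "marginal rates" is taken to mean per-class rates such as the true positive rate and the positive predictive value, and the function names and the example matrix are hypothetical.

```python
def overall_success_rate(confusion):
    """Proportion of correctly classified instances (the diagonal over the total)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def marginal_rates(confusion):
    """Per-class (TPR, PPV) pairs.

    Assumed convention: confusion[i][j] counts instances of true class i
    that were predicted as class j.
    """
    k = len(confusion)
    rates = []
    for i in range(k):
        true_total = sum(confusion[i])                        # instances of true class i
        pred_total = sum(confusion[r][i] for r in range(k))   # instances predicted as i
        tpr = confusion[i][i] / true_total if true_total else float("nan")
        ppv = confusion[i][i] / pred_total if pred_total else float("nan")
        rates.append((tpr, ppv))
    return rates


# Hypothetical two-class confusion matrix, for illustration only.
example = [
    [40, 10],  # true class 0: 40 predicted as 0, 10 predicted as 1
    [5, 45],   # true class 1: 5 predicted as 0, 45 predicted as 1
]

osr = overall_success_rate(example)
per_class = marginal_rates(example)
print("overall success rate:", osr)
for label, (tpr, ppv) in enumerate(per_class):
    print(f"class {label}: TPR={tpr:.3f}, PPV={ppv:.3f}")
```

To rank several classifiers on the same dataset, one would compute the chosen measure for each classifier's confusion matrix and sort the results. The marginal rates are useful when the classes need to be examined separately rather than through a single aggregate.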