LG, ML · Mar 20, 2017

On the Use of Default Parameter Settings in the Empirical Evaluation of Classification Algorithms

arXiv:1703.06777v1 · 22 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of fair evaluation of machine learning algorithms, a concern for both researchers and practitioners, and highlights an incremental but critical issue in benchmarking practice.

The study shows that the gap in generalisation performance between default parameter settings and parameters tuned via cross-validation can be as large as the gap between state-of-the-art and uncompetitive systems. It also finds that rotation forest is significantly more accurate on average than both random forest and support vector machines across 121 classification problems.

We demonstrate that, for a range of state-of-the-art machine learning algorithms, the differences in generalisation performance obtained using default parameter settings and using parameters tuned via cross-validation can be similar in magnitude to the differences in performance observed between state-of-the-art and uncompetitive learning systems. This means that fair and rigorous evaluation of new learning algorithms requires performance comparison against benchmark methods with best-practice model selection procedures, rather than using default parameter settings. We investigate the sensitivity of three key machine learning algorithms (support vector machine, random forest and rotation forest) to their default parameter settings, and provide guidance on determining sensible default parameter values for implementations of these algorithms. We also conduct an experimental comparison of these three algorithms on 121 classification problems and find that, perhaps surprisingly, rotation forest is significantly more accurate on average than both random forest and a support vector machine.
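The comparison the abstract describes, default settings versus parameters tuned via cross-validation, can be illustrated with a minimal sketch. This is not the paper's experimental protocol: it swaps in a toy k-nearest-neighbour classifier (not one of the three algorithms studied) so that the example needs only the Python standard library, and the synthetic data, the "default" value `DEFAULT_K = 1`, the candidate `GRID`, and all function names are assumptions made for illustration.

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic data (an assumption): two overlapping 2-D Gaussian classes,
    with a fraction `noise` of the labels flipped."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        centre = 1.0 if label == 1 else -1.0
        x = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k is odd, so no ties)."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cv_accuracy(data, k, n_folds=5):
    """Mean validation accuracy over n_folds interleaved folds of `data`."""
    scores = []
    for f in range(n_folds):
        val = data[f::n_folds]
        trn = [p for i, p in enumerate(data) if i % n_folds != f]
        scores.append(accuracy(trn, val, k))
    return sum(scores) / n_folds


def select_k(train, grid, n_folds=5):
    """Model selection: pick the k with the best cross-validated accuracy,
    using the training set only."""
    return max(grid, key=lambda k: cv_accuracy(train, k, n_folds))


rng = random.Random(0)
train, test = make_data(200, rng), make_data(500, rng)

DEFAULT_K = 1                   # hypothetical "default" setting
GRID = [1, 3, 5, 9, 15, 25]     # hypothetical candidate values

tuned_k = select_k(train, GRID)
default_acc = accuracy(train, test, DEFAULT_K)
tuned_acc = accuracy(train, test, tuned_k)

print(f"default k={DEFAULT_K}: test accuracy {default_acc:.3f}")
print(f"tuned   k={tuned_k}: test accuracy {tuned_acc:.3f}")
```

The point of the design is that the test set plays no part in choosing `k`: selection uses cross-validation on the training data alone, and both configurations are then scored on the same held-out sample. A benchmark method evaluated only at `DEFAULT_K` would be the kind of baseline the abstract argues against.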

