ML · DB · LG · Jun 28, 2018

Automatic Exploration of Machine Learning Experiments on OpenML

arXiv:1806.10961v3 · 26 citations
Originality: Synthesis-oriented
AI Analysis

The dataset is a resource for researchers studying hyperparameter effects and tuning. The contribution is incremental, since it focuses on data collection rather than new methods.

The paper tackles the scarcity of experimental metadata for understanding hyperparameter influence by presenting a large, open dataset of 2.5 million experiments across 38 OpenML datasets and six algorithms, generated via automated random sampling.

Understanding the influence of hyperparameters on the performance of a machine learning algorithm is an important scientific topic in itself and can help to improve automatic hyperparameter tuning procedures. Unfortunately, experimental metadata for this purpose is still rare. This paper presents a large, free and open dataset addressing this problem, containing results on 38 OpenML datasets, six different machine learning algorithms and many different hyperparameter configurations. Results were generated by an automated random sampling strategy, termed the OpenML Random Bot. Each algorithm was cross-validated up to 20,000 times per dataset with different hyperparameter settings, resulting in a meta dataset of around 2.5 million experiments overall.
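The random sampling strategy described in the abstract can be illustrated with a minimal sketch. This is not the OpenML Random Bot itself: the search space, the parameter names, and the helpers `sample_config`, `run_bot`, and `toy_evaluate` are all hypothetical, and `toy_evaluate` merely stands in for a real cross-validated performance estimate.

```python
import math
import random

# Hypothetical search space. Names and ranges are illustrative only,
# not the ones used by the OpenML Random Bot.
SEARCH_SPACE = {
    "max_depth": ("int", 1, 30),
    "min_samples_leaf": ("int", 1, 60),
    "learning_rate": ("log_float", 1e-3, 1.0),
}


def sample_config(space, rng):
    """Draw one configuration, sampling each parameter independently."""
    config = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            # randint includes both endpoints
            config[name] = rng.randint(low, high)
        elif kind == "log_float":
            # uniform on the log scale, then mapped back
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return config


def run_bot(space, evaluate, n_runs, seed=0):
    """Sample n_runs random configurations and record one score for each."""
    rng = random.Random(seed)
    records = []
    for run_id in range(n_runs):
        config = sample_config(space, rng)
        records.append({"run_id": run_id, **config, "score": evaluate(config)})
    return records


def toy_evaluate(config):
    """Placeholder for a cross-validated performance estimate (assumption)."""
    return 1.0 / (1.0 + abs(config["max_depth"] - 10) + config["learning_rate"])


records = run_bot(SEARCH_SPACE, toy_evaluate, n_runs=5)
for row in records:
    print(row)
```

Each record pairs one sampled configuration with its score, which is the general shape of a meta dataset row. As a sanity check on the reported size, 38 datasets × 6 algorithms × 20,000 runs gives an upper bound of 4,560,000 experiments, consistent with "up to 20,000" runs yielding around 2.5 million overall.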

Code Implementations: 1 repo