LG · Dec 6, 2022

Benchmarking AutoML algorithms on a collection of synthetic classification problems

Pedro Henrique Ribeiro, Patryk Orzechowski, Joost Wagenaar, Jason H. Moore

arXiv:2212.02704v3

Originality: Synthesis-oriented

AI Analysis

This work is an incremental benchmarking study of AutoML algorithms: it applies existing methods to new synthetic data to help practitioners choose an algorithm.

The paper compares four AutoML algorithms on synthetic classification datasets from the DIGEN benchmark. It finds that AutoML can identify effective pipelines and that most algorithms perform similarly, with some differences depending on the dataset and the metric.

Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and their flexibility in adapting to different problems and data sets. As the number of AutoML algorithms grows, deciding which one best suits a given problem becomes increasingly difficult. It is therefore essential to use complex and challenging benchmarks that can differentiate AutoML algorithms from one another. This paper compares the performance of four AutoML algorithms: Tree-based Pipeline Optimization Tool (TPOT), Auto-Sklearn, Auto-Sklearn 2, and H2O AutoML. We use the Diverse and Generative ML benchmark (DIGEN), a diverse set of synthetic datasets derived from generative functions designed to highlight the strengths and weaknesses of common machine learning algorithms. We confirm that AutoML can identify pipelines that perform well on all included datasets. Most AutoML algorithms performed similarly; however, there were some differences depending on the specific dataset and metric used.
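Comparing several algorithms across many datasets calls for some way to summarize "who did better where." One common approach, shown here as an illustration and not necessarily the procedure the paper uses, is to rank the algorithms on each dataset and then average those ranks. The sketch below is plain Python and assumes that higher scores are better. The names `ranks_with_ties` and `mean_ranks`, along with the dataset names, algorithm names, and scores, are hypothetical placeholders and are not results from the paper. Running the function once per metric (for example, once on one score table and once on another) shows whether the ordering of the algorithms changes with the metric.

```python
from collections import defaultdict
from statistics import mean


def ranks_with_ties(scores: dict[str, float]) -> dict[str, float]:
    """Rank algorithms on one dataset (1 = best; assumes higher score is better).

    Algorithms with equal scores share the average of the ranks they span.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of algorithms tied with ordered[i].
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = avg_rank
        i = j + 1
    return ranks


def mean_ranks(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each algorithm's per-dataset rank across all datasets."""
    per_algo: dict[str, list[float]] = defaultdict(list)
    for dataset_scores in results.values():
        for algo, rank in ranks_with_ties(dataset_scores).items():
            per_algo[algo].append(rank)
    return {algo: mean(rs) for algo, rs in per_algo.items()}


if __name__ == "__main__":
    # Hypothetical placeholder scores for illustration only.
    results = {
        "dataset_1": {"algo_a": 0.95, "algo_b": 0.93, "algo_c": 0.93},
        "dataset_2": {"algo_a": 0.80, "algo_b": 0.88, "algo_c": 0.85},
    }
    print(mean_ranks(results))  # {'algo_a': 2.0, 'algo_b': 1.75, 'algo_c': 2.25}
```

Averaging ranks rather than raw scores keeps a single easy dataset, where every algorithm scores near the ceiling, from dominating the summary. The trade-off is that ranks discard how large the gaps between algorithms are.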
