LG, AI | Aug 27, 2024

A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

Assaf Shmuel, Oren Glickman, Teddy Lazebnik

arXiv:2408.14817v1 | 15.722 citations | h-index: 8

Originality: Synthesis-oriented

AI Analysis

This work addresses a practical challenge for ML practitioners: selecting an appropriate model for tabular data. Its contribution is incremental, since it builds on existing benchmarks with more extensive comparisons.

The study examines when deep learning models outperform traditional methods on tabular datasets by benchmarking 20 models across 111 datasets. Building on those results, the authors train a model that predicts these scenarios with 86.1% accuracy (AUC 0.78).

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this area. Previous comparative benchmarks have shown that DL performance is frequently equivalent or even inferior to models such as Gradient Boosting Machines (GBMs). In this study, we introduce a comprehensive benchmark aimed at better characterizing the types of datasets where DL models excel. Although several important benchmarks for tabular datasets already exist, our contribution lies in the variety and depth of our comparison: we evaluate 111 datasets with 20 different models, including both regression and classification tasks. These datasets vary in scale and include both those with and without categorical variables. Importantly, our benchmark contains a sufficient number of datasets where DL models perform best, allowing for a thorough analysis of the conditions under which DL models excel. Building on the results of this benchmark, we train a model that predicts scenarios where DL models outperform alternative methods with 86.1% accuracy (AUC 0.78). We present insights derived from this characterization and compare these findings to previous benchmarks.
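The abstract reports two metrics for the predictor of "DL performs best" scenarios: accuracy and AUC. As a rough illustration of how the two differ, the sketch below computes both by hand on toy data. The labels and scores are hypothetical and are not taken from the paper, and the function names `accuracy` and `roc_auc` are illustrative helpers, not the authors' code. Accuracy depends on a decision threshold (0.5 is assumed here), while AUC depends only on how the scores rank positive against negative cases.

```python
from itertools import product


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper):
# label 1 = a DL model was best on that dataset, 0 = another model was best.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predicted probabilities that DL is best.
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.45, 0.25, 0.35, 0.6]

print("accuracy:", accuracy(y_true, y_score))
print("AUC:", roc_auc(y_true, y_score))
```

In this toy case the positive class is rare, so a predictor can reach high accuracy mostly by predicting the majority class, while AUC reflects how well the scores separate the two classes. Whether a similar effect applies to the paper's reported figures is not stated in the abstract.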
