LG · AI · Nov 6, 2023

TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

arXiv:2311.02971v3 · 34 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

TabRepo is a valuable resource for tabular machine learning researchers and practitioners because it enables efficient analysis and transfer learning. The contribution is incremental, since it builds on existing AutoML and transfer-learning concepts.

The authors introduced TabRepo, a large-scale dataset containing predictions and metrics from 1310 models evaluated on 200 tabular datasets, and demonstrated its utility by showing that transfer-learning techniques applied to this dataset can outperform current state-of-the-art tabular systems in accuracy, runtime, and latency.

We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1310 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it enables analyses such as comparing hyperparameter optimization against current AutoML systems while also considering ensembling at marginal cost, since the model predictions are precomputed. Second, we show that our dataset can be readily leveraged to perform transfer learning. In particular, we show that applying standard transfer-learning techniques makes it possible to outperform current state-of-the-art tabular systems in accuracy, runtime, and latency.
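To make "ensembling at marginal cost" concrete, here is a minimal sketch of how an ensemble can be built purely from cached predictions, with no model retraining. It is not TabRepo's API: the function names, the dictionary layout, and the toy numbers are all hypothetical, and greedy forward selection with replacement is an assumed choice of ensembling method that the abstract does not specify.

```python
from typing import Dict, List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def greedy_ensemble(
    val_preds: Dict[str, List[float]],
    y_val: List[float],
    n_rounds: int = 10,
) -> Dict[str, float]:
    """Greedy forward selection with replacement over cached predictions.

    Only stored validation predictions are read, so each round costs a few
    vector operations. Returns selection counts normalised to sum to 1.
    """
    running_sum = [0.0] * len(y_val)  # sum of predictions picked so far
    counts: Dict[str, int] = {}
    for step in range(1, n_rounds + 1):
        best_name, best_err = None, float("inf")
        for name, preds in val_preds.items():
            # Error of the averaged ensemble if this model were added next.
            candidate = [(s + p) / step for s, p in zip(running_sum, preds)]
            err = mse(candidate, y_val)
            if err < best_err:
                best_name, best_err = name, err
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
        counts[best_name] = counts.get(best_name, 0) + 1
    return {name: c / n_rounds for name, c in counts.items()}


# Hypothetical cached predictions for three models on four validation rows.
cached = {
    "model_a": [1.2, 2.1, 2.7, 4.3],
    "model_b": [0.8, 1.9, 3.3, 3.7],
    "model_c": [2.0, 2.0, 2.0, 2.0],
}
y = [1.0, 2.0, 3.0, 4.0]
weights = greedy_ensemble(cached, y, n_rounds=4)
```

On this toy data the errors of `model_a` and `model_b` cancel each other out, so the selection splits its weight between those two and never picks `model_c`. The same loop applies to any table of stored predictions, which is why a repository of precomputed outputs makes this kind of comparison inexpensive.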

Code Implementations: 3 repos
