Revisiting Metafeatures to Explain Model Differences on Tabular Data
This work addresses the challenge of model selection for tabular data, but the negative results and limited scope (51 datasets) make it an incremental contribution.
The authors investigated whether dataset meta-features can explain performance gaps between model families on tabular data. After strict statistical testing, they found that no meta-feature robustly explains the gap between neural networks and tree-based models, that only limited associations exist for foundation vs. non-foundation models, and that meta-feature predictors fail to improve over a simple baseline.
With the rise of tabular foundation models, and with traditional models still performing well on many tasks, choosing the right model for a tabular dataset remains difficult. We investigate whether dataset meta-features can explain performance gaps between model families on tabular prediction tasks. Using the TabArena benchmark results, we analyze dataset-level performance gaps and relate them to model-agnostic dataset descriptors. After strict statistical testing with false discovery control, we find that (1) for neural network vs. tree gaps, no meta-feature survives false discovery control; (2) for non-foundation vs. foundation model gaps, one association is robust but does not generalize in leave-one-dataset-out prediction; and (3) for TabICLv2 vs. TabPFN-2.6, one robust association also improves held-out prediction. Furthermore, a leave-one-dataset-out analysis shows that meta-feature predictors fail to improve meaningfully over a simple baseline. Overall, our results highlight the heterogeneity of tabular datasets and show that global meta-feature approaches are not robust enough to explain model differences on the 51 TabArena datasets.
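The abstract relies on two technical steps: false discovery control over many meta-feature tests, and a leave-one-dataset-out (LODO) comparison against a simple baseline. The following is a minimal, standard-library sketch of both, not the authors' implementation. It assumes that false discovery control is done with the Benjamini-Hochberg procedure, that per-meta-feature p-values are already computed, that the "simple baseline" predicts the mean gap of the training datasets, and that a meta-feature predictor is a one-feature least-squares line. All function names are hypothetical.

```python
"""Hypothetical sketch of FDR control and LODO evaluation for meta-feature analysis.

Assumptions (not from the paper): Benjamini-Hochberg for false discovery control,
a mean-gap baseline, and a single-meta-feature ordinary least-squares predictor.
"""
from statistics import mean


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


def fit_line(xs, ys):
    """Ordinary least squares for gap = a + b * meta_feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar, 0.0  # constant meta-feature: fall back to the mean gap
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def lodo_errors(feature, gaps):
    """Mean absolute held-out error of (mean baseline, one-feature OLS) under LODO."""
    base_err, model_err = [], []
    for i in range(len(gaps)):
        # Hold out dataset i and train on all remaining datasets.
        train_x = feature[:i] + feature[i + 1:]
        train_y = gaps[:i] + gaps[i + 1:]
        base_err.append(abs(gaps[i] - mean(train_y)))
        a, b = fit_line(train_x, train_y)
        model_err.append(abs(gaps[i] - (a + b * feature[i])))
    return mean(base_err), mean(model_err)
```

For example, `benjamini_hochberg([0.04, 0.04])` rejects both hypotheses even though neither p-value clears the rank-1 threshold of 0.025, because Benjamini-Hochberg is a step-up procedure. Under LODO, a meta-feature would only count as useful if its mean held-out error is clearly below that of the baseline.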