LG · Jun 2

Speedrunning Tabular Foundation Model Pretraining

Salih Bora Ozturk, Alexander Pfefferle, Frank Hutter

arXiv:2606.03681 · 80.5 · h-index: 9 · Has Code

Predicted impact: top 10% in LG · last 90 days · Originality: Synthesis-oriented

AI Analysis

For researchers developing tabular foundation models, this provides a standardized protocol to compare and accumulate pretraining speedups, accelerating iteration cycles.

The authors introduce a community speedrun for nanoTabPFN to reduce pretraining costs, achieving an 81x speedup (0.92 vs 74.32 minutes) to a fixed ROC AUC target using 22x fewer synthetic datasets.

Pretraining cost is a major bottleneck for research on tabular foundation models, slowing the iteration cycle for new architectures, priors, and optimization ideas. Yet the community lacks a simple way to compare and accumulate pretraining speedups. We introduce a community speedrun for nanoTabPFN: contributors modify a single-file training script and compete to reach a fixed downstream ROC AUC target on subsampled TabArena using one NVIDIA L40S GPU. The current best record reaches the target in 0.92 minutes, an 81x speedup over the 74.32-minute baseline, while using 22x fewer synthetic datasets. The speedrun format provides a simple protocol for the community to add, verify, and stack pretraining improvements, with the leaderboard open to contributions. Code and records are available at https://github.com/borawhocodess/modded-nanotabpfn.
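
The time-to-target metric described above can be illustrated with a minimal sketch. This is not the repository's code: `time_to_target`, `train_step`, `evaluate`, and `eval_every` are hypothetical names, and counting evaluation time inside the measured wall-clock time is an assumption the abstract does not confirm. Only the 74.32 and 0.92 minute figures come from the abstract.

```python
import time
from typing import Callable, Optional


def speedup(baseline_minutes: float, record_minutes: float) -> float:
    """Ratio of baseline wall-clock time to record wall-clock time."""
    return baseline_minutes / record_minutes


def time_to_target(
    train_step: Callable[[], None],   # hypothetical: runs one optimizer step
    evaluate: Callable[[], float],    # hypothetical: returns downstream ROC AUC
    target: float,
    eval_every: int = 1,
    max_steps: int = 1000,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return minutes elapsed until evaluate() first reaches target, else None.

    Assumption: evaluation time is included in the measured time.
    """
    start = clock()
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every == 0 and evaluate() >= target:
            return (clock() - start) / 60.0
    return None


# Figures reported in the abstract: 74.32 min baseline, 0.92 min record.
print(round(speedup(74.32, 0.92)))  # 81
```

Fixing the target metric and the hardware, and then measuring only elapsed time, is what makes separate submissions directly comparable.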
