In-Context Data Distillation with TabPFN
The work addresses a key limitation of applying TabPFN to real-world tabular data, though it is an incremental improvement on an existing method.
The paper tackles the data size constraint of TabPFN, a transformer model for tabular data. It introduces in-context data distillation (ICD), which optimizes TabPFN's context so that the model can handle larger datasets within a fixed memory budget. The resulting model performs strongly against tree-based and deep learning models on 48 large tabular datasets.
Foundation models have revolutionized tasks in computer vision and natural language processing. However, in the realm of tabular data, tree-based models like XGBoost continue to dominate. TabPFN, a transformer model tailored for tabular data, mirrors recent foundation models in its exceptional in-context learning capability, being competitive with XGBoost's performance without the need for task-specific training or hyperparameter tuning. Despite its promise, TabPFN's applicability is hindered by its data size constraint, limiting its use in real-world scenarios. To address this, we present in-context data distillation (ICD), a novel methodology that effectively eliminates these constraints by optimizing TabPFN's context. ICD efficiently enables TabPFN to handle significantly larger datasets with a fixed memory budget, improving TabPFN's quadratic memory complexity but at the cost of a linear number of tuning steps. Notably, TabPFN, enhanced with ICD, demonstrates very strong performance against established tree-based models and modern deep learning methods on 48 large tabular datasets from OpenML.
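To make the idea of "optimizing the context" concrete, the following is a minimal sketch, not the authors' implementation. It does not use TabPFN at all: a simple softmax-attention (kernel) classifier stands in for the in-context learner so that the gradient can be written by hand in NumPy. The function names (`attention_weights`, `loss_and_grad`, `distill`), the temperature `tau`, the choice to keep the distilled labels fixed, and the initialization from a random training subset are all assumptions made for illustration. What the sketch does share with the description above is the shape of the procedure: a small learnable context of fixed size is updated by gradient steps on minibatches of the full training set, so per-step memory does not grow with the dataset while the number of steps does.

```python
import numpy as np

def attention_weights(X, D, tau=1.0):
    """Softmax attention of each query row in X over the context rows in D."""
    sq_dist = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=-1)  # (B, M)
    scores = -sq_dist / tau
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

def loss_and_grad(X, y, D, Y_d, tau=1.0, eps=1e-12):
    """Negative log-likelihood of y given the context (D, Y_d), and its gradient w.r.t. D."""
    a = attention_weights(X, D, tau)                    # (B, M)
    hit = Y_d[:, y].T                                   # (B, M): 1 where context label == query label
    q = (a * hit).sum(axis=1) + eps                     # predicted probability of the true class
    loss = -np.log(q).mean()
    g = -hit / (q[:, None] * len(X))                    # dL/da
    ds = a * (g - (a * g).sum(axis=1, keepdims=True))   # backprop through the softmax
    diff = X[:, None, :] - D[None, :, :]                # (B, M, F)
    grad_D = (2.0 / tau) * (ds[:, :, None] * diff).sum(axis=0)
    return loss, grad_D

def distill(X_train, y_train, n_context=16, batch_size=128, epochs=2,
            lr=0.5, tau=1.0, seed=0):
    """Learn a fixed-size context (D, Y_d) by minibatch gradient descent on D."""
    rng = np.random.default_rng(seed)
    n_classes = int(y_train.max()) + 1
    # Assumption: initialize the context from a random training subset.
    idx = rng.choice(len(X_train), size=n_context, replace=False)
    D = X_train[idx].astype(float)
    Y_d = np.eye(n_classes)[y_train[idx]]  # assumption: labels stay fixed
    for _ in range(epochs):
        perm = rng.permutation(len(X_train))
        # Number of steps grows linearly with the dataset size ...
        for start in range(0, len(perm), batch_size):
            batch = perm[start:start + batch_size]
            # ... while each step only touches a (batch_size x n_context) attention matrix.
            _, grad = loss_and_grad(X_train[batch], y_train[batch], D, Y_d, tau)
            D -= lr * grad
    return D, Y_d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 1.0, (1000, 2)), rng.normal(1.0, 1.0, (1000, 2))])
    y = np.array([0] * 1000 + [1] * 1000)
    D, Y_d = distill(X, y)
    print("distilled context shape:", D.shape)
```

In the setting the abstract describes, the stand-in classifier would be replaced by the pretrained TabPFN forward pass with its weights frozen, and the gradient with respect to the context would come from automatic differentiation rather than the hand-derived expression used here.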