LG · May 10, 2023

XTab: Cross-table Pretraining for Tabular Transformers

arXiv:2305.06090v1 · 112 citations
Originality: Highly original
AI Analysis

XTab addresses the challenge of leveraging information across multiple tables, improving generalizability and performance on tabular prediction tasks such as regression and classification.

The paper tackles the limited generalization of tabular self-supervised learning by introducing XTab, a cross-table pretraining framework for tabular transformers. Evaluated on 84 tabular prediction tasks, XTab improves the performance of multiple tabular transformers and, when used to pretrain FT-Transformer, outperforms other state-of-the-art tabular deep learning models on various tasks.

The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, and (2) by pretraining FT-Transformer via XTab, we achieve performance superior to that of other state-of-the-art tabular deep learning models on various tasks such as regression, binary classification, and multiclass classification.
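
The combination of independent featurizers and a federated shared component can be illustrated with a toy sketch. This is not the XTab implementation: it assumes numeric columns only, replaces the transformer backbone with a single shared weight vector, keeps the featurizers frozen for brevity, and uses plain federated averaging as the aggregation step. All names (EMBED_DIM, make_featurizer, local_update, and so on) are hypothetical.

```python
import random

EMBED_DIM = 4  # hypothetical token width shared by every table


def make_featurizer(n_columns, rng):
    # One embedding vector per column; table-specific and never shared.
    return [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
            for _ in range(n_columns)]


def featurize(row, featurizer):
    # Map a row with any number of numeric columns to one token per column.
    return [[x * w for w in col_emb] for x, col_emb in zip(row, featurizer)]


def shared_forward(tokens, shared):
    # Stand-in for the shared backbone: mean-pool the tokens, then take a
    # dot product with the shared weights.
    pooled = [sum(t[d] for t in tokens) / len(tokens)
              for d in range(EMBED_DIM)]
    return sum(p * s for p, s in zip(pooled, shared)), pooled


def local_update(rows, targets, featurizer, shared, lr=0.05):
    # One pass of SGD on squared error over a local copy of the shared
    # weights. Featurizers stay frozen here (a simplification).
    local = list(shared)
    for row, y in zip(rows, targets):
        pred, pooled = shared_forward(featurize(row, featurizer), local)
        err = pred - y
        local = [s - lr * 2 * err * p for s, p in zip(local, pooled)]
    return local


def federated_average(local_weights):
    # Server step: element-wise mean of the clients' shared weights.
    n = len(local_weights)
    return [sum(w[d] for w in local_weights) / n for d in range(EMBED_DIM)]


rng = random.Random(0)
tables = []
for n_cols in (3, 5):  # two toy tables with different column counts
    rows = [[rng.uniform(-1, 1) for _ in range(n_cols)] for _ in range(8)]
    targets = [sum(r) for r in rows]  # synthetic regression target
    tables.append((rows, targets, make_featurizer(n_cols, rng)))

shared = [0.0] * EMBED_DIM
for _ in range(10):  # federated rounds
    local_copies = [local_update(rows, targets, feat, shared)
                    for rows, targets, feat in tables]
    shared = federated_average(local_copies)
```

The point of the sketch is the split of responsibilities: each table keeps its own featurizer, so differing column counts never need to be reconciled, while only the fixed-size shared weights are exchanged and averaged across tables.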

Code Implementations: 1 repo
