LG, AI | Jun 11, 2024

Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification

Quangao Liu, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou

arXiv:2406.06891v1 | 2.6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific limitation in tabular classification models for practitioners dealing with categorical data, representing an incremental enhancement.

The authors tackled the problem of TabPFN's weaker performance on categorical features in tabular classification by proposing FT-TabPFN, which includes a Feature Tokenization layer and fine-tuning, resulting in significant improvements in accuracy and applicability.

Traditional methods for tabular classification usually rely on supervised learning from scratch, which requires extensive training data to determine model parameters. TabPFN, a Prior-Data Fitted Network for tabular data, has changed this paradigm. It uses a 12-layer transformer trained on large synthetic datasets to learn universal tabular representations, which enables fast and accurate predictions on new tasks with a single forward pass and no additional training. Although TabPFN has been successful on small datasets, it generally shows weaker performance on categorical features. To overcome this limitation, we propose FT-TabPFN, an enhanced version of TabPFN that includes a novel Feature Tokenization layer to better handle categorical features. Fine-tuned for downstream tasks, FT-TabPFN not only expands the functionality of the original model but also significantly improves its applicability and accuracy in tabular classification. Our full source code is available for community use and development.
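
The abstract does not spell out how the Feature Tokenization layer works, so the following is only a minimal sketch of the general idea behind feature tokenization, not the authors' implementation. It assumes a common design in which each numerical column is scaled into a token by a per-column weight and bias, and each categorical column looks up a per-category embedding. The class name `FeatureTokenizer`, its parameters, and the initialization scale are all hypothetical, and plain Python lists stand in for tensors.

```python
import random


class FeatureTokenizer:
    """Toy feature tokenizer: emits one d_token-sized vector per column.

    Hypothetical illustration only; not the FT-TabPFN source code.
    """

    def __init__(self, n_numeric, cat_cardinalities, d_token, seed=0):
        rng = random.Random(seed)

        def vec():
            # Small random initialization (assumed, for illustration).
            return [rng.gauss(0.0, 0.02) for _ in range(d_token)]

        # Numerical column j: token = x_j * weight_j + bias_j
        self.num_weight = [vec() for _ in range(n_numeric)]
        self.num_bias = [vec() for _ in range(n_numeric)]
        # Categorical column k: one embedding row per category value
        self.cat_tables = [[vec() for _ in range(c)] for c in cat_cardinalities]

    def __call__(self, x_num, x_cat):
        tokens = []
        for x, w, b in zip(x_num, self.num_weight, self.num_bias):
            tokens.append([x * wi + bi for wi, bi in zip(w, b)])
        for idx, table in zip(x_cat, self.cat_tables):
            # Categories are looked up, not treated as ordered numbers.
            tokens.append(list(table[idx]))
        return tokens


# One row with 2 numerical columns and 2 categorical columns
# (3 and 4 possible category values, respectively).
tokenizer = FeatureTokenizer(n_numeric=2, cat_cardinalities=[3, 4], d_token=8)
tokens = tokenizer(x_num=[0.5, -1.2], x_cat=[2, 0])
print(len(tokens), len(tokens[0]))  # prints: 4 8
```

The point of the lookup is that category indices carry no numeric order: category 2 is not "twice" category 1, so each value gets its own learned vector instead of being fed to the model as a raw number.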

