Fine-tuned In-Context Learning Transformers are Excellent Tabular Data Classifiers
This work addresses the challenge of improving classification accuracy on tabular data, which is crucial in domains such as finance and healthcare. The contribution is incremental, since it builds on the existing TabPFN approach.
The authors improve tabular data classification by fine-tuning In-Context Learning (ICL) transformers, which yields a significant performance boost. They also introduce TabForestPFN, which achieves excellent fine-tuning performance and good zero-shot performance. Its forest-pretrained counterpart, TabForest, outperforms TabPFN on some real-world datasets when fine-tuned.
The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. In this work, we extend TabPFN to the fine-tuning setting, resulting in a significant performance boost. We also discover that fine-tuning enables ICL-transformers to create complex decision boundaries, a property regular neural networks do not have. Based on this observation, we propose to pretrain ICL-transformers on a new forest dataset generator that creates datasets which are unrealistic but have complex decision boundaries. TabForest, the ICL-transformer pretrained on this dataset generator, shows better fine-tuning performance when pretrained on more complex datasets. Additionally, TabForest outperforms TabPFN on some real-world datasets when fine-tuned, despite having lower zero-shot performance due to the unrealistic nature of its pretraining datasets. By combining both dataset generators, we create TabForestPFN, an ICL-transformer that achieves excellent fine-tuning performance and good zero-shot performance.
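To illustrate what "unrealistic datasets with complex decision boundaries" can look like, the sketch below labels random Gaussian features with a randomly grown axis-aligned decision tree. This is a hypothetical illustration, not the authors' forest dataset generator: the function names (forest_like_dataset, _build_tree, _predict), the Gaussian feature distribution, the single-tree construction, and the use of tree depth as the "complexity" knob are all assumptions made here for brevity. Deeper trees carve the feature space into more regions, so the resulting labeling has a more intricate decision boundary.

```python
import random
from typing import List


def _build_tree(rows: List[List[float]], depth: int, n_classes: int, rng: random.Random):
    """Grow a random axis-aligned tree (hypothetical generator, not the paper's).

    Leaves are int class labels; internal nodes are (feature, threshold, left, right).
    """
    if depth == 0 or len(rows) < 2:
        # Stop splitting: assign a random class to this region of feature space.
        return rng.randrange(n_classes)
    feature = rng.randrange(len(rows[0]))
    # Draw the threshold from values seen in this node so the split cuts the data.
    threshold = rng.choice([r[feature] for r in rows])
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    return (feature, threshold,
            _build_tree(left, depth - 1, n_classes, rng),
            _build_tree(right, depth - 1, n_classes, rng))


def _predict(node, row: List[float]) -> int:
    """Route a row down the tree until a leaf label is reached."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return node


def forest_like_dataset(n_samples: int, n_features: int, n_classes: int,
                        depth: int, seed: int = 0):
    """Return (X, y): random features labeled by one random tree of the given depth."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    tree = _build_tree(X, depth, n_classes, rng)
    y = [_predict(tree, row) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = forest_like_dataset(n_samples=200, n_features=3, n_classes=4, depth=5, seed=1)
    print(len(X), len(y), sorted(set(y)))
```

In a pretraining loop of this kind, each call with a fresh seed would produce a new synthetic task; varying depth is one simple way to control how complex the sampled decision boundaries are.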