LG, AI | Jun 3, 2024

TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting

arXiv:2406.01805v2 | 8 citations | Has Code
AI Analysis

TabMDA addresses data scarcity in critical domains that rely on tabular data, offering a training-free augmentation method that works with any classifier. The contribution is incremental, since it builds on existing in-context models.

The paper tackles the problem of data scarcity in tabular data by introducing TabMDA, a manifold data augmentation method using pre-trained in-context models, which significantly improves performance across various classifiers and datasets.

Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the lack of explicit symmetries in the input space. To overcome this challenge, we introduce TabMDA, a novel method for manifold data augmentation on tabular data. This method utilises a pre-trained in-context model, such as TabPFN, to map the data into an embedding space. TabMDA performs label-invariant transformations by encoding the data multiple times with varied contexts. This process explores the learned embedding space of the underlying in-context models, thereby enlarging the training dataset. TabMDA is a training-free method, making it applicable to any classifier. We evaluate TabMDA on five standard classifiers and observe significant performance improvements across various tabular datasets. Our results demonstrate that TabMDA provides an effective way to leverage information from pre-trained in-context models to enhance the performance of downstream classifiers. Code is available at https://github.com/AdrianBZG/TabMDA.
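The idea of encoding the same rows several times under different contexts can be illustrated with a toy sketch. This is not the authors' implementation and does not call TabPFN: `toy_context_encoder` is a hypothetical stand-in whose output depends on the context rows (it standardises the queries with context statistics and applies a fixed random projection), and `augment_with_contexts`, `n_contexts`, `context_size`, and `embed_dim` are names assumed here for illustration only.

```python
import numpy as np


def toy_context_encoder(X_query, X_context, W):
    """Hypothetical stand-in for a pre-trained in-context encoder.

    The embedding of each query row depends on the context rows:
    queries are standardised with the context's mean and std, then
    passed through a fixed projection W and a tanh nonlinearity.
    """
    mu = X_context.mean(axis=0)
    sigma = X_context.std(axis=0) + 1e-8  # avoid division by zero
    return np.tanh(((X_query - mu) / sigma) @ W)


def augment_with_contexts(X, y, n_contexts=5, context_size=20,
                          embed_dim=8, seed=0):
    """Encode every row once per random context subset.

    Each pass yields a different embedding of the same row, and the
    row keeps its original label, so the output has
    n_contexts * len(X) labelled points in embedding space.
    """
    rng = np.random.default_rng(seed)
    # Fixed projection, shared across all contexts (assumed, not learned).
    W = rng.normal(size=(X.shape[1], embed_dim))
    size = min(context_size, len(X))

    Z_parts, y_parts = [], []
    for _ in range(n_contexts):
        # Sample a random subset of the training rows to act as context.
        idx = rng.choice(len(X), size=size, replace=False)
        Z_parts.append(toy_context_encoder(X, X[idx], W))
        y_parts.append(y)  # labels are unchanged by the transformation
    return np.vstack(Z_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(30, 4))
    y = rng.integers(0, 2, size=30)
    Z_aug, y_aug = augment_with_contexts(X, y)
    print(Z_aug.shape, y_aug.shape)
```

Under these assumptions, a downstream classifier would be fit on `Z_aug` and `y_aug` rather than on the raw features, and test rows would need to be mapped through the same encoder before prediction.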

Code Implementations: 1 repo
