LG | Oct 23, 2023

Unlocking the Transferability of Tokens in Deep Models for Tabular Data

arXiv:2310.15149v1 | 11 citations | h-index: 40
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in applying pre-trained models to tabular data, one that matters to practitioners working with heterogeneous datasets. The contribution is incremental, since it builds on existing fine-tuning paradigms.

The paper tackles the challenge of fine-tuning pre-trained deep models on tabular data with mismatched feature sets by proposing TabToken, a method that improves feature token embeddings through contrastive regularization, enabling knowledge transfer and enhancing model performance in classification and regression tasks.

Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, such a paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature sets of pre-trained models and the target tasks. In this paper, we propose TabToken, a method that aims to enhance the quality of feature tokens (i.e., embeddings of tabular features). TabToken allows for the utilization of pre-trained models when the upstream and downstream tasks share overlapping features, facilitating model fine-tuning even with limited training examples. Specifically, we introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features. During the pre-training stage, the tokens are learned jointly with top-layer deep models such as a transformer. In the downstream task, tokens of the shared features are kept fixed while TabToken efficiently fine-tunes the remaining parts of the model. TabToken not only enables knowledge transfer from a pre-trained model to tasks with heterogeneous features, but also enhances the discriminative ability of deep tabular models in standard classification and regression tasks.
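The transfer step described in the abstract (reuse the tokens of shared features, keep them fixed, and train everything else) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one embedding vector per feature, and the feature names, the helper functions `build_downstream_tokens` and `sgd_step`, and the constant gradients are all hypothetical, chosen only to show the mechanics. The contrastive regularizer and the transformer on top are omitted.

```python
import random

def build_downstream_tokens(pretrained_tokens, downstream_features, dim, seed=0):
    """Assemble tokens for the downstream feature set (hypothetical helper).

    Tokens of features shared with the pre-trained model are copied and
    marked frozen; tokens of unseen features are randomly initialised.
    """
    rng = random.Random(seed)
    tokens, frozen = {}, set()
    for name in downstream_features:
        if name in pretrained_tokens:
            tokens[name] = list(pretrained_tokens[name])  # reuse pre-trained token
            frozen.add(name)
        else:
            tokens[name] = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    return tokens, frozen

def sgd_step(tokens, grads, frozen, lr=0.1):
    """One plain gradient step that skips frozen (shared-feature) tokens."""
    for name, grad in grads.items():
        if name in frozen:
            continue  # shared tokens stay fixed during fine-tuning
        tokens[name] = [w - lr * g for w, g in zip(tokens[name], grad)]

# Assumed toy setup: the upstream task saw three features,
# the downstream task shares two of them and adds a new one.
pretrained = {
    "age": [0.5, -0.2, 0.1],
    "income": [0.3, 0.8, -0.4],
    "zip_code": [-0.6, 0.0, 0.2],
}
downstream_features = ["age", "income", "blood_pressure"]

tokens, frozen = build_downstream_tokens(pretrained, downstream_features, dim=3)
new_token_before = list(tokens["blood_pressure"])

# Placeholder gradients; in practice these would come from the task loss.
grads = {name: [1.0, 1.0, 1.0] for name in downstream_features}
sgd_step(tokens, grads, frozen)

print(sorted(frozen))
print(tokens["age"] == pretrained["age"])
print(tokens["blood_pressure"] != new_token_before)
```

In this sketch only the token of the unseen feature `blood_pressure` moves, while `age` and `income` keep their pre-trained values. In a deep learning framework the same effect is usually obtained by excluding the shared tokens from the optimizer or disabling their gradients.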

