LG | ML | Jun 30, 2022

Transfer Learning with Deep Tabular Models

Amazon
arXiv:2206.15306v2 | 78 citations | h-index: 72 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of scarce task-specific data in tabular domains like medical diagnosis, offering incremental improvements for practitioners using neural models.

The paper tackles transfer learning for tabular data. It demonstrates that upstream data gives neural networks an advantage over gradient boosted decision trees (GBDT), and it proposes a benchmark and a set of methods, including a pseudo-feature technique for cases where the upstream and downstream feature sets differ.

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models. We propose a realistic medical diagnosis benchmark for tabular transfer learning, and we present a how-to guide for using upstream data to boost performance with a variety of tabular neural network architectures. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications. Our code is available at https://github.com/LevinRoman/tabular-transfer-learning .
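The abstract names a pseudo-feature method but does not spell out its steps. The sketch below illustrates one plausible reading, offered as an assumption rather than the authors' procedure: a model fitted on the downstream table predicts the column that the upstream table lacks, and those predictions are written into the upstream rows so both tables share one schema before pretraining. A small linear model trained by gradient descent stands in for a tabular neural network, the data is synthetic, and every name here (`fit_linear`, `predict`, `add_pseudo_feature`) is hypothetical and not taken from the linked repository.

```python
import random


def fit_linear(X, y, epochs=2000, lr=0.1):
    """Fit y ~ w.x + b by full-batch gradient descent on mean squared error.

    Stand-in for a neural network; chosen only to keep the sketch dependency-free.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, X):
    """Apply a fitted (w, b) pair to each row of X."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def add_pseudo_feature(upstream_X, downstream_X, missing_col):
    """Impute a column that exists downstream but not upstream (assumed reading).

    upstream_X rows hold only the shared features; downstream_X rows hold the
    shared features plus the extra column at index `missing_col`.
    """
    shared = [[v for j, v in enumerate(row) if j != missing_col] for row in downstream_X]
    target = [row[missing_col] for row in downstream_X]
    imputer = fit_linear(shared, target)
    pseudo = predict(imputer, upstream_X)
    # Insert the predicted value at the position the column has downstream.
    return [row[:missing_col] + [p] + row[missing_col:] for row, p in zip(upstream_X, pseudo)]


# Synthetic demo: the extra downstream column is a linear function of the shared ones.
rng = random.Random(0)


def extra_feature(x0, x1):
    return 2.0 * x0 - x1 + 0.5


downstream_X = []
for _ in range(50):
    x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    downstream_X.append([x0, x1, extra_feature(x0, x1)])

upstream_X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]

augmented = add_pseudo_feature(upstream_X, downstream_X, missing_col=2)
print(len(augmented), len(augmented[0]))
```

After this step the augmented upstream table has the same columns as the downstream table, so a single network could be pretrained on it and then fine-tuned downstream. In practice the imputer would more likely be a neural model, and the imputed values are only as reliable as the relationship between the shared features and the missing column.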

Code Implementations: 1 repo
