LG, AI · Dec 11, 2020

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Xin Huang, Ashish Khetan, Milan Cvitkovic, Zohar Karnin

arXiv:2012.06678v1 · 38.8884 citations · Has Code

Originality: Highly original

AI Analysis

This work provides a new deep learning approach for tabular data, offering practitioners who work with structured datasets higher prediction accuracy and greater robustness to missing and noisy features.

This paper introduces TabTransformer, a new deep learning architecture for tabular data modeling that uses self-attention-based Transformers to create contextual embeddings for categorical features. On fifteen public datasets it outperforms state-of-the-art deep learning methods by at least 1.0% on mean AUC and matches tree-based ensemble models. With unsupervised pre-training, it achieves an average 2.1% AUC lift in semi-supervised settings.

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
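The central idea in the abstract, turning per-column categorical embeddings into contextual ones with self-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single attention head, random untrained weights, and no layer normalization, feed-forward sublayer, or prediction head. The class name `ContextualEmbedder` and all parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ContextualEmbedder:
    """Hypothetical sketch: one single-head self-attention layer applied
    across the embeddings of a row's categorical columns."""

    def __init__(self, cardinalities, dim, rng):
        # One embedding table per categorical column (assumed layout).
        self.tables = [rng.normal(0.0, 0.1, size=(c, dim)) for c in cardinalities]
        # Query, key, and value projections shared by all columns.
        self.w_q = rng.normal(0.0, 0.1, size=(dim, dim))
        self.w_k = rng.normal(0.0, 0.1, size=(dim, dim))
        self.w_v = rng.normal(0.0, 0.1, size=(dim, dim))
        self.dim = dim

    def embed(self, x_cat):
        # x_cat: (batch, n_cols) integer codes -> (batch, n_cols, dim).
        cols = [table[x_cat[:, j]] for j, table in enumerate(self.tables)]
        return np.stack(cols, axis=1)

    def forward(self, x_cat):
        e = self.embed(x_cat)
        q, k, v = e @ self.w_q, e @ self.w_k, e @ self.w_v
        # Scaled dot-product attention between the columns of each row.
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.dim)
        attn = softmax(scores, axis=-1)
        # Residual connection: static embedding plus attended context.
        return e + attn @ v, attn


rng = np.random.default_rng(0)
model = ContextualEmbedder(cardinalities=[3, 4, 2], dim=8, rng=rng)

# Two rows that share the category in column 0 but differ elsewhere.
x_cat = np.array([[0, 1, 0],
                  [0, 3, 1]])
static = model.embed(x_cat)
contextual, attn = model.forward(x_cat)

print("static embeddings equal for column 0:",
      np.allclose(static[0, 0], static[1, 0]))
print("contextual embeddings equal for column 0:",
      np.allclose(contextual[0, 0], contextual[1, 0]))
```

The two rows share the same category in the first column, so their static embeddings for that column are identical. After attention, the embedding of that column also depends on the values in the other columns, which is the sense in which the embeddings are contextual.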

