LG · AI · Nov 3, 2020

Tabular Transformers for Modeling Multivariate Time Series

arXiv:2011.01843v2 · 127 citations · Has Code
AI Analysis

This work addresses the problem of handling tabular time series for data scientists, offering incremental improvements by adapting existing transformer methods to this domain.

The paper tackles modeling multivariate time series in tabular data by proposing two transformer-based architectures for representation learning and generation, demonstrating them on fraud detection and pollution prediction tasks with concrete results on synthetic and real datasets.

Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learning representations that is analogous to BERT and can be pre-trained end-to-end and used in downstream tasks, and one that is akin to GPT and can be used for generation of realistic synthetic tabular sequences. We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations. Code and data are available at https://github.com/IBM/TabFormer.
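As a rough illustration of the BERT-like setup described in the abstract, the sketch below shows one way tabular rows could be turned into token sequences and corrupted for masked-field pre-training. It is not taken from the TabFormer repository: the column names, the per-field vocabulary scheme, the window size, and the masking rate are all assumptions made for this example, and the helper functions are hypothetical.

```python
import random

MASK = "[MASK]"
MASK_ID = 0  # assumption: id 0 is reserved for the mask token in every field


def build_field_vocabs(rows, fields):
    """Give each field its own token-id space (hypothetical scheme)."""
    vocabs = {}
    for field in fields:
        values = sorted({str(row[field]) for row in rows})
        vocab = {MASK: MASK_ID}
        for i, value in enumerate(values):
            vocab[value] = i + 1  # real values start at id 1
        vocabs[field] = vocab
    return vocabs


def encode_rows(rows, fields, vocabs):
    """Turn each row into a list of per-field token ids."""
    return [[vocabs[f][str(row[f])] for f in fields] for row in rows]


def sliding_windows(encoded, window, stride):
    """Group consecutive rows into fixed-length sequences of rows."""
    return [encoded[i:i + window]
            for i in range(0, len(encoded) - window + 1, stride)]


def mask_fields(sequence, prob, rng):
    """BERT-style corruption: replace some field tokens with MASK_ID.

    Returns the corrupted sequence and the labels, where a label is the
    original token id at masked positions and None elsewhere.
    """
    corrupted, labels = [], []
    for row in sequence:
        new_row, row_labels = [], []
        for tok in row:
            if rng.random() < prob:
                new_row.append(MASK_ID)
                row_labels.append(tok)
            else:
                new_row.append(tok)
                row_labels.append(None)
        corrupted.append(new_row)
        labels.append(row_labels)
    return corrupted, labels


# Toy transaction table; all columns and values are made up for illustration.
fields = ["merchant", "amount_bin", "hour"]
rows = [
    {"merchant": "grocery", "amount_bin": "low", "hour": 9},
    {"merchant": "fuel", "amount_bin": "mid", "hour": 14},
    {"merchant": "grocery", "amount_bin": "low", "hour": 18},
    {"merchant": "travel", "amount_bin": "high", "hour": 23},
]

vocabs = build_field_vocabs(rows, fields)
encoded = encode_rows(rows, fields, vocabs)
windows = sliding_windows(encoded, window=2, stride=1)

rng = random.Random(0)  # fixed seed so the corruption is reproducible
corrupted, labels = mask_fields(windows[0], prob=0.15, rng=rng)
```

In a setup like this, a model would be trained to recover the labels at the masked positions. The hierarchical variant mentioned in the abstract would presumably embed each row's fields into a single row vector before a sequence-level transformer processes the window, but that step is not sketched here.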

Code Implementations: 1 repo