LG · Jun 7, 2024

TabPFGen -- Tabular Data Generation with TabPFN

arXiv:2406.05216v1 · 30 citations
Originality: Highly original
AI Analysis

This paper addresses the problem of tabular data generation for machine learning practitioners, offering a novel approach that builds on a pre-trained model.

The paper tackles the challenge of applying deep generative models to tabular data by turning TabPFN, a transformer designed for in-context discriminative tasks, into an energy-based generative model called TabPFGen. Without any additional training, TabPFGen achieves strong results on data augmentation, class-balancing, and imputation.
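The core idea of reusing a discriminative model as an energy function can be illustrated with the convention popularised by joint energy-based models (JEM): read the classifier's logits $f(x)[y]$ as negative energies, so $E(x, y) = -f(x)[y]$ and $E(x) = -\log\sum_y \exp f(x)[y]$, which recovers the classifier's softmax as $p(y \mid x) = \exp(-E(x, y) + E(x))$. The sketch below is a minimal illustration under that assumption only. The exact energy used by TabPFGen is not given in this summary, and the fixed linear classifier (`W`, `b`) is a hypothetical stand-in for TabPFN conditioned on a training set.

```python
import numpy as np

# Hypothetical stand-in for a pre-trained classifier: fixed linear logits.
# In TabPFGen the classifier role is played by TabPFN with the training set
# in context; that model is NOT reproduced here.
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # shape (n_classes, n_features)
b = np.zeros(2)


def logits(x):
    """Class logits f(x) for a batch x of shape (n, n_features)."""
    return x @ W.T + b


def logsumexp(a, axis=-1):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def energy_joint(x, y):
    """E(x, y) = -f(x)[y]; lower energy means (x, y) is more plausible."""
    return -logits(x)[np.arange(len(x)), y]


def energy_marginal(x):
    """E(x) = -logsumexp_y f(x)[y], i.e. the label marginalised out."""
    return -logsumexp(logits(x), axis=1)


def grad_energy_joint(x, y):
    """Analytic gradient of E(x, y) w.r.t. x for this linear stand-in."""
    return -W[y]


x = np.array([[1.0, 0.0]])
y = np.array([0])
# exp(-E(x, y) + E(x)) reproduces the classifier's softmax probability p(y | x).
p_y_given_x = np.exp(-energy_joint(x, y) + energy_marginal(x))
```

A linear energy like this one is unbounded below, so $\exp(-E)$ is not a normalisable density on its own; a sampler needs an energy whose density actually integrates, which is one reason the choice of energy matters in practice.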

Advances in deep generative modelling have not translated well to tabular data. We argue that this is caused by a mismatch in structure between popular generative models and discriminative models of tabular data. We thus devise a technique to turn TabPFN -- a highly performant transformer initially designed for in-context discriminative tabular tasks -- into an energy-based generative model, which we dub TabPFGen. This novel framework leverages the pre-trained TabPFN as part of the energy function and does not require any additional training or hyperparameter tuning, thus inheriting TabPFN's in-context learning capability. We can sample from TabPFGen analogously to other energy-based models. We demonstrate strong results on standard generative modelling tasks, including data augmentation, class-balancing, and imputation, unlocking a new frontier of tabular data generation.
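The abstract says only that TabPFGen is sampled "analogously to other energy-based models". A common choice for such models is stochastic gradient Langevin dynamics (SGLD), $x_{t+1} = x_t - \eta \nabla_x E(x_t, y) + \sqrt{2\eta}\,\xi_t$ with $\xi_t \sim \mathcal{N}(0, I)$. The sketch below assumes that sampler and shows how class-balancing (sample more rows for a chosen class) and imputation (hold observed features fixed, let only missing ones move) can both be expressed with it. The quadratic-logit "classifier" with centroids `MU`, and the function `sgld_sample` with its step size and step count, are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np

# Hypothetical stand-in classifier with quadratic logits
# f(x)[y] = -||x - mu_y||^2 / 2 (a nearest-centroid rule). Unlike a linear
# classifier, this makes exp(-E(x, y)) a normalisable density (a unit
# Gaussian around mu_y), so the sampler has a well-defined target.
MU = np.array([[2.0, -2.0],   # centroid of class 0
               [-1.0, 3.0]])  # centroid of class 1


def grad_energy(x, y):
    """Gradient of E(x, y) = -f(x)[y] = ||x - mu_y||^2 / 2 w.r.t. x."""
    return x - MU[y]


def sgld_sample(y, n, n_steps=500, step=0.05, x_init=None, mask=None, seed=0):
    """Langevin dynamics targeting p(x | y) proportional to exp(-E(x, y)).

    mask (optional, bool, same shape as x) marks entries allowed to move;
    entries where mask is False stay at their x_init values, a simple way
    to impute missing features conditioned on observed ones.
    """
    rng = np.random.default_rng(seed)
    labels = np.full(n, y)
    if x_init is None:
        x = rng.normal(size=(n, MU.shape[1]))
    else:
        x = np.array(x_init, dtype=float)
    if mask is None:
        mask = np.ones_like(x, dtype=bool)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        update = -step * grad_energy(x, labels) + np.sqrt(2 * step) * noise
        x = np.where(mask, x + update, x)  # only masked entries are updated
    return x


# Class-balancing: draw extra synthetic rows for class 1 only.
extra = sgld_sample(y=1, n=2000)

# Imputation: feature 0 is observed (0.5), feature 1 is missing.
x_obs = np.column_stack([np.full(2000, 0.5), np.zeros(2000)])
miss = np.zeros_like(x_obs, dtype=bool)
miss[:, 1] = True  # only the missing feature may move
imputed = sgld_sample(y=0, n=2000, x_init=x_obs, mask=miss)
```

Because this toy energy factorises over features, the imputed column simply follows the class-0 centroid; with a model like TabPFN the observed features would also shape the conditional for the missing ones.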

Code Implementations: 1 repo
