LaTable: Towards Large Tabular Models
This work addresses the lag in generative models for tabular data, a ubiquitous modality, although it appears incremental because it builds on existing diffusion methods.
The authors tackle the challenge of building a generative foundation model for heterogeneous tabular data. They propose LaTable, a tabular diffusion model that outperforms baselines on in-distribution generation and, after fine-tuning, generates out-of-distribution data better from fewer samples.
Tabular data is one of the most ubiquitous modalities, yet the literature on tabular generative foundation models lags far behind its text and vision counterparts. Creating such a model is hard because of the heterogeneous feature spaces of different tabular datasets, tabular metadata (e.g., dataset descriptions and feature headers), and the lack of prior knowledge in tables (e.g., no natural feature order). In this work we propose LaTable, a novel tabular diffusion model that addresses these challenges and can be trained across different datasets. Through extensive experiments we find that LaTable outperforms baselines on in-distribution generation, and that a fine-tuned LaTable generates out-of-distribution datasets better with fewer samples. On the other hand, we explore LaTable's poor zero-shot performance and what it may teach us about building generative tabular foundation models with better zero- and few-shot generation capabilities.
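Since LaTable builds on existing diffusion methods, it may help to recall the generic forward (noising) process those methods start from. The sketch below is the standard DDPM Gaussian forward process applied to one row of standardized numeric features, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is not LaTable's architecture: the abstract does not specify the noise schedule, how categorical columns or metadata are handled, or how heterogeneous feature spaces are encoded. The linear schedule (1e-4 to 0.02) and the function names are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed DDPM-style default)."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) for one numeric row.

    Returns the noised row and the Gaussian noise used, which a denoiser
    would be trained to predict. Hypothetical helper, numeric features only.
    """
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    abar = alpha_bars(linear_beta_schedule(1000))
    row = [0.5, -1.2, 2.0]  # one row of standardized numeric features (made-up values)
    xt, eps = q_sample(row, t=500, abar=abar, rng=random.Random(0))
    print(xt)
```

Because alpha_bar_t shrinks toward zero as t grows, late steps are nearly pure Gaussian noise; a tabular model then needs additional machinery (for example, for categorical columns and differing column sets across datasets), which is the part the abstract attributes to LaTable without detailing it here.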