LG · Jun 25, 2024

LaTable: Towards Large Tabular Models

arXiv:2406.17673v1 · 6 citations
Originality: Incremental advance
AI Analysis

The paper addresses the lagging state of generative models for tabular data, a ubiquitous modality. The contribution appears incremental because it builds on existing diffusion methods.

The authors tackle the challenge of building a generative foundation model for heterogeneous tabular data. They propose LaTable, a tabular diffusion model that outperforms baselines on in-distribution generation and, after fine-tuning, generates out-of-distribution datasets better with fewer samples.

Tabular data is one of the most ubiquitous modalities, yet the literature on tabular generative foundation models is lagging far behind its text and vision counterparts. Creating such a model is hard, due to the heterogeneous feature spaces of different tabular datasets, tabular metadata (e.g. dataset description and feature headers), and tables lacking prior knowledge (e.g. feature order). In this work we propose LaTable: a novel tabular diffusion model that addresses these challenges and can be trained across different datasets. Through extensive experiments we find that LaTable outperforms baselines on in-distribution generation, and that finetuning LaTable can generate out-of-distribution datasets better with fewer samples. On the other hand, we explore the poor zero-shot performance of LaTable, and what it may teach us about building generative tabular foundation models with better zero- and few-shot generation capabilities.

