LG · DB · Apr 23

Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Liane Vogel, Kavitha Srinivas, Niharika D'Souza, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz

arXiv:2604.21696 · 71.91 citations

Predicted impact top 23% in LG · last 90 days · Originality Synthesis-oriented

AI Analysis

For practitioners and researchers working with tabular data, this benchmark enables systematic comparison of embedding models, though the findings are task-dependent and incremental.

The paper introduces TEmBed, a benchmark for evaluating tabular embeddings across four representation levels, and finds that the best model depends on the task and level, providing practical guidance for selection.

Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number of such models, it remains unclear which approach works best in practice, as existing methods are often evaluated under task-specific settings that make direct comparison difficult. To address this, we introduce TEmBed, the Tabular Embedding Test Bed, a comprehensive benchmark for systematically evaluating tabular embeddings across four representation levels: cell, row, column, and table. Evaluating a diverse set of tabular representation learning models, we show that which model to use depends on the task and representation level. Our results offer practical guidance for selecting tabular embeddings in real-world applications and lay the groundwork for developing more general-purpose tabular representation models.

