DB · AI · LG · Feb 3

PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models

arXiv:2602.04029v1 · 2 citations
AI Analysis

The paper addresses the scarcity of public relational data for training RFMs. Its synthetic data scaling paradigm is an incremental contribution, but one with practical impact on data-driven decision-making.

The paper tackles the challenge of training Relational Foundation Models (RFMs) by introducing PluRel, a framework that synthesizes multi-tabular relational databases from scratch. Pretraining on these synthetic databases reveals power-law scaling in pretraining loss and improves generalization to real databases.
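To make the power-law scaling claim concrete, here is a minimal sketch of how such a trend is commonly checked: assume the loss follows L(N) = a * N^(-b) and fit a straight line in log-log space. The functional form without an offset term, the helper name `fit_power_law`, and the numbers are illustrative assumptions, not the paper's fitting procedure or results.

```python
import math

def fit_power_law(ns, losses):
    """Least-squares fit of log L = log a - b * log N; returns (a, b).

    Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept of the log-log line.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Placeholder data generated from L = 5 * N^-0.3; these are NOT results from the paper.
ns = [10, 100, 1000, 10000]
losses = [5.0 * n ** -0.3 for n in ns]
a, b = fit_power_law(ns, losses)
```

If measured losses fall close to a straight line in log-log space, the fitted exponent `b` summarizes how quickly loss drops as the number of synthetic databases (or pretraining tokens) grows.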

Relational Foundation Models (RFMs) facilitate data-driven decision-making by learning from complex multi-table databases. However, the diverse relational databases needed to train such models are rarely public due to privacy constraints. While there are methods to generate synthetic tabular data of arbitrary size, incorporating schema structure and primary-foreign key connectivity for multi-table generation remains challenging. Here we introduce PluRel, a framework to synthesize multi-tabular relational databases from scratch. In a step-by-step fashion, PluRel models (1) schemas with directed graphs, (2) inter-table primary-foreign key connectivity with bipartite graphs, and (3) feature distributions in tables via conditional causal mechanisms. The design space across these stages supports the synthesis of a wide range of diverse databases, while being computationally lightweight. Using PluRel, we observe for the first time that (1) RFM pretraining loss exhibits power-law scaling with the number of synthetic databases and total pretraining tokens, (2) scaling the number of synthetic databases improves generalization to real databases, and (3) synthetic pretraining yields strong base models for continued pretraining on real databases. Overall, our framework and results position synthetic data scaling as a promising paradigm for RFMs.
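The three stages named in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the uniform sampling choices, the single numeric feature `x`, and the linear-plus-noise mechanism are hypothetical and are not PluRel's actual design space or API.

```python
import random

def sample_schema(num_tables, rng):
    """Stage 1 (toy): schema as a directed acyclic graph.

    An edge (child, parent) means the child table holds a foreign key to the parent.
    """
    edges = []
    for child in range(1, num_tables):
        # Only reference earlier tables, so the graph stays acyclic.
        k = rng.randint(1, min(2, child))
        for parent in rng.sample(range(child), k):
            edges.append((child, parent))
    return edges

def sample_foreign_keys(num_child_rows, num_parent_rows, rng):
    """Stage 2 (toy): bipartite row-level links, one parent row per child row."""
    return [rng.randrange(num_parent_rows) for _ in range(num_child_rows)]

def synthesize(num_tables=4, rows_per_table=5, seed=0):
    """Run the three toy stages and return (schema edges, tables)."""
    rng = random.Random(seed)
    edges = sample_schema(num_tables, rng)
    parents_of = {t: [p for c, p in edges if c == t] for t in range(num_tables)}
    tables = {}
    # Table indices are a topological order: parents always have smaller indices.
    for t in range(num_tables):
        fks = {
            p: sample_foreign_keys(rows_per_table, rows_per_table, rng)
            for p in parents_of[t]
        }
        feature = []
        for row in range(rows_per_table):
            # Stage 3 (toy): feature depends on linked parent rows plus noise.
            parent_signal = sum(tables[p]["x"][fks[p][row]] for p in parents_of[t])
            feature.append(0.8 * parent_signal + rng.gauss(0.0, 1.0))
        tables[t] = {"fks": fks, "x": feature}
    return edges, tables

edges, tables = synthesize(num_tables=5, rows_per_table=6, seed=1)
```

Varying the seed, the number of tables, and the mechanism in stage 3 yields different databases, which is the sense in which a small design space can produce many diverse training databases at low cost.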

