LG · May 27

Hierarchical Synthetic Tabular Data Generation: A Hybrid Top-Down and Bottom-Up Framework

arXiv:2605.28198 · 47.0
Predicted impact: top 54% in LG · last 90 days
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of generating realistic synthetic tabular data with logical consistency and rare-event coverage for data-scarce domains. However, its evaluation is limited to a specific financial benchmark, and the reported improvements are incremental.

The paper proposes a hierarchical hybrid top-down and bottom-up framework for synthetic tabular data generation that decouples semantic structures from stochastic texture, improving train-synthetic-test-real performance over neural baselines on weak multimodal financial benchmarks while preserving semantic consistency.

Existing approaches for synthetic tabular data generation are based on either purely generative models or LLMs, both of which struggle with data heterogeneity, logical consistency, rare-event coverage, and robustness in low-data regimes. In this paper, we propose a hierarchical hybrid top-down and bottom-up (H-TDBU) framework that decouples semantic structures from stochastic texture. In the top-down path, structure-driven logical constraints and cross-modal alignment rules are constructed, while in the bottom-up path, lightweight tabular generators are used to learn local statistical patterns from real data. The two paths are consolidated in a unified synthesis engine with an iterative feedback loop. We evaluate the framework on weak multimodal financial benchmarks combining tabular and sentiment-text data. Experimental results show that our H-TDBU approach improves train-synthetic-test-real performance over neural baseline methods while preserving semantic consistency. Our results suggest that hierarchical rule-guided synthesis provides an effective mechanism for combining controllability, semantic coherence, and statistical fidelity in synthetic data generation.
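
To make the two-path idea concrete, here is a minimal sketch in Python, not the authors' implementation. The abstract does not specify the generators, the rule language, or the feedback mechanism, so everything below is an assumption made for illustration: the column names (`daily_return`, `volume`, `sentiment`), the two rules, the independent per-column marginal sampler standing in for the "lightweight tabular generators", and a plain reject-and-resample loop standing in for the "iterative feedback loop".

```python
import random
import statistics
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]


def fit_marginals(real_rows: List[Row]) -> Dict[str, tuple]:
    """Bottom-up path (stand-in): fit one independent marginal per column."""
    model = {}
    for col in real_rows[0]:
        values = [r[col] for r in real_rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: summarise it as a Gaussian.
            model[col] = ("gauss", statistics.mean(values), statistics.pstdev(values))
        else:
            # Categorical column: keep the observed values, so that uniform
            # sampling from the list reproduces the empirical frequencies.
            model[col] = ("choice", values)
    return model


def sample_row(model: Dict[str, tuple], rng: random.Random) -> Row:
    """Draw one candidate row from the fitted marginals."""
    row: Row = {}
    for col, spec in model.items():
        if spec[0] == "gauss":
            row[col] = rng.gauss(spec[1], spec[2])
        else:
            row[col] = rng.choice(spec[1])
    return row


# Top-down path (stand-in): hand-written logical constraints.
def sentiment_matches_return(row: Row) -> bool:
    # Hypothetical cross-modal rule: negative sentiment must not
    # accompany a positive return.
    if row["sentiment"] == "negative":
        return row["daily_return"] <= 0
    return True


def volume_non_negative(row: Row) -> bool:
    # Hypothetical structural rule: traded volume cannot be negative.
    return row["volume"] >= 0


RULES: List[Rule] = [sentiment_matches_return, volume_non_negative]


def synthesize(model, rules, n_rows, rng, max_rounds=50) -> List[Row]:
    """Consolidate both paths: sample, filter by rules, resample the shortfall."""
    accepted: List[Row] = []
    for _ in range(max_rounds):
        needed = n_rows - len(accepted)
        if needed == 0:
            break
        # Oversample so that rejected candidates do not stall the loop.
        candidates = [sample_row(model, rng) for _ in range(2 * needed)]
        valid = [r for r in candidates if all(rule(r) for rule in rules)]
        accepted.extend(valid[:needed])
    return accepted


# Toy "real" data, invented for this example.
real_rows = [
    {"daily_return": 0.012, "volume": 1200.0, "sentiment": "positive"},
    {"daily_return": -0.020, "volume": 1800.0, "sentiment": "negative"},
    {"daily_return": 0.004, "volume": 900.0, "sentiment": "neutral"},
    {"daily_return": -0.007, "volume": 1500.0, "sentiment": "negative"},
]

synthetic = synthesize(fit_marginals(real_rows), RULES, 100, random.Random(0))
```

In this sketch the rules act only as a filter, so they guarantee consistency but do not steer the generator. A fuller feedback loop might instead use the rejection statistics to adjust the bottom-up model between rounds; whether the paper does so is not stated in the abstract.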
