LG · Mar 19, 2025

GReaTER: Generate Realistic Tabular data after data Enhancement and Reduction

arXiv:2503.15564v1 · 4 citations · h-index: 9 · 2025 IEEE 41st International Conference on Data Engineering Workshops (ICDEW)
Originality: Incremental advance
AI Analysis

This work addresses tabular data synthesis for applications that require multi-modal and multi-table generation, but it appears incremental because it builds directly on the GReaT framework.

The paper tackles the problem of generating realistic tabular data by addressing two limitations of the GReaT framework: table entries that carry too little semantic meaning, and ineffective relationships between tables in multi-table datasets. It proposes GReaTER, which adds data enhancement and cross-table connection methods, and reports improved performance over GReaT.

Tabular data synthesis involves not only multi-table synthesis but also the generation of multi-modal data (e.g., strings and categories), which enables diverse knowledge synthesis. However, treating numerical and categorical data separately has limited the effectiveness of tabular data generation. The GReaT (Generate Realistic Tabular Data) framework uses Large Language Models (LLMs) to encode entire rows, eliminating the need to partition data by type. Despite this, the framework's performance is constrained by two issues: (1) tabular data entries lack sufficient semantic meaning, which limits the LLM's ability to leverage its pre-trained knowledge for in-context learning, and (2) complex multi-table datasets make it difficult to establish effective relationships through which the tables can work together. To address these issues, we propose GReaTER (Generate Realistic Tabular Data after data Enhancement and Reduction), which includes: (1) a data semantic enhancement system that improves the LLM's understanding of tabular data through mapping, enabling better in-context learning, and (2) a cross-table connecting method that establishes efficient relationships across complex tables. Experimental results show that GReaTER outperforms the GReaT framework.
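A minimal Python sketch can make the three ideas above concrete, under clearly stated assumptions. It serializes a row as comma-separated "column is value" clauses, optionally in a randomly permuted order, in the style described for GReaT. The semantic mapping and cross-table step are illustrative guesses only: `SEMANTIC_MAPS`, `enhance`, `connect`, `encode_row`, the `parent_` column prefix, and the example tables are all hypothetical and are not GReaTER's actual mapping rules or cross-table method.

```python
import random

# Hypothetical code books: opaque codes -> human-readable words.
# GReaTER's real mapping rules are not given in the abstract; these are assumptions.
SEMANTIC_MAPS = {
    "sex": {"1": "male", "2": "female"},
    "smoker": {"0": "no", "1": "yes"},
}


def enhance(row, maps=SEMANTIC_MAPS):
    """Assumed enhancement step: replace coded values with meaningful words."""
    return {col: maps.get(col, {}).get(str(val), val) for col, val in row.items()}


def connect(child_rows, parent_rows, key):
    """Hypothetical cross-table step: copy parent attributes into each child row."""
    parents = {p[key]: p for p in parent_rows}
    joined = []
    for child in child_rows:
        merged = dict(child)
        # Children with no matching parent are kept unchanged.
        for col, val in parents.get(child[key], {}).items():
            if col != key:  # the shared key is already present in the child row
                merged[f"parent_{col}"] = val
        joined.append(merged)
    return joined


def encode_row(row, rng=None):
    """GReaT-style text encoding; if an RNG is given, permute the clause order."""
    items = list(row.items())
    if rng is not None:
        rng.shuffle(items)
    return ", ".join(f"{col} is {val}" for col, val in items)


# Usage with made-up example tables.
patients = [{"patient_id": 1, "sex": "2", "smoker": "0"}]
visits = [{"visit_id": 10, "patient_id": 1, "cost": 120.5}]
rows = connect(visits, [enhance(p) for p in patients], key="patient_id")
print(encode_row(rows[0]))
# visit_id is 10, patient_id is 1, cost is 120.5, parent_sex is female, parent_smoker is no
```

The join here simply flattens parent context into each child row so that a single serialized sentence carries information from both tables; this is one plausible design, and the paper's cross-table connecting method may work differently.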

