On The Role of Prompt Construction In Enhancing Efficacy and Efficiency of LLM-Based Tabular Data Generation
This work addresses a domain-specific problem faced by practitioners who use LLMs to generate realistic tabular data. Its contribution is incremental, since it builds on existing methods such as GReaT.
The study addresses the insufficient semantic context available to LLM-based tabular data generation. It hypothesizes that domain-specific prompt enrichment improves both quality and efficiency, and empirical tests with the GReaT framework show that context-enriched prompts significantly improve data generation quality and training efficiency.
LLM-based data generation for real-world tabular data can be challenged by the lack of sufficient semantic context in feature names used to describe columns. We hypothesize that enriching prompts with domain-specific insights can improve both the quality and efficiency of data generation. To test this hypothesis, we explore three prompt construction protocols: Expert-guided, LLM-guided, and Novel-Mapping. Through empirical studies with the recently proposed GReaT framework, we find that context-enriched prompts lead to significantly improved data generation quality and training efficiency.
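To make the idea of prompt enrichment concrete, the following is a minimal sketch, assuming each table row is serialized into a "name is value" sentence in the spirit of GReaT-style textual encoding. The function `serialize_row`, the column names, and the descriptions are all hypothetical and chosen for illustration only; they are not the protocols or code used in the paper.

```python
from typing import Mapping, Optional


def serialize_row(
    row: Mapping[str, object],
    descriptions: Optional[Mapping[str, str]] = None,
) -> str:
    """Turn one table row into a comma-separated 'name is value' sentence.

    If `descriptions` is given, each raw column name is replaced by its
    descriptive counterpart; names without an entry are kept unchanged.
    """
    descriptions = descriptions or {}
    parts = [
        f"{descriptions.get(name, name)} is {value}"
        for name, value in row.items()
    ]
    return ", ".join(parts)


# Hypothetical row whose column names carry no semantic context.
row = {"f1": 63, "f2": 145, "f3": 1}

# Hypothetical descriptions, standing in for what a domain expert
# (or an LLM asked to describe the columns) might supply.
enriched_names = {
    "f1": "Age in years",
    "f2": "Resting blood pressure (mm Hg)",
    "f3": "Has chest pain",
}

baseline_prompt = serialize_row(row)
enriched_prompt = serialize_row(row, enriched_names)

print(baseline_prompt)
print(enriched_prompt)
```

The baseline sentence exposes only opaque identifiers such as `f1`, whereas the enriched sentence gives the language model descriptive text it can relate to its pretraining knowledge. Which source supplies the descriptions is what would distinguish one prompt construction protocol from another under this sketch.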