AIGT: AI Generative Table Based on Prompt
AIGT addresses enterprise privacy and data-sharing concerns by improving synthetic tabular data generation, though the contribution is incremental because it builds on existing LLM-based methods.
The paper tackles the problem of generating high-quality synthetic tabular data by introducing AIGT, a method that uses metadata as prompts to enhance generation, and reports state-of-the-art performance on 14 of 20 public datasets and on two real industry datasets.
Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advances show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and by avoiding the high dimensionality that one-hot encoding introduces. However, current methods do not fully utilize the rich information available in tables. To address this, we introduce AI Generative Table (AIGT) based on prompt enhancement, a novel approach that uses metadata, such as table descriptions and schemas, as prompts to generate ultra-high-quality synthetic data. To overcome the token limits of LLMs, we propose long-token partitioning algorithms that enable AIGT to model tables of any scale. AIGT achieves state-of-the-art performance on 14 out of 20 public datasets and on two real industry datasets within the Alipay risk control system.
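The two ideas in the abstract, metadata used as a prompt and partitioning of rows that exceed a token limit, can be illustrated with a minimal sketch. This is not the paper's implementation: the prompt format, the "column is value" row serialization, the whitespace token counter, and the greedy grouping strategy are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
from typing import Dict, List


def build_prompt(description: str, schema: Dict[str, str]) -> str:
    """Serialize table-level metadata into a text prompt (hypothetical format)."""
    columns = ", ".join(f"{name} ({dtype})" for name, dtype in schema.items())
    return f"Table: {description}. Columns: {columns}."


def serialize_row(row: Dict[str, object], columns: List[str]) -> str:
    """Render the selected columns of one row as 'column is value' clauses."""
    return ", ".join(f"{col} is {row[col]}" for col in columns)


def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace split); a real LLM tokenizer differs."""
    return len(text.split())


def partition_columns(
    row: Dict[str, object], columns: List[str], max_tokens: int
) -> List[List[str]]:
    """Greedily group columns so each serialized group stays within max_tokens.

    Assumption: a simple left-to-right greedy split, not the paper's algorithm.
    A single column that alone exceeds the limit is kept as its own group.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    for col in columns:
        candidate = current + [col]
        if current and count_tokens(serialize_row(row, candidate)) > max_tokens:
            partitions.append(current)
            current = [col]
        else:
            current = candidate
    if current:
        partitions.append(current)
    return partitions


# Illustrative data only; not taken from the paper's datasets.
example_schema = {"age": "int", "income": "category"}
example_row = {
    "age": 39,
    "workclass": "Private",
    "education": "Bachelors",
    "income": "<=50K",
}
example_prompt = build_prompt("Adult census income", example_schema)
example_parts = partition_columns(example_row, list(example_row), max_tokens=6)
```

In a setup like this, each column group would be serialized separately and prefixed with the same metadata prompt, so that every chunk fits the model's context window while still carrying the table description and schema.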