AI · Dec 24, 2024

AIGT: AI Generative Table Based on Prompt

arXiv:2412.18111v1 · 24 citations · h-index: 8 · COLING
Originality: Incremental advance
AI Analysis

The paper addresses enterprise privacy and data-sharing constraints by improving synthetic tabular data generation. The advance is incremental because it builds on existing LLM-based methods.

The paper tackles the problem of generating high-quality synthetic tabular data by introducing AIGT, a method that uses table metadata as prompts to guide generation. It reports state-of-the-art performance on 14 out of 20 public datasets and on two real industry datasets.

Tabular data, which accounts for over 80% of enterprise data assets, is vital in various fields. With growing concerns about privacy protection and data-sharing restrictions, generating high-quality synthetic tabular data has become essential. Recent advancements show that large language models (LLMs) can effectively generate realistic tabular data by leveraging semantic information and overcoming the challenges of high-dimensional data that arise from one-hot encoding. However, current methods do not fully utilize the rich information available in tables. To address this, we introduce AI Generative Table (AIGT) based on prompt enhancement, a novel approach that utilizes metadata, such as table descriptions and schemas, as prompts to generate ultra-high quality synthetic data. To overcome the token limit constraints of LLMs, we propose long-token partitioning algorithms that enable AIGT to model tables of any scale. AIGT achieves state-of-the-art performance on 14 out of 20 public datasets and two real industry datasets within the Alipay risk control system.
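
The two ideas in the abstract, metadata as a prompt and partitioning a wide row to fit a token limit, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the "column is value" serialization, the prompt wording, and the whitespace-based token count are all assumptions chosen for illustration, and the paper's actual partitioning algorithms may differ.

```python
from typing import Dict, List


def build_prompt(description: str, schema: Dict[str, str]) -> str:
    """Combine a table description and a column schema into a text prompt.

    Hypothetical prompt format, assumed for illustration only.
    """
    columns = ", ".join(f"{name} ({dtype})" for name, dtype in schema.items())
    return f"Table: {description}. Columns: {columns}."


def serialize_row(row: Dict[str, object]) -> str:
    """Render one row as comma-separated 'column is value' clauses."""
    return ", ".join(f"{name} is {value}" for name, value in row.items())


def partition_columns(row: Dict[str, object], max_tokens: int) -> List[List[str]]:
    """Greedily group columns so each group's text fits a token budget.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real system would count tokens with the model's own tokenizer.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, value in row.items():
        cost = len(f"{name} is {value}".split())
        # Start a new group when adding this column would exceed the budget.
        if current and used + cost > max_tokens:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        groups.append(current)
    return groups
```

In this sketch, each column group would be serialized and modeled separately under the same metadata prompt, so the prompt carries the shared table context across partitions.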

